Data Catalogs

Demand for data catalogs has grown strongly since 2018. They have lived up to the hype and expectations over the last couple of years, and with cloud adoption expected to grow after COVID-19, data catalogs are sure to have a field day. Let’s begin by understanding what data catalogs are, whether you should adopt one at all and, if so, which one best suits your business unit in this data-driven transformation journey.

Data and metadata are never stagnant in any business, so any medium-to-large business should already be thinking about cataloging to speed up the movement of data and maximize its business potential in the least possible time.

If you are looking for a true enterprise-grade catalog solution for your data that also provides a metadata platform for your business, here is a list of criteria that will help you determine the right data catalog tool for your needs:

  1. Discovery Performance – Faster data searching and access
  2. Flexible Classification – Freeform tagging with keywords
  3. Seamless Collaboration – Efficiently form metadata relationships
  4. Quick sync with all types of data sources
  5. Easy integration with other data platforms

We reviewed some of the vendors from Gartner’s Magic Quadrant for Data Catalog, and here are our observations across the above criteria:

Alation:

Alation has an extremely efficient catalog search engine that makes use of active metadata. It has partnered with Cloudera, Hortonworks, IBM, Tableau, Teradata, Trifacta and others, enabling easy integration with an array of cloud-based and enterprise data platforms.

Collibra:

Collibra has been designed to work closely with emerging technologies like the Internet of Things and Artificial Intelligence. One of its best features is its configurable platform, which makes it very flexible across multiple verticals. It is also probably the best at freeform tagging among all the catalogs.

Informatica:

Informatica provides the most comprehensive metadata relationships and dataset tagging in a unified interface that is also application agnostic. The catalog is AI powered (it is known as a catalog of catalogs) and tracks data lineage across different sources.

Azure Data Catalog:

Azure DC lets you annotate your data and create references to your metadata. The metadata can also be indexed, which makes data discovery a smooth experience.

Google Data Catalog:

Google DC’s APIs are the most appealing part of the tool, as you can automate the creation and patching of tags. It is also highly scalable, as it can sync large volumes of data in less time. It comes with global credential management, making your data inventory transparent to all stakeholders who have the right access.

Criteria Alation Azure Collibra Informatica Google
Discovery Performance X X X
Flexible Classification X X X
Seamless Collaboration X X X
Quick Synch X X
Easy Integration X X X X

As you can see, each tool comes with its own key differentiators, but the catchphrase is “Better the integration, lesser the data friction”. So the key question is which catalog tool leaves you with the least integration debt while still supporting your data discovery models.

However, licensing is also an important factor in your DAR (Decision Analysis and Resolution). And make no mistake, there are other highly competitive vendors in this space, such as IBM, Alteryx, DATUM and SmartLogic. So review your business model against each tool’s license terms and make a balanced decision, rather than going for the best product and making your organization’s toolbox costlier than the business benefits you draw from it.
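
One way to make this trade-off concrete is a simple weighted scoring sheet that divides each tool's criteria score by its license cost. The sketch below is only an illustration: the weights, the 0 to 5 scores, the cost figures and the names "Tool A" and "Tool B" are all hypothetical placeholders, not ratings of any vendor discussed here, and you would replace them with your own assessment.

```python
# Hypothetical weights for the five criteria; adjust to your own priorities.
# They sum to 1.0, with integration weighted highest as an assumption.
CRITERIA_WEIGHTS = {
    "discovery": 0.25,
    "classification": 0.15,
    "collaboration": 0.15,
    "sync": 0.15,
    "integration": 0.30,
}


def weighted_score(scores, weights=CRITERIA_WEIGHTS):
    """Return the weighted sum of 0-5 criterion scores (missing ones count as 0)."""
    return sum(weights[name] * scores.get(name, 0) for name in weights)


def value_per_cost(scores, annual_cost, weights=CRITERIA_WEIGHTS):
    """Weighted score per unit of annual license cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return weighted_score(scores, weights) / annual_cost


def rank_tools(candidates, weights=CRITERIA_WEIGHTS):
    """Return candidate names sorted by value per cost, best first."""
    return sorted(
        candidates,
        key=lambda name: value_per_cost(
            candidates[name]["scores"], candidates[name]["annual_cost"], weights
        ),
        reverse=True,
    )


# Made-up example data: Tool A is the stronger product, Tool B is cheaper.
candidates = {
    "Tool A": {
        "scores": {"discovery": 5, "classification": 4, "collaboration": 4,
                   "sync": 3, "integration": 5},
        "annual_cost": 100,
    },
    "Tool B": {
        "scores": {"discovery": 3, "classification": 3, "collaboration": 3,
                   "sync": 4, "integration": 4},
        "annual_cost": 50,
    },
}

for name in rank_tools(candidates):
    print(name, round(value_per_cost(candidates[name]["scores"],
                                     candidates[name]["annual_cost"]), 3))
```

With these made-up numbers the cheaper Tool B ranks ahead of the higher-scoring Tool A, which is exactly the situation where the best product is not the best decision.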

Kanerika enables you to create data-driven insights to improve your business.