Executive Briefing: Data catalogs – Concepts, capabilities, and key platforms
Andrew J. Brust, Founder & CEO, Blue Badge Insights, Inc.
Meet Andrew
• Founder and CEO
• Big Data blogger for ZDNet
• Data/analytics analyst for Gigaom
• Microsoft Regional Director, MVP
• Co-chair, Visual Studio Live!
• Twitter: @andrewbrust
Shameless Plugs
• bit.ly/abrustzdnet
Agenda
• The nucleus, value-adds
• Motivation, sources, taxonomy, MLDC (machine learning data catalogs)
• Overlaps and embedded
• Cloud, OSS
• Acquisitions, map of the market
• Assessment and forecast
The Nucleus
• A place to document all your data sets (tables, files, etc.), which really means their metadata and source descriptor/connection info
• Some annotation/documentation
• And lots more, depending on vendor…
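To make the nucleus concrete, here is a minimal in-memory sketch of what one catalog entry holds: the data set's metadata, its source descriptor/connection info, and an annotation. The `CatalogEntry` and `Catalog` classes and all of their field names are hypothetical, chosen for illustration; every product defines its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One documented data set (hypothetical schema, for illustration only)."""

    name: str                      # table or file name
    source_type: str               # e.g. "relational", "data_lake", "nosql"
    connection: dict[str, str]     # source descriptor/connection info
    columns: dict[str, str]        # column name -> data type (the metadata)
    description: str = ""          # free-text annotation/documentation
    tags: set[str] = field(default_factory=set)


class Catalog:
    """A minimal catalog that keeps entries in a dict keyed by data set name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering the same name again replaces the earlier entry.
        self._entries[entry.name] = entry

    def annotate(self, name: str, text: str) -> None:
        # Attach documentation to a data set that is already registered.
        self._entries[name].description = text

    def get(self, name: str) -> CatalogEntry:
        return self._entries[name]


# Usage: document a warehouse table, then annotate it.
catalog = Catalog()
catalog.register(CatalogEntry(
    name="sales.orders",
    source_type="relational",
    connection={"host": "dw.example.internal", "database": "sales"},
    columns={"order_id": "INTEGER", "customer_email": "VARCHAR"},
))
catalog.annotate("sales.orders", "One row per customer order.")
```

Note that the entry stores only descriptions of the data and where to find it, never the data itself; the value-adds listed next are layered on top of records like this.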
Value-Adds
• Tagging
• Business glossary
• Data set certification
• Policy and business rules
• Sensitive data repository
• Security/role-based access management
• Query front end
• Social & collaboration tools
• Search
• Data marketplace
• Data classification
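Data classification is what populates a sensitive data repository. The following sketch is rule-based: it flags a column as sensitive when most of its sampled values match a pattern. The two patterns, the 80% threshold, and the function names are hypothetical. Commercial catalogs use far richer detectors, often ML-driven, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical rule set: label -> pattern a sampled value must fully match.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(samples: list[str], min_share: float = 0.8) -> set[str]:
    """Return the labels whose pattern matches at least `min_share`
    of the non-empty sampled values."""
    values = [v.strip() for v in samples if v and v.strip()]
    if not values:
        return set()
    labels = set()
    for label, pattern in SENSITIVE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= min_share:
            labels.add(label)
    return labels


def classify_table(sample_rows: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each flagged column to its labels; unflagged columns are omitted."""
    result = {}
    for column, samples in sample_rows.items():
        labels = classify_column(samples)
        if labels:
            result[column] = labels
    return result


# Usage: classify sampled values from three columns.
flags = classify_table({
    "customer_email": ["ann@example.com", "bo@example.org", ""],
    "ssn": ["123-45-6789", "987-65-4321"],
    "order_total": ["19.99", "5.00"],
})
```

Here `flags` marks `customer_email` as `email` and `ssn` as `us_ssn`, while `order_total` is left out. A catalog would record these labels as tags so that policy rules and role-based access controls can act on them.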
Motivation, Importance
• Achieve compliance
• Improve data lake ROI
• Unify the data landscape
• Bring data and the business closer together
Major Data Sources
• Relational databases, data warehouses
• NoSQL
• Data lakes (incl. cloud storage)
• Software and applications
Taxonomy
• Goal
  • Protect: governance and security (defense)
  • Benefit: data discoverability (offense)
• Data repository focus
  • Relational databases and data warehouses
  • Modern data sources (data lakes, apps, etc.)
• Build-out approaches
  • Manual: stewards
  • Manual: crowd-sourced
  • Automated: ML
The Many Faces of Machine Learning Data Catalogs
• Identify relationships and flows
• “Fingerprint” data
• Serve data scientists
• Predict your preferences/data set selections
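The slide does not say how any product fingerprints data. One widely used technique that could serve this purpose is MinHash, and it is swapped in here as an assumption. Each column is summarized by a short signature, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the columns' distinct values. A high estimate flags a candidate relationship, such as a join key. The function names and the 64-hash signature length are hypothetical choices.

```python
import hashlib
from collections.abc import Iterable


def _hash(value: str, seed: int) -> int:
    # Seeded 64-bit hash. blake2b is stable across runs, unlike built-in hash().
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "big")


def fingerprint(values: Iterable, num_hashes: int = 64) -> list[int]:
    """MinHash signature: for each seeded hash, the minimum over distinct values."""
    distinct = {str(v) for v in values}
    if not distinct:
        raise ValueError("cannot fingerprint an empty column")
    return [min(_hash(v, seed) for v in distinct) for seed in range(num_hashes)]


def estimated_overlap(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of matching signature positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hashes")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Usage: compare ID columns from three hypothetical tables.
orders_customer_id = fingerprint(range(1, 1001))
customers_id = fingerprint(range(1, 1001))
products_id = fingerprint(range(5001, 6001))
```

Identical value sets produce identical signatures, so `orders_customer_id` and `customers_id` score 1.0. The disjoint `products_id` scores near 0. Because signatures are small and fixed in size, a catalog can compare many columns without rescanning the underlying data.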
The Data Catalog “Orbit” (diagram): the Data Catalog at the center, surrounded by Data Virtualization, Artificial Intelligence, Business Intelligence, and Data Integration
Embedded Catalog Examples
• Virtualization: Dremio
• BI: Tableau
• Integration: Talend
• AI: Dataiku
Cloud Data Catalogs
• Purpose #1: serve as a metadata store for data prep/transformation, data lake, ML, and other services
• Purpose #2: be a full-featured, standalone data catalog
• AWS Glue and Google Data Catalog fall under #1
• Microsoft Azure falls under #2, but underperforms
  • Rule of thumb: keep an eye out for announcements during the Ignite conference in early November
  • Consider Microsoft’s history with SharePoint’s Business Data Catalog and its current Common Data Service initiative
Open Source
• Apache Atlas (governance)
• Apache Ranger (security)
• Apache Hive HCatalog (metadata)
Acquisitions
• Cloudera/Hortonworks
  • Cloudera Navigator is being phased out in favor of Cloudera Data Catalog, based on the former Hortonworks Data Steward Studio (and Atlas)
• Qlik/Podium Data
  • Podium Data becomes Qlik Data Catalyst
  • Qlik also bought Attunity, giving it BI, catalog, and data integration under one roof
Map of the Market
• Pure plays
• Embedded
• Cloud
• Data management
• Enterprise
Assessment and Forecast
• Hard to sustain so many pure plays
• More consolidation
• Cloud providers have to step up their games
• Cloud competition will get tougher
• More potential for ML-driven features
• Compliance/GDPR capabilities must improve
• The category is too dry
• Automation will improve
• Nearer term: catalogs will close the data lake/data warehouse divide
• Catalogs will become more self-service and even fun
Rate today’s session
• Session page on conference website
• O’Reilly Events App
Thank You!
andrew.brust@bluebadgeinsights.com
@andrewbrust
http://bit.ly/abrustzdnet