Top five challenges facing the Enterprise Data Warehouse

Top five challenges facing the Enterprise Data Warehouse (EDW)

1. Offload ETL (50x to 100x): Reduce processing and storage costs by up to 100x with Hadoop; free EDW processing capacity for high-value analytics and reporting.
2. Offload data (~50%): Identify and move cold or rarely used data to Hadoop; store more data longer, keeping years rather than months of transaction data.
3. Add new data sources (handling new data sources): Store new data sources, such as unstructured and semi-structured, web, social and IoT data.
4. Data quality (low confidence): Enable faster, more effective, and more reliable decision making by ensuring that data is trustworthy.
5. Data governance (no governance): Data stewardship, business glossary, data lineage; an ungoverned data lake is an unmanageable data lake.
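
To make the "identify and move cold or rarely used data" step concrete, here is a minimal Python sketch, assuming each EDW table has a recorded size and last-access time. The `TableStats` record, the `find_cold_tables` and `offload_share` helpers, the 180-day threshold and the sample tables are all hypothetical and not part of any IBM product; in practice these statistics might come from the warehouse's own catalog or audit logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TableStats:
    """Hypothetical access statistics for one EDW table."""
    name: str
    size_gb: float
    last_accessed: datetime


def find_cold_tables(tables: List[TableStats], now: datetime,
                     max_idle_days: int = 180) -> List[TableStats]:
    """Return tables not accessed within max_idle_days, largest first,
    so the biggest storage savings are offloaded first."""
    cutoff = now - timedelta(days=max_idle_days)
    cold = [t for t in tables if t.last_accessed < cutoff]
    return sorted(cold, key=lambda t: t.size_gb, reverse=True)


def offload_share(tables: List[TableStats], cold: List[TableStats]) -> float:
    """Fraction of total EDW storage occupied by the cold tables."""
    total = sum(t.size_gb for t in tables)
    return sum(t.size_gb for t in cold) / total if total else 0.0


if __name__ == "__main__":
    # Illustrative sample data only.
    now = datetime(2024, 1, 1)
    tables = [
        TableStats("sales_2015", 800.0, datetime(2019, 3, 1)),
        TableStats("sales_current", 400.0, datetime(2023, 12, 31)),
        TableStats("web_clicks_archive", 600.0, datetime(2022, 6, 30)),
        TableStats("customer_dim", 200.0, datetime(2023, 12, 30)),
    ]
    cold = find_cold_tables(tables, now)
    print([t.name for t in cold])  # ['sales_2015', 'web_clicks_archive']
    print(f"{offload_share(tables, cold):.0%} of storage is cold")  # 70% of storage is cold
```

Sorting the candidates by size is a deliberate choice: when offload capacity is limited, moving the largest idle tables first frees the most EDW storage per job.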

EDW offloading is the main reason companies are adopting Hadoop data lakes, driven by these factors:
• The compelling opportunity to realize significant cost reductions
• The significant benefits of a governed data lake that provides self-service analytics for nontechnical users, who analyze traditional structured data enriched with new forms of data
• The opportunity to lay the foundation for operationalizing future actionable insights in a customer-centric or product-centric way

EDW offloading can involve up to three related activities:
1. Moving or supplementing data integration from the EDW to Hadoop
2. Moving unused data from the EDW into Hadoop
3. Storing new types of data in Hadoop for enriching EDW analytics

Offload pipeline: extract EDW data (X-way parallel), then move the data (Y-way parallel, with data repartitioning), then load Hadoop (Z-way parallel), all with the same easy drag-and-drop paradigm.
IBM HDFS loading test: 15 TB/hr; just double the hardware to reach 30 TB/hr.
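
Because the extract, move and load stages run with different degrees of parallelism (X, Y and Z), rows have to be redistributed between stages. The sketch below illustrates that repartitioning step with hash partitioning on a key column. It shows the general technique only, not how DataStage or BigIntegrate implement it; `partition_of`, `repartition` and the sample rows are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List

Row = Dict[str, object]


def partition_of(key: object, num_partitions: int) -> int:
    """Map a key to a target partition with a stable hash.

    CRC32 is used instead of Python's built-in hash(), which is salted per
    process for strings and so could send the same key to different
    partitions in different worker processes.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def repartition(source_partitions: Iterable[List[Row]], key: str,
                num_targets: int) -> List[List[Row]]:
    """Redistribute rows from the extract partitions (X of them) into
    num_targets load partitions (Z), so that all rows sharing a key value
    end up in the same target partition."""
    targets: List[List[Row]] = [[] for _ in range(num_targets)]
    for partition in source_partitions:
        for row in partition:
            targets[partition_of(row[key], num_targets)].append(row)
    return targets


if __name__ == "__main__":
    # Hypothetical extract output: X = 2 partitions.
    extracted = [
        [{"cust_id": 1, "amount": 10}, {"cust_id": 2, "amount": 5}],
        [{"cust_id": 1, "amount": 7}, {"cust_id": 3, "amount": 2}],
    ]
    # Load side runs Z = 3 ways; both cust_id 1 rows land in one partition.
    for i, part in enumerate(repartition(extracted, "cust_id", 3)):
        print(i, part)
```

A sequential loop stands in for the parallel workers here; in a real job each source partition would be read by its own process. Keeping target partitions independent is what allows throughput to grow with added hardware, which is consistent with the 15 TB/hr and 30 TB/hr loading figures above.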

In each phase of the EDW offloading activities, organizations have opportunities to add data quality processing and data governance as part of the offloading process. Pushing ETL workloads into the EDW has prevented organizations from implementing data quality processing and data governance.
[Diagram labels: EDW; Data Classification and Validation; business policies and rules]
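
One way to picture data quality processing added during offloading is a rule-based validation step inside the offload job. The sketch below is a minimal illustration, assuming business rules can be written as named predicates over a row; the rule names, field names and the `validate` function are hypothetical and are not QualityStage, BigQuality or Information Analyzer APIs. Rows that fail any rule are set aside together with the reasons, rather than loaded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Rule = Tuple[str, Callable[[Row], bool]]

# Hypothetical business rules: (rule name, predicate that must hold for a valid row).
RULES: List[Rule] = [
    ("customer_id present",
     lambda r: r.get("customer_id") not in (None, "")),
    ("amount non-negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("country is two letters",  # shape check only, not a lookup of real codes
     lambda r: isinstance(r.get("country"), str)
     and len(r["country"]) == 2 and r["country"].isalpha()),
]


@dataclass
class ValidationResult:
    valid: List[Row] = field(default_factory=list)
    # Each rejected row is kept with the names of the rules it failed,
    # so data stewards can see why it was held back.
    rejected: List[Tuple[Row, List[str]]] = field(default_factory=list)


def validate(rows: List[Row], rules: List[Rule] = RULES) -> ValidationResult:
    """Split an offload batch into rows that pass every rule and rows to reject."""
    result = ValidationResult()
    for row in rows:
        failures = [name for name, check in rules if not check(row)]
        if failures:
            result.rejected.append((row, failures))
        else:
            result.valid.append(row)
    return result


if __name__ == "__main__":
    batch = [
        {"customer_id": "C1", "amount": 12.5, "country": "US"},
        {"customer_id": "", "amount": -3, "country": "USA"},
    ]
    res = validate(batch)
    print(len(res.valid))      # 1
    print(res.rejected[0][1])  # ['customer_id present', 'amount non-negative', 'country is two letters']
```

Keeping the rules as data rather than hard-coded checks means the same list can be reviewed alongside the business policies it is meant to enforce.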

Only IBM offers a modular solution for all eight of the EDW offloading requirements:
1. Move data
2. Transform & integrate data
3. Replicate data
4. Improve data quality
5. Govern data
6. Augment & enrich data
7. Reference architecture
8. Implementation patterns
[Diagram labels: Trusted Analytics Foundation; DataStage®; QualityStage®; BigIntegrate; BigQuality; InfoSphere® Information Server; Information Governance Catalog; Data Replication]

Capabilities

Required capability | Why this is important | IBM solution
1. Move data | Low-cost, efficient movement of data | DataStage/BigIntegrate
2. Transform and integrate | Reduce costs while leveraging existing assets | DataStage/BigIntegrate
3. Improve data quality | No data quality means garbage in, garbage out | QualityStage/BigQuality/Information Analyzer
4. Govern your data | Ungoverned Hadoop means unmanageable Hadoop | Information Governance Catalog (IGC)
5. Replicate | Deliver data where and when needed | IBM Data Replication
6. Augment and enrich | Increase ROI from EDW analytics | DataStage/BigIntegrate
7. Reference architecture | Reduce project costs and risks | IBM Enterprise Analytics Reference Architecture
8. Implementation patterns | Reduce project costs and risks | IBM Analytics Implementation Patterns