MIS 5101 Extract Transform Load ETL Discuss 5

  • Slides: 10
Download presentation
MIS 5101: Extract, Transform, Load (ETL)

MIS 5101: Extract, Transform, Load (ETL)

Discuss (5 minutes) Remember from the readings… • Why are we drowning in data?

Discuss (5 minutes) Remember from the readings… • Why are we drowning in data? • Why the process of ETL necessary? • What is the “single version of the truth? ”

Why are we “drowning in data? ” According to the article? Technological changes?

Why are we “drowning in data? ” According to the article? Technological changes?

Evaluating the tradeoff value(Daccess) vs value(Daccuracy) How much does it cost? How much do

Evaluating the tradeoff value(Daccess) vs value(Daccuracy) How much does it cost? How much do you save? How much do your outcomes improve? How much is an incremental improvement worth? …and the relationships are probably non-linear

Extract, Transform, Load - ETL Copying data from the transactional database to a format

Extract, Transform, Load - ETL Copying data from the transactional database to a format where it can be analyzed Selecting and resolving inconsistencies in data to fill the analytical data store

ETL Defined in a “relational” world Extract Transform Load from various databases across the

ETL Defined in a “relational” world Extract Transform Load from various databases across the organization it into a consistent, analysis-ready format it into an “analytical” data store, where largescale analysis is performed

ETL Defined in a “relational” world Extract Transform Load Query Real-time Database 1 Query

ETL Defined in a “relational” world Extract Transform Load Query Real-time Database 1 Query Data conversion Data Warehouse (Analytical Data Store) Real-time Database 2 Query Data conversion Query On-Demand Reporting

Main ETL Issues: Conversion Stage Data Consistency • What if the data is in

Main ETL Issues: Conversion Stage Data Consistency • What if the data is in different formats? Data Quality • How do we know it’s correct? • What if there is missing data? • What if the data we need isn’t there?

Give examples of data inconsistences in retail in healthcare in finance How do you

Give examples of data inconsistences in retail in healthcare in finance How do you resolve them?

Conflicts abound… Why might there be resistance to this type of aggregation? Is it

Conflicts abound… Why might there be resistance to this type of aggregation? Is it an option to just “fix” the transactional (source) databases? If two data elements conflict, who’s standard “wins? ”