5 Building the Data Warehouse Extracting Data Copyright
5 Building the Data Warehouse: Extracting Data Copyright © Oracle Corporation, 2002. All rights reserved.
Objectives After completing this lesson, you should be able to do the following: • Outline the ETL (Extraction, Transformation, and Loading) processes for building a data warehouse • Identify ETL tasks, importance, and cost • Explain how to examine data sources • Identify extraction techniques and methods • Identify analysis issues and design options for extraction processes • List the selection criteria for the ETL tools • Identify Oracle’s solution for ETL process 5 -2 Copyright © Oracle Corporation, 2002. All rights reserved.
Extraction, Transformation, Loading (ETL) Processes • • • Extract source data Transform/clean data Index and summarize Load data into warehouse Detect changes Refresh data Operational systems Programs Gateways Tools Data Warehouse 5 -3 Copyright © Oracle Corporation, 2002. All rights reserved. ETL
ETL: Tasks, Importance, and Cost Operational systems Extract Clean up Consolidate Restructure Load Maintain Refresh Data Warehouse ETL Relevant Useful Quality Accurate Accessible 5 -5 Copyright © Oracle Corporation, 2002. All rights reserved.
Extracting Data • Source systems – Data from various data sources in various formats • Extraction Routines – Developed to select data fields from sources – Consist of business rules, audit trails, error correction facilities Data mapping Transform Operational databases 5 -7 Data staging area Copyright © Oracle Corporation, 2002. All rights reserved. Warehouse database
Examining Data Sources • • 5 -8 Production Archive Internal External Copyright © Oracle Corporation, 2002. All rights reserved.
Production Data • • • Operating system platforms File systems Database systems and vertical applications IMS DB 2 Oracle Sybase Informix VSAM 5 -9 SAP Shared Medical Systems Dun and Bradstreet Financials Hogan Financials Oracle Financials Copyright © Oracle Corporation, 2002. All rights reserved.
Archive Data • • Historical data Useful for analysis over long periods of time Useful for first-time load May require unique transformations Operation databases 5 -10 Warehouse database Copyright © Oracle Corporation, 2002. All rights reserved.
Internal Data • • Planning, sales, and marketing organization data Maintained in the form of: – Spreadsheets (structured) – Documents (unstructured) • Treated like any other source data Planning Marketing Accounting Warehouse database 5 -11 Copyright © Oracle Corporation, 2002. All rights reserved.
External Data • • • Information from outside the organization Issues of frequency, format, and predictability Described and tracked using metadata Purchased databases A. C. Nielsen, IRI, IMS, Walsh America Dun and Bradstreet Barron's 5 -12 Competitive information Economic forecasts Warehousing databases Wall Street Journal Copyright © Oracle Corporation, 2002. All rights reserved.
Mapping Data Mapping data defines: • Which operational attributes to use • How to transform the attributes for the warehouse • Where the attributes exist in the warehouse File A F 1 F 2 F 3 123 Bloggs 10/12/56 Staging File One Number USA 123 Name Mr. Bloggs DOB 10 -Dec-56 Metadata File A F 1 5 -14 Staging File One Number F 2 F 3 Copyright © Oracle Corporation, 2002. All rights reserved. Name DOB
Extraction Techniques • • • Programs: C, C++, COBOL, PL/SQL, Java Gateways: transparent database access Tools: – In-house developed tools – Vendor’s data extraction tools 5 -15 Copyright © Oracle Corporation, 2002. All rights reserved.
Extraction Methods • Logical Extraction methods: – Full Extraction – Incremental Extraction • Physical Extraction methods: – Online Extraction – Offline Extraction 5 -17 Copyright © Oracle Corporation, 2002. All rights reserved.
Designing Extraction Processes • Analysis: – Sources, technologies – Data types, quality, owners • Design options: – Manual, custom, gateway, third-party – Replication, full, or delta refresh • Design issues: – Volume and consistency of data – Automation, skills needed, resources 5 -19 Copyright © Oracle Corporation, 2002. All rights reserved.
Maintaining Extraction Metadata • • 5 -21 Source location, type, structure Access method Privilege information Temporary storage Failure procedures Validity checks Handlers for missing data Copyright © Oracle Corporation, 2002. All rights reserved.
Extraction Tools 5 -22 Copyright © Oracle Corporation, 2002. All rights reserved.
Selection Criteria • • • 5 -23 Base functionality Interface features Metadata repository Open API Metadata access Repository utilities Input and output processing Cleansing, reformatting, and auditing References Training requirements Copyright © Oracle Corporation, 2002. All rights reserved.
Possible ETL Failures • • 5 -24 A missing source file A system failure Inadequate metadata Poor mapping information Inadequate storage planning A source structural change No contingency plan Inadequate data validation Copyright © Oracle Corporation, 2002. All rights reserved.
Maintaining ETL Quality • ETL must be: – Tested – Documented – Monitored and reviewed • 5 -26 Disparate metadata must be coordinated. Copyright © Oracle Corporation, 2002. All rights reserved.
Oracle’s Solution for ETL • • Oracle’s ETL tool : OWB Oracle’s Utilities – SQL* Loader – Export/ Import • • 5 -27 Oracle 9 i enhanced features for ETL Oracle 9 i support for the current trends in data warehousing Copyright © Oracle Corporation, 2002. All rights reserved.
Oracle’s Solution for ETL: Oracle 9 i Streams, Replication, and Message Queuing Oracle 9 i Streams: • Key information sharing technology • Provides greater functionality and flexibility • Can be used for replication, message queuing, and data warehouse loading 5 -29 Copyright © Oracle Corporation, 2002. All rights reserved.
5 -30 Copyright © Oracle Corporation, 2002. All rights reserved.
Frontier Airways: A Business Scenario Frontier Airways Frontier Asia Pacific 5 -31 Frontier America Frontier Inter Continent Copyright © Oracle Corporation, 2002. All rights reserved.
Summary In this lesson, you should have learned how to: • Outline the ETL (Extraction, Transformation, and Loading) processes for building a data warehouse • Identify ETL tasks, importance, and cost • Explain how to examine data sources • Identify extraction techniques and methods • Identify analysis issues and design options for extraction processes • List the selection criteria for the ETL tools • Identify Oracle’s solution for ETL process 5 -33 Copyright © Oracle Corporation, 2002. All rights reserved.
Practice 5 -1 Overview This practice covers the following topics: • Answering a series of short questions • Answering questions based on the business scenario for Frontier Airways 5 -34 Copyright © Oracle Corporation, 2002. All rights reserved.
- Slides: 25