7 Building the Data Warehouse Loading Warehouse Data

7 Building the Data Warehouse: Loading Warehouse Data Copyright © Oracle Corporation, 2002. All rights reserved.

Objectives After completing this lesson, you should be able to do the following: • Explain key concepts in loading warehouse data • Outline how to build the loading process for the initial load • Identify loading techniques • Describe the loading techniques provided by Oracle • Identify the tasks that take place after data is loaded • Explain the issues involved in designing the transportation, loading, and scheduling processes 7 -2 Copyright © Oracle Corporation, 2002. All rights reserved.

Loading Data into the Warehouse • • Loading moves the data into the warehouse Loading can be time-consuming: – Consider the load window – Schedule and automate the loading • • Initial load moves large volumes of data Subsequent refresh moves smaller volumes of data Transform Extract Operational databases 7 -3 Transport, Load Staging area Warehouse database Copyright © Oracle Corporation, 2002. All rights reserved.

Initial Load and Refresh Initial Load: • Single event that populates the database with historical data • Involves large volumes of data • Employs distinct ETL tasks • Involves large amounts of processing after load Refresh: • Performed according to a business cycle • Less data to load than first-time load • Less-complex ETL tasks • Smaller amounts of post-load processing 7 -5 Copyright © Oracle Corporation, 2002. All rights reserved.

Data Refresh Models: Extract Processing Environment • • After each time interval, build a new snapshot of the database. Purge old snap shots. Operational databases T 1 7 -7 T 2 T 3 Copyright © Oracle Corporation, 2002. All rights reserved.

Data Refresh Models: Warehouse Processing Environment • • • Build a new database. After each time interval, add changes to database. Archive or purge oldest data. Operational databases T 1 7 -8 T 2 T 3 Copyright © Oracle Corporation, 2002. All rights reserved.

Building the Loading Process • • 7 -9 Techniques and tools File transfer methods The load window Time window for other tasks First-time and refresh volumes Frequency of the refresh cycle Connectivity bandwidth Copyright © Oracle Corporation, 2002. All rights reserved.

Building the Loading Process • • • 7 -10 Test the proposed technique Document proposed load Monitor, review, and revise Copyright © Oracle Corporation, 2002. All rights reserved.

Data Granularity • • Important design and operational issue Low-level grain: Expensive, high level of processing, more disk space, more details • High-level grain: Cheaper, less processing, less disk space, little details 7 -11 Copyright © Oracle Corporation, 2002. All rights reserved.

Loading Techniques • • 7 -12 Tools Utilities and 3 GL Gateways Customized copy programs Replication FTP Manual Copyright © Oracle Corporation, 2002. All rights reserved.

Loading Technique Considerations • • • Tools are comprehensive, but costly. Data-movement utilities are fast and powerful. Gateways are suitable for specific instances: – – • • 7 -14 Access other databases Supply dependent data marts Support a distributed environment Provide real-time access if needed Use customized programs as a last resort. Replication is limited by data-transfer rates. Copyright © Oracle Corporation, 2002. All rights reserved.

Loading Techniques Provided by Oracle: SQL*Loader Control file Input files Log files SQL*Loader Bad files Discard files 7 -16 Copyright © Oracle Corporation, 2002. All rights reserved.

Loading Techniques Provided by Oracle • • • OCI and direct-path APIs Export/Import External tables Load utility 7 -18 Copyright © Oracle Corporation, 2002. All rights reserved.

Transportable Tablespaces • • The fastest way for moving data between two Oracle databases Bypass the unload and reload steps Provide a mechanism for transporting data along with metadata Useful for moving data from: – Staging database to a data warehouse – Data warehouse to data marts 7 -20 Copyright © Oracle Corporation, 2002. All rights reserved.

Post-Processing of Loaded Data Transform Extract Load Staging area Create indexes Warehouse Generate keys Post-processing of loaded data Summarize 7 -21 Filter Copyright © Oracle Corporation, 2002. All rights reserved.

Indexing Data • Before load: Enable indexes at server • During load: Adds time to load window, row-by-row approach • After load: Adds time to load window, but faster than row-byrow approach Index Operational databases 7 -23 Staging area Warehouse database Copyright © Oracle Corporation, 2002. All rights reserved.

Unique Indexes • • • Disable constraints before load. Enable constraints after load. Re-create index if necessary. Disable constraints Enable constraints Load data 7 -24 Create index Catch errors Copyright © Oracle Corporation, 2002. All rights reserved. Reprocess

Creating Derived Keys • • The use of derived or generalized keys is recommended to maintain the uniqueness of a row. Methods: – Concatenate operational key with a number – Assign a number sequentially from a list 7 -25 109908 01 109908 100 Copyright © Oracle Corporation, 2002. All rights reserved.

Summary Management • • Summary tables Materialized views Summary data 7 -27 Copyright © Oracle Corporation, 2002. All rights reserved.

Summary Management in Oracle 9 i • Materialized views: – – • Precompute aggregates and joins Results are stored in the database Use query rewrite Improve query performance DBMS_OLAP Summary Advisor: – Collection of functions and procedures (DBMS_OLAP package) – Helps in defining and analyzing materialized views • 7 -28 Many enhanced features in Oracle 9 i Copyright © Oracle Corporation, 2002. All rights reserved.

Filtering Data • • • From warehouse to data marts CTAS p. CTAS Summary data Warehouse Data marts 7 -30 Copyright © Oracle Corporation, 2002. All rights reserved.

Verifying Data Integrity • • Load data into intermediate file. Compare target flash totals with totals before load. Counts & Amounts = Flash Totals Load 7 -31 Intermediate file Target Counts & Amounts Flash Totals = Preserve, inspect, fix, then load Copyright © Oracle Corporation, 2002. All rights reserved.

Steps for Verifying Data Integrity Source files Control Target 3 4 1 Extract SQL*Loader 5 2 6 7. log 7 -32 Copyright © Oracle Corporation, 2002. All rights reserved. . bad

Standard Quality Assurance Checks • • Load status Completion of the process Completeness of the data Data reconciliation Referential integrity violations Reprocessing Comparison of counts and amounts 1+1=3 7 -34 Copyright © Oracle Corporation, 2002. All rights reserved.

Summary In this lesson, you should have learned how to: • Explain key concepts in loading data into the warehouse • Outline how to build the loading process for the initial load • Identify loading techniques • Describe the loading techniques provided by Oracle • Identify the tasks that take place after data is loaded • Explain the issues involved in designing the transportation, loading, and scheduling processes 7 -35 Copyright © Oracle Corporation, 2002. All rights reserved.

Practice 7 -1 Overview This practice covers the following topics: • Answering a series of short questions • Answering questions based on the business scenario for Frontier Airways 7 -36 Copyright © Oracle Corporation, 2002. All rights reserved.
- Slides: 26