8 Refreshing Warehouse Data Copyright Oracle Corporation 2002





































- Slides: 37
8 Refreshing Warehouse Data Copyright © Oracle Corporation, 2002. All rights reserved.
Objectives After completing this lesson, you should be able to do the following: • Describe methods for capturing changed data • Explain techniques for applying the changes • Describe Change Data Capture mechanism and refresh mechanisms supported in Oracle 9 i • Describe techniques for purging and archiving data and outline the techniques supported by Oracle • Outline final tasks, such as publishing the data, controlling access, and automating processes • List the selection criteria for choosing ETL tools 8 -2 Copyright © Oracle Corporation, 2002. All rights reserved.
Developing a Refresh Strategy for Capturing Changed Data • • • Consider load window. Identify data volumes. Identify cycle. Know the technical infrastructure. Plan a staging area. Determine how to detect changes. Operational databases T 1 8 -3 T 2 T 3 Copyright © Oracle Corporation, 2002. All rights reserved.
User Requirements and Assistance • • Users define the refresh cycle. IT balances requirements against technical issues. Document all tasks and processes. Employ user skills. Operational databases T 1 8 -4 T 2 T 3 Copyright © Oracle Corporation, 2002. All rights reserved.
Load Window Requirements • • • Time available for entire ETL process Plan Test Prove Monitor Load Window 0 8 -5 3 am 6 User Access Period Load Window 9 12 pm 3 6 Copyright © Oracle Corporation, 2002. All rights reserved. 9 12
Planning the Load Window • • • Plan and build processes according to a strategy. Consider volumes of data. Identify technical infrastructure. Ensure currency of data. Consider user access requirements first. High availability requirements may mean a small load window. User Access Period 0 8 -6 3 am 6 9 12 pm 3 6 Copyright © Oracle Corporation, 2002. All rights reserved. 9 12
Scheduling the Load Window 1 Receive data FTP 0 8 -7 2 Requirements Load cycle Control File names File types Number of files Number of loads First-time load or refresh Date of file Date range Records in file - counts Totals – amounts 4 Open and read files to verify and analyze Control process Copyright © Oracle Corporation, 2002. All rights reserved. 3 3 a. m.
Scheduling the Load Window 5 Load into warehouse 6 8 Verify, analyze, reapply Create summaries 7 Index data 9 Update metadata Parallel load 3 a. m. 8 -9 6 a. m. Copyright © Oracle Corporation, 2002. All rights reserved. 9 a. m.
Scheduling the Load Window 11 Create views for specialized tools 10 Backup warehouse 6 a. m. 8 -10 12 Users access summary data 13 Publish 9 a. m. Copyright © Oracle Corporation, 2002. All rights reserved. User access
Capturing Changed Data for Refresh • • Capture new fact data Capture changed dimension data Determine method of capture in each case Methods: – – – 8 -11 Wholesale data replacement Comparison of database instances Time stamping Database triggers Database log Copyright © Oracle Corporation, 2002. All rights reserved.
Wholesale Data Replacement • • • 8 -13 Expensive Useful for data marts with less data Limited historical data analysis is possible Time period often exceeds load window Mirroring techniques can be used to provide access to the users Copyright © Oracle Corporation, 2002. All rights reserved.
Comparison of Database Instances • Delta file: – Changes to operational data since last refresh – Used to update the warehouse • • Simple to perform, but expensive in terms of time and processing Efficient for smaller volumes of data Yesterday’s operational database Today’s operational database 8 -14 Database comparison Delta file holds changed data Copyright © Oracle Corporation, 2002. All rights reserved.
Time and Date Stamping • • • Fast scanning for records changed since last refresh cycle Useful for data with updated date field No detection of deleted data Operational data 8 -15 Delta file holds changed data based on time stamp Copyright © Oracle Corporation, 2002. All rights reserved.
Database Triggers • • • Changed data intersected at the server level Extra I/O required Maintenance overhead Operational data Operational server (RDBMS) Delta file holds changed data Triggers on server 8 -16 Copyright © Oracle Corporation, 2002. All rights reserved.
Using a Database Log • • • Contains before and after images Requires system checkpoint Common technique Operational data Operational server (DBMS) Log analysis and data extraction Delta file holds changed data 8 -17 Copyright © Oracle Corporation, 2002. All rights reserved.
Choosing a Method for Change Data Capture • • • 8 -18 Consider each method on merit. Consider a hybrid approach if one approach is not suitable. Consider current technical, operational, and application issues. Copyright © Oracle Corporation, 2002. All rights reserved.
Change Data Capture Mechanism in Oracle 9 i • • Facilitates incremental extraction Captures all the INSERTs, UPDATEs, and DELETEs Changes to data are stored in change tables Based on Publish and Subscribe model Data Source table 8 -19 Publisher Change data Change table Copyright © Oracle Corporation, 2002. All rights reserved.
Change Data Capture Mechanism in Oracle 9 i Subscriber view 1 Subscriber view 2 Subscriber 1 Subscriber 2 Change table 8 -20 Copyright © Oracle Corporation, 2002. All rights reserved.
Refresh Mechanisms in Oracle 9 i • Refresh modes for materialized views: – ON COMMIT – ON DEMAND: DBMS_MVIEW. REFRESH_ALL_MVIEWS DBMS_MVIEW. REFRESH_DEPENDENT • Refresh options for materialized views: – COMPLETE – FAST – FORCE • 8 -22 Use partitioning for refresh. Copyright © Oracle Corporation, 2002. All rights reserved.
Applying the Changes to Data You have a choice of techniques: • Overwrite a record • Add a field • Maintain history • Add version numbers 8 -24 Copyright © Oracle Corporation, 2002. All rights reserved.
Overwriting a Record • • • Easy to implement Loses all history Not recommended 42135 8 -25 John Doe Married 42135 John Doe Single Copyright © Oracle Corporation, 2002. All rights reserved.
Adding a New Record • • History is preserved; dimensions grow. Time constraints are not required. Generalized key is created. Metadata tracks usage of keys. 42135_01 8 -26 John Doe Single Married Copyright © Oracle Corporation, 2002. All rights reserved.
Adding a Current Field • • • 8 -27 Maintains some history Loses intermediate values Is enhanced by adding an Effective Date field 42135 John Doe Single Married Copyright © Oracle Corporation, 2002. All rights reserved. 1 -Jan-01
Limitations of Methods for Applying Changes • • • 8 -28 Difficult to maintain History Dimensions may grow large Maintenance overhead Copyright © Oracle Corporation, 2002. All rights reserved.
Maintaining History: Techniques • • 8 -30 History tables One-to-many relationships Versioning Preserve complete history Copyright © Oracle Corporation, 2002. All rights reserved.
Maintaining History: Techniques History tables: • Normalize dimensions • Hold current and historical data One-to-many relationships: • One current record and many history records HIST_CUST Time CUSTOMER Sales Product 8 -31 Copyright © Oracle Corporation, 2002. All rights reserved.
Versioning • • Avoid double counting Facts hold version number Time 8 -33 Customer. Cust. Id Version Customer Name 1234 1 Comer 1234 2 Comer Sales. Cust. Id Version Sales Facts 1234 1 $11, 000 1234 2 $12, 000 Customer Sales Copyright © Oracle Corporation, 2002. All rights reserved. Product
Preserve Complete History • Complete history: – Enables realistic historical analysis – Retains context of data • Model must be able to: – Reflect business changes – Maintain context between fact and dimension data – Retain sufficient data to relate old to new 8 -34 Copyright © Oracle Corporation, 2002. All rights reserved.
Purging and Archiving Data • • As data ages, its value depreciates. Remove old data from the warehouse: – Archive for later use (if needed) – Purge without copy 8 -35 Copyright © Oracle Corporation, 2002. All rights reserved.
Oracle Supported Techniques for Purging Data • Using SQL: – TRUNCATE TABLE: Retains no rollback – DELETE: Retains redo and rollback – ALTER TABLE. . . DROP PARTITION: Removes a partition • Using PL/SQL: – Database triggers 8 -36 Copyright © Oracle Corporation, 2002. All rights reserved.
Oracle Supported Techniques for Archiving Data • • Export and Import utilities ALTER TABLE. . . EXCHANGE PARTITION EXP IMP Database 8 -38 . dmp Copyright © Oracle Corporation, 2002. All rights reserved.
Final Tasks • • • Update metadata Publish data Use database roles to control access to the warehouse Sources Stage Rules Publish Extract Transform Load Query 8 -39 Copyright © Oracle Corporation, 2002. All rights reserved.
Publishing Data • • • Control access using database roles Compromise between load action and user access Consider: – Staggering updates – Using temporary tables – Using separate tables 8 -41 Copyright © Oracle Corporation, 2002. All rights reserved.
ETL Tools: Selection Criteria • • • 8 -42 Overlap with existing tools Availability of meta model Supported data sources Ease of modification and maintenance Required fine tuning of code Ease of change control Power of transformation logic Level of modularization Power of error, exception, resubmission features Intuitive documentation Performance of code Copyright © Oracle Corporation, 2002. All rights reserved.
ETL Tool Selection Criteria • • • 8 -44 Activity scheduling and sophistication Metadata generation Learning curve Flexibility Supported operating systems Cost Copyright © Oracle Corporation, 2002. All rights reserved.
Summary In this lesson, you should have learned how to: • Describe methods for capturing changed data • Explain techniques for applying the changes • Describe Change Data Capture mechanism and refresh mechanisms supported in Oracle 9 i • Describe techniques for purging and archiving data and outline the techniques supported by Oracle • Outline final tasks, such as publishing the data, controlling access, and automating processes • List the selection criteria for choosing ETL tools 8 -45 Copyright © Oracle Corporation, 2002. All rights reserved.
Practice 8 -1 Overview This practice covers the following topics: • Answering a series of questions based on the business scenario for Frontier Airways • Answering a series of short questions 8 -46 Copyright © Oracle Corporation, 2002. All rights reserved.