10 Building the Warehouse Copyright Oracle Corporation 1999
10 Building the Warehouse Copyright Ó Oracle Corporation, 1999. All rights reserved.
Overview Defining DW Concepts & Terminology Planning for a Successful Warehouse Choosing a Computing Architecture Meeting a Business Need Modeling the Data Warehouse Analyzing User Query Needs Planning Warehouse Storage ETT (Building the Warehouse) Supporting End User Access Project Management (Methodology, Maintaining Metadata) 10 -2 Copyright Ó Oracle Corporation, 1999. All rights reserved. Managing the Data Warehouse
Objectives After completing this lesson, you should be able to do the following: • Outline the extraction, transformation, and transportation processes for building a data warehouse • • Identify extraction issues 10 -3 Explain how to examine data sources Identify extraction techniques List tools that can be used to extract data from sources Copyright Ó Oracle Corporation, 1999. All rights reserved.
Extraction/Transformation/Transportation Processes (ETT) • • • Extract source data Transform/clean data Index and summarize http: // Hollywood a rec orof as Browser: + Load data into WH Detect changes Refresh data ETT Browser: Cu http: // s Hollywood tom er+s X : X • • • Cu s Browser: http: // tomer+ X s: Hollywood Programs Gateways Operational systems 10 -4 Tools Copyright Ó Oracle Corporation, 1999. All rights reserved. Warehouse
ETT Processes • Must result in data that is relevant, useful, highquality, accurate, and accessible • Require a large proportion of warehouse development time and resources ETT Relevant Useful Clean up Quality Consolidate Operational systems 10 -5 Restructure Warehouse Copyright Ó Oracle Corporation, 1999. All rights reserved. Accurate Accessible
Data Staging Area • • • The construction site for the warehouse • Frequently configured as multitier staging Required by most implementations Composed of ODS, flat files, or relational server tables Operational system 10 -6 Extract Data staging area Transport (Load) Transform Copyright Ó Oracle Corporation, 1999. All rights reserved. Warehouse
Remote Staging Model Data staging area within the warehouse environment Warehouse environment Oper. envt. Operational system Extract, transform, transport Data staging area Transport Transform (Load) Warehouse Data staging area in its own environment, avoiding negative impact on the warehouse environment Oper. envt. Operational system 10 -7 Staging envt. Warehouse envt. Data staging area Transport Extract, (Load) transform, Transform transport Copyright Ó Oracle Corporation, 1999. All rights reserved. Warehouse
Onsite Staging Model Data staging area within the operational environment, possibly affecting the operational system WH envt. Operational environment Operational system Extract Data staging area Transform 10 -8 Transport (Load) Copyright Ó Oracle Corporation, 1999. All rights reserved. Warehouse
Extracting Data mapping Browser: Cu http: // sto me + X rs: orof Hollywood Cu s Browser: http: // tomer+ X s: Hollywood as http: // + a rec Browser: X Transform Operational databases • • • 10 -9 Data staging area Warehouse database Routines developed to select fields from source Various data formats Rules, audit trails, error correction facilities Copyright Ó Oracle Corporation, 1999. All rights reserved.
Source Systems Browser: http: // Hollywood sto me rs: + X Archive http: // X + Hollywood Browser: Cus http: // Hollywood tom ers + : X as Browser: orof Production a rec • • Cu Internal External 10 -10 12345. 00 100% ABC CO 12780. 00 110% 2345787. 00 230% GMBH LTD GBUK INC 87877. 98 200% FFR ASSOC 5678. 00 -10% MCD CO Copyright Ó Oracle Corporation, 1999. All rights reserved.
Production Data Hollywood IMS Browser: http: // X + Hollywood tom ers+ : X C Browser: ust o http: // me rs+: Hollywood a rec orof DB 2 X as Browser: http: //Cu s VSAM Non. Stop SQL Oracle Dun and Bradstreet Financials Oracle Financials Rdb 10 -11 Shared Medical Systems Hogan Financials Sybase • • SAP Operating system platforms Hardware platforms File systems Database systems and vertical applications Copyright Ó Oracle Corporation, 1999. All rights reserved.
Archive Data Operational databases • • Warehouse database Historical data Useful for analysis over long periods of time Useful for first-time load May require unique transformations 10 -12 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Internal Data Planning Marketing Accounting • • 12345. 00 100% ABC CO 12780. 00 110% GMBH LTD 2345787. 00 230% GBUK INC 87877. 98 200% FFR ASSOC 5678. 00 -10% MCD CO Warehouse database Planning, sales, and marketing organization data Maintained by: – Spreadsheets (structured) – Documents (unstructured) • Treated like any other source data 10 -13 Copyright Ó Oracle Corporation, 1999. All rights reserved.
External Data A. C. Nielsen, IRI, IMS, Walsh America Purchased Competitive databases information Dun and Bradstreet Barron's • • • 10 -14 Economic forecasts Warehousing databases Wall Street Journal Information from outside the organization Issues of frequency, format, and predictability Described and tracked using metadata Copyright Ó Oracle Corporation, 1999. All rights reserved.
Mapping • • Defines which operational attributes to use • • Defines where the attributes exist in the warehouse Defines how to transform the attributes for the warehouse Mapping tools are available Metadata File A F 1 F 2 F 3 10 -15 123 Bloggs 10/12/56 Staging File One Number Name DOB Staging File One Number USA 123 Name Mr. Bloggs DOB 10 -Dec-56 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Extraction Techniques • • Programs: C, COBOL, PL/SQL Gateways: transparent database access In-house development is popular Tools – High initial cost – Ongoing automation – Data cleanup 10 -16 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Sources and Targets Sources ODS Warehouse Access Data marts Data analysis Data mining OLAP 10 -17 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Designing Extraction Processes • Analysis: – Sources, technologies – Data types, quality, owners • Design options: – Manual, custom, gateway, third-party – Replication, full, or delta refresh • Design issues: – Batch window, volumes, data currency – Automation, skills needed, resources 10 -18 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Maintaining Extraction Metadata • • Source location, type, structure Access method Privilege information Temporary storage Failure procedures Validity checks Handlers for missing data 10 -19 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Possible ETT Failures • • A missing source file A system failure Inadequate metadata Poor mapping information Inadequate storage planning A source structural change No contingency plan Inadequate data validation 10 -20 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Maintaining ETT Quality • ETT must be: – Tested – Documented – Monitored and reviewed • Disparate metadata must be coordinated 10 -21 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Extraction Tools Map Source Data to Intermediate File Store Sales and Marketing Customer Name Varchar Char 20 Mapping information Unique name JCL files Update metadata 10 -22 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Selection Criteria • • • 10 -23 Base functionality Interface features Metadata repository Open API Metadata access Repository utilities Input and output processing Cleansing, reformatting, and auditing References Training requirements Copyright Ó Oracle Corporation, 1999. All rights reserved.
WTI Partner ETT Tools • • • Carleton Constellar Evolutionary Technologies Informatica Information Builders Oracle EDMS, Toolkits, OADW Prism Solutions Sagent Vality Technology 10 -24 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Summary This lesson discussed the following topics: • ETT processes are essential and consume a large proportion of warehouse resources and time • • The extraction process acquires source data You may encounter many data sources There are many data extraction issues ETT Tools should be considered 10 -25 Copyright Ó Oracle Corporation, 1999. All rights reserved.
Practice 10 -1 Overview This practice covers the following topics: • • Answering a series of short questions Specifying true or false to a series of statements 10 -26 Copyright Ó Oracle Corporation, 1999. All rights reserved.
- Slides: 26