Populating Data Warehouse Structures Examining the Star Schema

  • Slides: 25
Download presentation
Populating Data Warehouse Structures

Populating Data Warehouse Structures

Examining the Star Schema Sales Star Schema Fact Table Dimension Tables Dimension Table

Examining the Star Schema Sales Star Schema Fact Table Dimension Tables Dimension Table

Implementing the Star Schema 1. Extract Data From Multiple Sources 2. Integrate, Transform, and

Implementing the Star Schema 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data 3. Load Data Into Dimension Tables and Fact Tables

The Star Schema Data Load Heterogeneous Data Sources Polaris Data Warehouse Staging Area Northwind

The Star Schema Data Load Heterogeneous Data Sources Polaris Data Warehouse Staging Area Northwind OLTP Financial External Internal Files Sales Star DTS External Files Extracting Data From DTS Transforming DTS Heterogeneous Sources Data Inventory Star Loading the Star Schema

Verifying the Dimension Source Data Verifying Accuracy of Source Data n Integrating data from

Verifying the Dimension Source Data Verifying Accuracy of Source Data n Integrating data from multiple sources n Applying business rules n Checking structural requirements Correcting Invalid Data Managing Invalid Data l Transforming data l Rejecting invalid data l Reassigning data values l Saving invalid data to a log

Dimension Data Load Examples: buyer_name reg_id Barr, Adam 2 Chai, Sean 4 O’Melia, Erin

Dimension Data Load Examples: buyer_name reg_id Barr, Adam 2 Chai, Sean 4 O’Melia, Erin 6. . . buyer_code buyer_last reg_id Barr 2 A 123 Chai 4 B 456 O’Melia 6. . buyer_name Barr, Adam Chai, Sean reg_id II IV buyer_name Smith, Jane Paper, Anne reg_id 2 4 DTS DTS buyer_first Adam Sean Erin. . . buyer_code U 999 A 123 B 456. . . buyer_last reg_id Barr 2 Chai 4 O’Melia 6. . . buyer_name Barr, Adam Chai, Sean Smith, Jane Paper, Anne reg_id 2 4

Maintaining Integrity of the Dimension n n Assigning a Surrogate Key to Each Record

Maintaining Integrity of the Dimension n n Assigning a Surrogate Key to Each Record l Defines the dimension’s primary key l Relates to the foreign key fields of the fact table Loading One Record Per Application Key l Maintains uniqueness in the dimension l Depends on how you manage changing dimension data l Maintains integrity of the fact table

Managing Changing Dimension Data n n Dimensions with Changing Column Values l Inserts of

Managing Changing Dimension Data n n Dimensions with Changing Column Values l Inserts of new data l Updates of existing data Slowly-Changing Dimension Design Solutions l Type 1: Overwrite the dimension record l Type 2: Write another dimension record l Type 3: Add attributes to the dimension record

Type 1: Overwriting the Dimension Slide Product Dimension product key product name product size

Type 1: Overwriting the Dimension Slide Product Dimension product key product name product size product package product dept product cat product subcat. . . Before 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks. . . After 001 Rice Puffs 12 Oz oz. Bag Grocery Dry Goods Snacks. . . Existing record is changed

Type 2: Writing Another Dimension Record Product Dimension product key product name product size

Type 2: Writing Another Dimension Record Product Dimension product key product name product size product package product dept product cat product subcat effective_date … Before 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks 05 -01 -1995. . . 001 Rice Puffs 10 Oz oz. Bag Grocery Dry Goods Snacks 05 -01 -1995. . . After 731 Rice Puffs 12 Oz oz. Bag Grocery Dry Goods Snacks 10 -15 -1998. . . Adds a new record

Type 3: Adding Attributes in the Dimension Record Product Dimension product key product name

Type 3: Adding Attributes in the Dimension Record Product Dimension product key product name productsize product package product dept product cat product subcat current product size date previous product size date previous product 2 nd previous product size date. . . Before 001 Rice Puffs 10 10 Oz oz. Bag Grocery Dry Goods Snacks 05 -01 -1995 11 11 Oz oz. 03 -20 -1994 (null). . . After 001 Rice Puffs 12 12 oz. oz Bag Grocery Dry Goods Snacks 10 -15 -1998 10 oz. 05 -01 -1995 11 11 Oz oz. 03 -20 -1994. . . Additional information is stored in an existing record

Verifying the Fact Table Source Data Verifying Accuracy of Source Data n Integrating data

Verifying the Fact Table Source Data Verifying Accuracy of Source Data n Integrating data from multiple sources n Applying business rules n Checking structural requirements Correcting Invalid Data Managing Invalid Data l Transforming data l Rejecting invalid data l Reassigning data values l Saving invalid data to a log

Assigning Foreign Keys Dimension Tables customer_dim 201 ALFI Alfreds Source Data customer id ALFI

Assigning Foreign Keys Dimension Tables customer_dim 201 ALFI Alfreds Source Data customer id ALFI product id order date quantity_sales amount_sales 123 1/1/2000 400 10, 789 product_dim 25 123 Chai Sales Fact Data cust_key time_dim 134 1/1/2000 201 prod_key time_key 123 25 1/1/2000 134 quantity_sales amount_sales 400 10, 789

Defining Measures n n customer_id VINET ALFI HANAR. . . Loading Measures from the

Defining Measures n n customer_id VINET ALFI HANAR. . . Loading Measures from the Source System Calculating Additional Measures customer_key 100 238 437. . . product_id 9 GZ 1 KJ 0 ZA. . . price. 55 1. 10. 98. . . qty 32 48 9. . . Source System Data product_key 512 207 338. . . qty 32 48 9. . . Fact Table Data total_sales 17. 60 52. 80 8. 82. . .

Maintaining Data Integrity n n Adhering to the Fact Table Grain l A fact

Maintaining Data Integrity n n Adhering to the Fact Table Grain l A fact table can only have one grain l You must load a fact table with data at the same level of detail as defined by the grain Enforcing Column Constraints l NOT NULL constraints l FOREIGN KEY constraints

Implementing Staging Tables n Centralize and Integrate Source Data n Break Up Complex Data

Implementing Staging Tables n Centralize and Integrate Source Data n Break Up Complex Data Transformations n Facilitate Error Recovery market_stage shipments_stage Staging Area sales_stage inventory_stage

DTS Functionality n Accessing Heterogeneous Data Sources n Importing, Exporting, and Transforming Data n

DTS Functionality n Accessing Heterogeneous Data Sources n Importing, Exporting, and Transforming Data n Creating Reusable Transformations and Functions n Automating Data Loads n Managing Metadata n Customizing and Extending Functionality

Defining DTS Packages n Identifies Data Sources and Destinations n Defines Tasks or Actions

Defining DTS Packages n Identifies Data Sources and Destinations n Defines Tasks or Actions n Implements Transformation Logic n Defines Order of Operations

Identifying Package Components n Connections Access Data Sources and Destinations n Tasks Describe Data

Identifying Package Components n Connections Access Data Sources and Destinations n Tasks Describe Data Transformations or Functions n Steps Define the Order of Task Operations or Workflow n Global Variables Store Data that Can Be Shared Across Tasks

Creating Packages n n n Using the DTS Import / Export Wizard l Perform

Creating Packages n n n Using the DTS Import / Export Wizard l Perform ad-hoc table and data transfers l Develop a prototype package Using DTS Package Designer l Edit packages created with the DTS Import/Export Wizard l Create packages with a wide range of functionality Programming DTS Applications l Directly access the functionality of the DTS Object Model l Requires Microsoft Visual Basic or Microsoft Visual C++

Using DTS to Populate the Sales Star n Populating the Sales Star Dimensions n

Using DTS to Populate the Sales Star n Populating the Sales Star Dimensions n Populating the Sales Star Fact Table

Populating the Sales Star Dimensions Product Tab Delimited Files product_dim DTS customer_dim Northwind OLTP

Populating the Sales Star Dimensions Product Tab Delimited Files product_dim DTS customer_dim Northwind OLTP SQL Server Stored Procedure DTS time_dim DTS

Populating the Sales Star Fact Table Sales Data File product_dim DTS sales_stage DTS customer_dim

Populating the Sales Star Fact Table Sales Data File product_dim DTS sales_stage DTS customer_dim sales_stage time_dim sales_fact

Designing Modular Packages n n Creating Modular Packages l Simplify complex workflows l Create

Designing Modular Packages n n Creating Modular Packages l Simplify complex workflows l Create more readable packages l Produce smaller packages that are easier to debug Using Outer Packages l Execute multiple packages within a single package l Combine modular packages into logical workflows l Reuse modular packages in different workflows l Execute packages in parallel

Using DTS to Populate the Sales Star

Using DTS to Populate the Sales Star