Populating Data Warehouse Structures Examining the Star Schema
- Slides: 25
Populating Data Warehouse Structures
Examining the Star Schema Sales Star Schema Fact Table Dimension Tables Dimension Table
Implementing the Star Schema 1. Extract Data From Multiple Sources 2. Integrate, Transform, and Restructure Data 3. Load Data Into Dimension Tables and Fact Tables
The Star Schema Data Load Heterogeneous Data Sources Polaris Data Warehouse Staging Area Northwind OLTP Financial External Internal Files Sales Star DTS External Files Extracting Data From DTS Transforming DTS Heterogeneous Sources Data Inventory Star Loading the Star Schema
Verifying the Dimension Source Data Verifying Accuracy of Source Data n Integrating data from multiple sources n Applying business rules n Checking structural requirements Correcting Invalid Data Managing Invalid Data l Transforming data l Rejecting invalid data l Reassigning data values l Saving invalid data to a log
Dimension Data Load Examples: buyer_name reg_id Barr, Adam 2 Chai, Sean 4 O’Melia, Erin 6. . . buyer_code buyer_last reg_id Barr 2 A 123 Chai 4 B 456 O’Melia 6. . buyer_name Barr, Adam Chai, Sean reg_id II IV buyer_name Smith, Jane Paper, Anne reg_id 2 4 DTS DTS buyer_first Adam Sean Erin. . . buyer_code U 999 A 123 B 456. . . buyer_last reg_id Barr 2 Chai 4 O’Melia 6. . . buyer_name Barr, Adam Chai, Sean Smith, Jane Paper, Anne reg_id 2 4
Maintaining Integrity of the Dimension n n Assigning a Surrogate Key to Each Record l Defines the dimension’s primary key l Relates to the foreign key fields of the fact table Loading One Record Per Application Key l Maintains uniqueness in the dimension l Depends on how you manage changing dimension data l Maintains integrity of the fact table
Managing Changing Dimension Data n n Dimensions with Changing Column Values l Inserts of new data l Updates of existing data Slowly-Changing Dimension Design Solutions l Type 1: Overwrite the dimension record l Type 2: Write another dimension record l Type 3: Add attributes to the dimension record
Type 1: Overwriting the Dimension Slide Product Dimension product key product name product size product package product dept product cat product subcat. . . Before 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks. . . After 001 Rice Puffs 12 Oz oz. Bag Grocery Dry Goods Snacks. . . Existing record is changed
Type 2: Writing Another Dimension Record Product Dimension product key product name product size product package product dept product cat product subcat effective_date … Before 001 Rice Puffs 10 oz. Bag Grocery Dry Goods Snacks 05 -01 -1995. . . 001 Rice Puffs 10 Oz oz. Bag Grocery Dry Goods Snacks 05 -01 -1995. . . After 731 Rice Puffs 12 Oz oz. Bag Grocery Dry Goods Snacks 10 -15 -1998. . . Adds a new record
Type 3: Adding Attributes in the Dimension Record Product Dimension product key product name productsize product package product dept product cat product subcat current product size date previous product size date previous product 2 nd previous product size date. . . Before 001 Rice Puffs 10 10 Oz oz. Bag Grocery Dry Goods Snacks 05 -01 -1995 11 11 Oz oz. 03 -20 -1994 (null). . . After 001 Rice Puffs 12 12 oz. oz Bag Grocery Dry Goods Snacks 10 -15 -1998 10 oz. 05 -01 -1995 11 11 Oz oz. 03 -20 -1994. . . Additional information is stored in an existing record
Verifying the Fact Table Source Data Verifying Accuracy of Source Data n Integrating data from multiple sources n Applying business rules n Checking structural requirements Correcting Invalid Data Managing Invalid Data l Transforming data l Rejecting invalid data l Reassigning data values l Saving invalid data to a log
Assigning Foreign Keys Dimension Tables customer_dim 201 ALFI Alfreds Source Data customer id ALFI product id order date quantity_sales amount_sales 123 1/1/2000 400 10, 789 product_dim 25 123 Chai Sales Fact Data cust_key time_dim 134 1/1/2000 201 prod_key time_key 123 25 1/1/2000 134 quantity_sales amount_sales 400 10, 789
Defining Measures n n customer_id VINET ALFI HANAR. . . Loading Measures from the Source System Calculating Additional Measures customer_key 100 238 437. . . product_id 9 GZ 1 KJ 0 ZA. . . price. 55 1. 10. 98. . . qty 32 48 9. . . Source System Data product_key 512 207 338. . . qty 32 48 9. . . Fact Table Data total_sales 17. 60 52. 80 8. 82. . .
Maintaining Data Integrity n n Adhering to the Fact Table Grain l A fact table can only have one grain l You must load a fact table with data at the same level of detail as defined by the grain Enforcing Column Constraints l NOT NULL constraints l FOREIGN KEY constraints
Implementing Staging Tables n Centralize and Integrate Source Data n Break Up Complex Data Transformations n Facilitate Error Recovery market_stage shipments_stage Staging Area sales_stage inventory_stage
DTS Functionality n Accessing Heterogeneous Data Sources n Importing, Exporting, and Transforming Data n Creating Reusable Transformations and Functions n Automating Data Loads n Managing Metadata n Customizing and Extending Functionality
Defining DTS Packages n Identifies Data Sources and Destinations n Defines Tasks or Actions n Implements Transformation Logic n Defines Order of Operations
Identifying Package Components n Connections Access Data Sources and Destinations n Tasks Describe Data Transformations or Functions n Steps Define the Order of Task Operations or Workflow n Global Variables Store Data that Can Be Shared Across Tasks
Creating Packages n n n Using the DTS Import / Export Wizard l Perform ad-hoc table and data transfers l Develop a prototype package Using DTS Package Designer l Edit packages created with the DTS Import/Export Wizard l Create packages with a wide range of functionality Programming DTS Applications l Directly access the functionality of the DTS Object Model l Requires Microsoft Visual Basic or Microsoft Visual C++
Using DTS to Populate the Sales Star n Populating the Sales Star Dimensions n Populating the Sales Star Fact Table
Populating the Sales Star Dimensions Product Tab Delimited Files product_dim DTS customer_dim Northwind OLTP SQL Server Stored Procedure DTS time_dim DTS
Populating the Sales Star Fact Table Sales Data File product_dim DTS sales_stage DTS customer_dim sales_stage time_dim sales_fact
Designing Modular Packages n n Creating Modular Packages l Simplify complex workflows l Create more readable packages l Produce smaller packages that are easier to debug Using Outer Packages l Execute multiple packages within a single package l Combine modular packages into logical workflows l Reuse modular packages in different workflows l Execute packages in parallel
Using DTS to Populate the Sales Star
- Northwind data warehouse
- Star schema examples
- Star of wonder star of night star of royal beauty bright
- Data preparing exploring examining and displaying
- Healthcare data warehouse model
- What is kdd process in data mining
- Contoh data warehouse dan data mart
- Components of data warehouse
- Perbedaan data warehouse dan data mart
- Introduction to data warehouse
- Arsitektur data mining
- Perbedaan data warehouse dan data mining
- Olap data mining
- What is data acquisition in data warehouse
- Data warehouse vs data mart
- Rolap architecture
- Data warehouse dan data mining
- Data mining dan data warehouse
- What is sociology perspective
- Csi handwriting analysis
- Examining social life
- Examining social life practice
- Three principles of acquiring spiritual knowledge
- What conclusions can you make from examining the geochart?
- How are the whale flipper and the human arm different
- Ao * algorithm