Dimensional Modeling Chapter 2 The Dimensional Data Model

  • Slides: 22
Download presentation
Dimensional Modeling Chapter 2

Dimensional Modeling Chapter 2

The Dimensional Data Model An alternative to the normalized data model l Present information

The Dimensional Data Model An alternative to the normalized data model l Present information as simply as possible (easier to understand) l Return queries as quickly as possible (efficient for queries) l Track the underlying business processes (process focused)

The Dimensional Data Model l l Contains the same information as the normalized model

The Dimensional Data Model l l Contains the same information as the normalized model Has far fewer tables Grouped in coherent business categories Pre-joins hierarchies and lookup tables resulting in fewer join paths and fewer intermediate tables Normalized fact table with denormalized dimension tables.

GB Video E-R Diagram Customer #Cust No F Name L Name Ads 1 Ads

GB Video E-R Diagram Customer #Cust No F Name L Name Ads 1 Ads 2 City State Zip Tel No CC No Expire Requestor of Rental #Rental No Date Clerk No Pay Type CC No Expire CC Approval Owner of Line #Line No Due Date Return Date OD charge Pay type Holder of Name for Title #Title No Name Vendor No Cost Video #Video No One-day fee Extra days Weekend

GB Video Data Mart Customer Cust. ID Cust No F Name L Name Address.

GB Video Data Mart Customer Cust. ID Cust No F Name L Name Address. ID Adddress 1 Address 2 City State Zip Area. Code Phone Rental. ID Rental No Clerk No Store Pay Type Line. ID OD Charge One. Day. Charge Extra. Days. Charge Weekend. Charge Days. Reserved Days. Overdue Cust. ID Address. ID Rental. Id Video. ID Title. ID Rental. Date. ID Due. Date. ID Return. Date. ID Rental Date Rental. Date. IDDue Date Return Date SQLDate Due. Date. ID Day SQLDate. Return. Date. ID SQLDate Week Day Quarter Week Day Holiday Quarter Week Holiday Quarter Holiday Video. ID Video No Title. ID Title. No Name Cost Vendor Name

Fact Table Measurements associated with a specific business process l Grain: level of detail

Fact Table Measurements associated with a specific business process l Grain: level of detail of the table l Process events produce fact records l Facts (attributes) are usually • • l l l Numeric Additive Derived facts included Foreign (surrogate) keys refer to dimension tables (entities) Classification values help define subsets

Dimension Tables Entities describing the objects of the process l Conformed dimensions cross processes

Dimension Tables Entities describing the objects of the process l Conformed dimensions cross processes l Attributes are descriptive • • l l l Text Numeric Surrogate keys Less volatile than facts (1: m with the fact table) Null entries Date dimensions Produce “by” questions

Bus Architecture l l An architecture that permits aggregating data across multiple marts Conformed

Bus Architecture l l An architecture that permits aggregating data across multiple marts Conformed dimensions and attributes Drill Down vs. Drill Across Bus matrix

Keys and Surrogate Keys l l l A surrogate key is a unique identifier

Keys and Surrogate Keys l l l A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys) Protect against changes in source systems Allow integration from multiple sources Enable rows that do not exist in source data Track changes over time (e. g. new customer instances when addresses change) Replace text keys with integers for efficiency

Slowly Changing Dimensions Attributes in a dimension that change more slowly than the fact

Slowly Changing Dimensions Attributes in a dimension that change more slowly than the fact granularity l Type 1: Current only l Type 2: All history l Type 3: Most recent few (rare) Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table

Cust. Key BKCust. ID Cust. Name Comm. Dist Gender Hom. Own? 1552 31421 Jane

Cust. Key BKCust. ID Cust. Name Comm. Dist Gender Hom. Own? 1552 31421 Jane Rider 3 F N Fact Table Date Cust. Key Prod. Key Item Count Amount 1/7/2004 1552 95 1 1, 798. 00 3/2/2004 1552 37 1 27. 95 5/7/2005 1552 87 2 320. 26 2/21/2006 1552 2387 42 1 19. 95 Dimension with a slowly changing attribute Cust Key BKCust ID Cust Name Comm Dist Gender Hom Own? Eff End 1552 31421 Jane Rider 3 F N 1/7/2004 1/1/2006 2387 31421 Jane Rider 31 F N 1/2/2006 12/31/9999

Date Dimensions l l l One row for every day for which you expect

Date Dimensions l l l One row for every day for which you expect to have data for the fact table (perhaps generated in a spreadsheet and imported) Usually use a meaningful integer surrogate key (such as yyyymmdd 20060926 for Sep. 26, 2006). Note: this order sorts correctly. Include rows for missing or future dates to be added later.

Degenerate Dimensions l l Dimensions without attributes. (Such as a transaction number or order

Degenerate Dimensions l l Dimensions without attributes. (Such as a transaction number or order number. ) Put the attribute value into the fact table even though it is not an additive fact.

Snowflaking (Outrigger Dimensions or Reference Dimensions) l l Connects entities to dimension tables rather

Snowflaking (Outrigger Dimensions or Reference Dimensions) l l Connects entities to dimension tables rather than the fact table Complicates coding and requires additional processing for retrievals Makes type 2 slowly changing dimensions harder to maintain Useful for seldom used lookups

M: N Multivalued Dimensions l l l Fact to Dimension Try to avoid these.

M: N Multivalued Dimensions l l l Fact to Dimension Try to avoid these. Solutions can be very misleading.

Multivalued Dimensions SALESREP Sales. Rep. Key Name Address SALESREP-ORDER-BRIDGE Sales. Rep. Key Salesrep. Group.

Multivalued Dimensions SALESREP Sales. Rep. Key Name Address SALESREP-ORDER-BRIDGE Sales. Rep. Key Salesrep. Group. Key Weight= (1/Num. Reps) ORDERS (FACT) Sales. Rep. Key Product. Key Sales. Rep. Grp. Key Customer. Key Order. Qty

Hierarchies Group data within dimensions: Sales. Rep • Region • State • County •

Hierarchies Group data within dimensions: Sales. Rep • Region • State • County • Neighborhood Problem structures • Variable depth • Frequently changing

Heterogeneous Products l Several different kinds of entry with different attributes for each •

Heterogeneous Products l Several different kinds of entry with different attributes for each • (The sub-class problem)

Aggregate Dimensions l Dimensions that represent data at different levels of granularity • Remove

Aggregate Dimensions l Dimensions that represent data at different levels of granularity • Remove a dimension • Roll up the hierarchy (provide a new shrunken dimension with new surr-key that represents rolled up data)

Junk Dimensions l Miscellaneous attributes that don’t belong to another entity, usually representing processing

Junk Dimensions l Miscellaneous attributes that don’t belong to another entity, usually representing processing levels • Flags • Categories • Types

Fact Tables l Transaction • Track processes at discrete points in time when they

Fact Tables l Transaction • Track processes at discrete points in time when they occur l Periodic snapshot • Cumulative performance over specific time intervals l Accumulating snapshot • Constantly updated over time. May include multiple dates representing stages.

Aggregates l Precalculated summary tables • Improve performance • Record data an coarser granularity

Aggregates l Precalculated summary tables • Improve performance • Record data an coarser granularity