Kimball Approach Dimensional Modeling Across SQL Server and

Kimball Approach Dimensional Modeling Across SQL Server and SSAS

Acknowledgments • Course materials adapted from. . . • The Microsoft Data Warehouse Toolkit • J. Mundy, W. Thornthwaite (Wiley 2006) • The Data Warehouse Lifecycle Toolkit, 2 nd Ed. • R. Kimball, M. Ross, W. Thornthwaite, J. Mundy, B. Becker (Wiley 2008) • The Data Warehouse Toolkit, 2 nd Ed. • R. Kimball, M. Ross (Wiley 2002) • Kimball University • Course materials • Design Tips and Intelligent Enterprise articles at www. Kimball. University. com 2

Session Agenda • Rational for dimensional modeling • DW/BI Lifecycle context • Key Business Process Dimensional Model concepts • The dimensional modeling process • Design exercise: Session Attendance Model • The Dimensional Model in the RDBMS • The Dimensional Model in Analysis Services

RATIONAL FOR DIMENSIONAL MODELING

The Business Context • Business people need information to make plans and assess results, and this need is growing. • Data is captured by complex systems structured to support specific transaction requirements • Business people find it difficult to get business information from data in transaction systems Therefore, our job is to create a system that will: reliably take data out of the source systems, restructure its form and content as appropriate for business analysis, and provide it to the business people via tools they can actually use. 5

Dimensional: Why and How • Primary design goal: Support analytic queries • Usable • Flexible • Performance • Key terms • Facts = measures of business events • Dimensions = entities that participate in business events • Basic approach: • Denormalize dimensions for usability • Normalize facts for performance 6

THE DW/BI LIFECYCLE CONTEXT

The DW/BI Lifecycle Technical Architecture Design Product Selection & Installation Dimensional Modeling Physical Design Growth Business Project Planning Requirements ETL Design & Development Deployment Definition BI Application Specification BI Application Development Project Management Maintenance

Business Requirements Are the Foundation of Success • The more you focus your efforts on high value, relatively easy, information based business opportunities, the more likely you will be to succeed. • Regardless of which tools you use • Regardless of which database you use • Regardless of your technical skills

The Data Warehouse Bus Matrix is the Enterprise Data Architecture Framework • Matrix of business processes and conformed dimensions

A More Detailed Bus Matrix Example Opportunities - Sales performance - Sales strategy - Sales targeting - Product recommendations - Manufacturing planning - Sales forecast input - Customer prioritization

KEY DIMENSIONAL MODELING CONCEPTS

Designing the Business Process Dimensional Model • • • 13 Basic dimensional modeling concepts Conformed dimensions Slowly changing dimensions Additional concepts The dimensional modeling process

Dimensional: Why and How • Primary design goal: Support analytic queries • Usable • Performance • Key terms • Facts = measures of business events • Dimensions = entities that participate in business events • Basic approach: • Denormalize dimensions for usability • Normalize facts for performance 14

Terminology: Facts • Metrics resulting from business process or event • Facts are usually numeric and additive • Granularity/grain • Identifies the level of detail • One row per sale, one row per service call, one row per claim, … • Atomic grain is most flexible • Three fact table types • Transaction; Snapshot; Accumulating 15 Sales Facts DATE KEY PRODUCT KEY STORE KEY PROMOTION KEY $ Sales Unit Sales

Terminology: Dimensions Product Dimension • Characteristics of a subject/object PRODUCT KEY • Who, what, when, where, why, how • Product, Date, Patient, Facility … Product Desc. SKU # • Each row is an occurrence Size • One row per product, day, patient, … Brand Desc. Class Desc. • Dimension attributes (columns): • Report labels and query constraints • “By” words and “where” clauses • Verbose descriptive attributes, in addition to codes • Hierarchical relationships 16

Terminology: Business Process Dimensional Model (or Star Schema) • Fact table per business process / event, plus relevant dimensions Sales Facts Product • Benefits: • Easier to understand • Better performance • Pre-joined dimensions • Star-join optimization • Extensible to handle change 17 PRODUCT KEY Product Code Product Descr Brand Category Height Width Weight Standard Cost… DATE KEY PRODUCT KEY STORE KEY PROMOTION KEY … Sales Amount Sales Units Dat e. DATE KEY Date Day of Week Calendar Month Calendar Year Fiscal Month… Promotion PROMOTION KEY Store STORE KEY Store Code Store Descr Square Footage Store Type Store Address… Fact Table Dimension Tables Promotion Code Promotion Name Promotion Descr Media Type Ad Type Display Type Coupon Type Promotion Cost. . .

Sample Report Translation of Dimensional Model “Dimensions” Report, row, and column headings, and filters 18 “Facts” Numeric report values

Terminology: Dimension Table Surrogate Keys • Surrogate keys are substitute keys • Integer, non-meaningful, sequence numbers • Surrogate keys join fact and dimension tables • Treat business keys as attributes (aka natural keys) • Benefits • Isolate DW/BI system from operational changes • Improve performance (over character keys) • Handle “Not applicable”, “Date TBD”, … • Allow integration of multiple sources • Enable tracking of dimension attribute changes 19

Terminology: Conformed Dimensions • All fact tables use same standard dimensions Order Facts Product KEY … Order Qty… • Established via Bus Matrix, enforced in Inventory Facts Product KEY … ETL Inventory Qty… • Dimensions are Facts consistent across Shipment. Product KEY … processes Shipment Qty… 20 Product Dim Product KEY Product Attributes

Creating Conformed Dimensions • • Agree on column names and definitions Identify best source Assign surrogate key to every dimension row Combine all attributes into Master dimension table Product Surrogate Key Product KEY • Use the Master dimension Business Product Code Key to map the business Description key in the fact rows Brand Marketing Category to the surrogate key for Height Logistics each business process Width that uses the dimension Weight Cost Acctg. 21 Standard Cost

Terminology: Slowly Changing Dimension • Techniques for handling changes to dimension attributes • Type 1: overwrite attribute values • Common default, appropriate for corrections • Type 2 : create a new dimension row when attribute value changes • Flexible technique, critical for accurately tracking behavior over time • Type 3: track current attributes and most recent • Most useful if there’s been a big change, e. g. a reorg. • Hybrid combinations of 1 and 2 are most common 22

Slowly Changing Dimension Example • What if Cust 3 A 422 moved to CA on 12/1/2008? • Using Type 1: Order Facts Customer Dim Cust Key Cust_ Code Last_Name Zip 94025 1 3 A 422 Thornthwaite 48192 2 1 B 427 Jones 10019 Cust_Key Date Amount 1 12/23/2006 $200. 00 1 12/23/2007 $800. 00 2 8/9/2007 1 12/23/2008 $12. 00 $100. 00 r Report request for Sales by Zip on 12/24/08: Zip 23 Order Amount 94025 1, 100. 00 10019 12. 00

Slowly Changing Dimension Example – p. 2 • Using Type 2 (must have surrogate key): Customer Dim Cust Key Cust_Code Last_Name Zip Eff_Date End_Date Is_ Current Change Reason 1 3 A 422 Thornthwaite 48192 12/23/1998 11/30/2008 N NW 2 1 B 427 Jones 10019 8/9/2007 12/31/9999 Y NW 3 3 A 422 Thornthwaite 94025 12/1/2008 12/31/9999 Y ZP Report request on 12/24/08: Sales by Zip 24 Zip 48192 94025 10019 Order Amount 1, 000. 00 12. 00 Order Facts Cust_Key Date Amount 1 12/23/2006 $200. 00 1 12/23/2007 $800. 00 2 3 8/9/2007 $12. 00 12/23/2008 $100. 00

Additional Dimensional Modeling Concepts • Hierarchies Product KEY • Combining lower level dimension items into higher level groups • Many-to-one from lower to higher levels • Built into Analysis Services; additional attributes in relational design • Snowflaking • The partial re-normalization of dimensions • Generally discouraged, but useful in certain cases (e. g. Location) 25 • Reference dimension in AS Product Code Description Brand Category Height Width Weight Std Cost Product KEY Brand KEY Category KEY Product Code Description Brand Key Height Width Weight Std Cost Brand Code Brand Descr Category Key Category Code Category Descr

Additional Dimensional Modeling Concepts – p. 2 • Degenerate dimension • Usually a transaction system ID that has no related attributes (like Order Number) • Include it in the Fact table to tie transactions together and link back to source system. • A Fact dimension in AS • Junk dimensions • Combinations of miscellaneous, small dimensions into one single table • Order Info, Symptoms, Shipment Info Order Facts Cust_ Key 1 1 1 2 3 Prod Date Trans _Key _ID 56 12/23/98 327 43 12/23/98 327 92 12/23/01 412 43 8/9/04 587 104 12/23/04 602 Retail Transaction Info KEY Transaction Type Payment Type Amount $200. 00 $30. 00 $800. 00 $12. 00 $100. 00

Additional Dimensional Modeling Concepts – p. 3 • 27 Bridge tables (a many-to-many dimension relationship in AS) • Between fact and dimension • Between dimensions

Dimensional Modeling Process • Develop the Data Warehouse Bus matrix • Start with the 4 -step method to identify facts and dimensions • Step 1: Identify the business process (what row on the matrix should we start with? ) • Step 2: Declare the grain • Step 3: Choose the dimensions • Step 4: Choose the facts • Diagram the dimensional model • Fill in the dimension and fact attributes (Step 5) • Business requirements + source docs + data profiling • Follow naming standards (understandable to business) • Try the dimensional modeling spreadsheet from the book’s web site: http: //www. msft. DWToolkit. com 28

Example Bubble Chart Dimensional Model Date (Order, Due) Order Info Product Orders Customer Currency Promotion 29 Employee (Sales Person)

The Dimensional Modeling Spreadsheet (Available at: http: //www. kimballgroup. com /html/books. MDWTtools. html)

Data Profiling • Data exploration • Understand data structures, relationships and business rules • Identify (and document) data problems • Prove data feasibility • Options • Simple: SQL, BI tool, Report project (see website) • Advanced: Data Profiling tool • Ongoing process – will do more data profiling during ETL 31

Simple Data Profiling Example Table level info for the Product dimension

Column details Model. Name

Dimensional Modeling Myths and Misconceptions • Dimensional means summary • Dimensional models are built to support specific applications (or departments) • The dimensional model is less flexible than a third normal form model in DW/BI systems • The dimensional approach is not Enterprise oriented 34

Dimensional Modeling Summary • Enterprise perspective / roadmap • Enterprise Data Warehouse Bus Matrix • Presentation area must be dimensional • Ease of use • Query performance • Start with atomic detail, not just summary • Conform dimensions for consistency • Apply SCD techniques for handling attribute changes • Process • 4 -step approach • Business process, grain, dimensions, facts • Fill in the attributes and measures (Step 5) 35

Exercise: The Session Attendance Business Process Dimensional Model • Create a dimensional model to capture data from the business event in which we are all participating right now. • Think about business requirements • Follow the four-step process

Exercise: Here are some business requirements to think about • Conference evaluation and planning • “Improve conference by featuring more good tracks and sessions” = Attendance by track and session type • Speaker evaluation and planning • “Improve quality of speakers” = Most popular sessions by Type • Conference logistics • Offer the best conference experience at least cost = Distinct attendance per day and Morning vs Afternoon

Exercise: The Four-Step Process Business Value: 1. Business Process: 2. Grain: 3. Dimensions: 4. Facts:

Additional Workspace

An Example Session Attendance Dimensional Model Conference Audit (Update Audit) Date Audit (Insert Audit) Attendee Attendance Facts Room Session Primary Speaker Time_Slot Speaker_ Group_ Bridge Speaker

THE DIMENSIONAL MODEL IN THE RDBMS

Creating the Relational Dimensional Model • The modeling spreadsheet is a reasonable starting point • Run the DDL to build the database • [Create the ETL system to populate it] • Run some queries

Relational Model Diagram

Querying the Relational Dimensional Model • Conference evaluation and planning • “Improve conference by featuring more good tracks and sessions” = Attendance by track and session type • Speaker evaluation and planning • “Improve quality of speakers” = Most popular sessions by Type • Conference logistics • Offer the best conference experience at least cost = Distinct attendance per day and Morning vs Afternoon

THE DIMENSIONAL MODEL IN ANALYSIS SERVICES

Transferring the Model from Relational to Analysis Services • Use the Dimension Wizard and bring over one dimension at a time • Edit the names and properties of each dimension and its attributes as needed. Add business intelligence where appropriate. • Create hierarchies and attribute relationships for the hierarchies Remove redundant attribute relationships from the key field. • If necessary, define dimension translations. • Build, deploy, and process the dimension so you can look at the dimension’s data. Iterate. • Use the cube wizard to build the cube.

AS Data Source View

Attendance Cube

Querying the Relational Dimensional Model • Conference evaluation and planning • “Improve conference by featuring more good tracks and sessions” = Attendance by track and session type • Speaker evaluation and planning • “Improve quality of speakers” = Most popular sessions by Type • Conference logistics • Offer the best conference experience at least cost = Distinct attendance per day and Morning vs Afternoon

Conclusion • Dimensional model offers greater flexibility, usability, and performance for analytic queries • Surrogate keys and slowly changing dimensions are mandatory for analytics over time • Dimensional model works in both the Relational DB and Analysis Services platforms • Goal: Build AS from relational at atomic level

Contact Info • warren@kimballgroup. com • Visit www. kimballgroup. com for • • Articles Design tips (117 and counting) Whitepapers Forum • All of the concepts discussed are expanded on in the Kimball Toolkit series of books

Complete the Evaluation Form & Win! • You could win a Dell Mini Netbook – every day – just for handing in your completed form! Each session form is another chance to win! Pick up your Evaluation Form: • Within each presentation room • At the PASS Booth near registration area Drop off your completed Form: • Near the exit of each presentation room • At the PASS Booth near registration area Sponsored by Dell

Thank you for attending this session and the 2009 PASS Summit in Seattle
- Slides: 53