Dimensional Modelling 102 Author Presented To Revised Date

  • Slides: 54
Download presentation
Dimensional Modelling 102 Author Presented To Revised Date Nick Draper, John Campbell Bearing. Point

Dimensional Modelling 102 Author Presented To Revised Date Nick Draper, John Campbell Bearing. Point Internal September 2008 Global Management and Technology Consulting

Authors This presentation was prepared by: John Campbell, Technical Architect Bearing. Point 120 Collins

Authors This presentation was prepared by: John Campbell, Technical Architect Bearing. Point 120 Collins Street Melbourne Australia Tel: +61 3 8665 9537 E-mail: john. campbell@bearingpoint. com Nicholas Draper, Consultant Bearing. Point 120 Collins Street Melbourne Australia Tel: +61 3 8665 9694 E-mail: nick. draper@bearingpoint. com This document is protected under the copyright laws of the United States and other countries. This document contains information that is proprietary and confidential to Bearing. Point, Inc. , its subsidiaries, or its alliance partners, which shall not be disclosed outside or duplicated, used, or disclosed in whole or in part for any purpose other than to evaluate Bearing. Point, Inc. Any use or disclosure in whole or in part of this information without the express written permission of Bearing. Point, Inc. is prohibited. © 2005 Bearing. Point, Inc. All rights reserved. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

DIMENSIONAL MODELLING 102 - CONTEXT Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

DIMENSIONAL MODELLING 102 - CONTEXT Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Information Management Course Context Information Management Methodology (MIKE 2) CUSTOMER NUMBER CUSTOMER NAME CUSTOMER

Information Management Course Context Information Management Methodology (MIKE 2) CUSTOMER NUMBER CUSTOMER NAME CUSTOMER CITY CUSTOMER ST CUSTOMER POST CUSTOMER ADDR CUSTOMER PHONE CUSTOMER FAX Data Investigation Network Data Standardisation Process Automation Data Correction ETL Data Enrichment Data Matching Middleware/ESB Relational Data Modeling Metadata Management Operations Security Platforms ‘Why’ Information Management Solution Packs (see im. Collab) Data Warehousing Reference and Master Data Warehousing DW 101 MIKE 2 Information Governance SQL 101 ETL 101 Data Quality Information Governance Dimensional Modelling DM 102 Solution Focused Internal Meta Data Modelling DM 101 Technique Focused – ‘What’ ERwin DM 202 © 2008 Bearing. Point, Inc. Meta. Data 101 Tool Focused – ‘How’ INFORMATION MANAGEMENT

DW Reference Architecture Source Systems 3 Changes SS 2 Manual Data Entry Source System

DW Reference Architecture Source Systems 3 Changes SS 2 Manual Data Entry Source System Metadata ODS 4 6 DDS Extract, Transform and Load SS 1 7 5 Extract, Transform and Load 10 Value Add 2 Extract, Cleanse, Transform & Load SS 2 1 Replicate/Copy SS 1 Source System Replicas 9 8 Data Mart Manual Data Source System ETL Metadata ODS Metadata ETL Metadata Data Warehouse Metadata Web Infrastructure ETL Data Mart Metadata Web Client Desktop Client Access Metadata Repository Focus of this course Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Why are we presenting this course ? This course is being presented to Bearing.

Why are we presenting this course ? This course is being presented to Bearing. Point’s TS technical personnel n To establish a base line as regards nomenclature and knowledge n To set expectations for how Data Models should be presented in client documents. n (In short – so we can all read and understand each others work. ) n To establish a basic Dimensional Modelling Paradigm n To supply practical skills Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Contents Recap Models and Model Types The Basics Schema Types ─ Star schema ─

Contents Recap Models and Model Types The Basics Schema Types ─ Star schema ─ Snowflake schema Components of a Dimensional Model ─ Facts ─ Dimensions ─ Relationships ─ Grain Technique Thomsen Diagram The Process Slightly Advanced Internal • Date and Time, Time Variant • Surrogate keys, Slowly Changing Dimensions • Aggregate fact tables © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

MODELS AND MODEL TYPES Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

MODELS AND MODEL TYPES Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

What is a Model ? Definitions of ‘Model’ abound n “the act of representing

What is a Model ? Definitions of ‘Model’ abound n “the act of representing something (usually on a smaller scale)” Properties n They aren’t real n Their function is to aid communication, between users, technologists, machines “All Models are wrong, some models are useful. ” Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Are there different types of ‘Data Models’ ? Several levels of ‘Data Models’ are

Are there different types of ‘Data Models’ ? Several levels of ‘Data Models’ are usually used and each has, as a focus, a different audience These were covered in the Data Modelling course. All three types are applicable Dimensional modelling also, however, Logical and Physical are more prominent. Logical Model Conceptual Model Physical Model Business Users Internal Technicians © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

THE BASICS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

THE BASICS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Dimensional Modelling Vs Normalisation n Normalisation is good for the middle layer of a

Dimensional Modelling Vs Normalisation n Normalisation is good for the middle layer of a 3 -tier DW design l Minimal redundancy improves maintainability – data is updated in one place. l Normalised form can unify a diversity of enterprise data sources in a flexible manner. n Denormalisation is good for Business Intelligence l Minimal redundancy is not necessary because data is derived from other sources, not directly maintained in dimensional form. l Redundancy improves comprehension and usability of data structures. l Data mart SQL tends to consist of complex queries affecting a large number of tables and columns and returning large result sets. A simple structure can improve query performance. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Spreadsheet – Two Dimensions Lets us start with a typical example of two-dimensional data.

Spreadsheet – Two Dimensions Lets us start with a typical example of two-dimensional data. Anything that you track, whether it is hours per employee, costs per department, balance per customer, or complaints per store, can be arrange in a twodimensional format. The data set may be said to be arranged to have two dimensions: a row-arranged month dimension and a column-arranged measures. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Pivot Table - Three Dimensions Now, let’s add a THIRD Contextual Dimension to the

Pivot Table - Three Dimensions Now, let’s add a THIRD Contextual Dimension to the same spreadsheet – Products. The spreadsheet now highlights that the Date and Measures data presented relates specifically to the Product Category – Shoes. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

COGNOS 8 – Analysis Studio This kind of display can be found in many

COGNOS 8 – Analysis Studio This kind of display can be found in many current OLAP tools. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

SCHEMA TYPES Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

SCHEMA TYPES Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Dimensional Modelling Based around ‘Measures’ (Fact Tables) that are constrained by ‘Dimensions’ (Dimension Tables).

Dimensional Modelling Based around ‘Measures’ (Fact Tables) that are constrained by ‘Dimensions’ (Dimension Tables). n Very common in Data Warehouse applications. n Can directly feed other tools such as MOLAP databases. . n Tend to have a very specific focus that is easy for users to understand n Users will get confused with more than about six dimensions. n The model is easy - the ‘Fact’ is unique by ALL of the ‘Dimensions’. n Is concerned primarily with retrieval needs. n Is almost always summarised, eg using SUM or MAX and so on. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Star Schema n Internal Always start with this simple form Basic form includes a

Star Schema n Internal Always start with this simple form Basic form includes a central table with a number of descriptive tables joined directly • Central table known as the Fact table • Satellite tables known as Dimension tables • A simpler design that can be easily optimized for data retrieval n All dimension tables completely denormalised n All dimension tables relate directly to the fact table n The Grain is set by the dimensions n Best for straightforward modelling requirements. Fact Table Dimension Tables © 2008 Bearing. Point, Inc. Relationships INFORMATION MANAGEMENT

Snow Flake Schema n Described as a variant option n One or more dimension

Snow Flake Schema n Described as a variant option n One or more dimension tables are not completely denormalised n Some data represented in a snowflake or outrigger table n Benefits ─ Useful for complex modelling situations, such as dynamic hierarchies, or shared dimensions n Drawbacks ─ More navigation needed and it can get complicated quickly n Internal Avoid unless requirements demand it © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

COMPONENTS OF A DIMENSIONAL MODEL Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

COMPONENTS OF A DIMENSIONAL MODEL Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model ─ Fact Tables ─ Dimensional Tables Fact Table ─

Components of a Dimensional Model ─ Fact Tables ─ Dimensional Tables Fact Table ─ Relationships ─ Grain Dimension Tables Relationships Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model - FACT Tables n Contain measures ─ Usually numeric,

Components of a Dimensional Model - FACT Tables n Contain measures ─ Usually numeric, measures quantify the business n Most useful measures are additive ─ Additive measures can be meaningfully added across rows n Row population is sparse ─ A row exists only where there are non-zero measures n Fact tables are not denormalised n Examples: ─ Sales ─ Counts ─ Percentage Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model DIMENSION Tables n Purpose of a Dimension – To

Components of a Dimensional Model DIMENSION Tables n Purpose of a Dimension – To Add Context to the Fact n Contain attributes ─ Usually textual, attributes describe the business n Attributes are used for filtering and grouping n Dimension tables are typically denormalised ─ Increases comprehension ─ Facilitates browsing n Have unique primary keys to identify every row n Example Dimensions ─ Date ─ Product Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model RELATIONSHIPS n Relationships are one-to-many ─ Dimensions are parents

Components of a Dimensional Model RELATIONSHIPS n Relationships are one-to-many ─ Dimensions are parents (one) ─ Facts are children (many) ─ Any logical many-to-many relationships must be decomposed n Fact tables contain foreign keys ─ Point to primary keys in dimension tables n Referential Integrity is critical ─ Every fact must have a parent row in each dimension table ─ Violations lead to incorrect and inconsistent query results ─ If you have to put a ‘Not Applicable’ value in the dimension so that the Fact has something to link to. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model - GRAIN n ‘Grain’ is the fundamental atomic level

Components of a Dimensional Model - GRAIN n ‘Grain’ is the fundamental atomic level of data to be represented in the fact table. n Business analysis discovers the level at which the data needs to be represented n Fact grain is determined ─ Transaction is the finest grain ─ Data is aggregated if transaction grain is not needed n Dimension grain is matched to fact grain n Fine grains have performance implications – hardware must be adequate to handle the load Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model - RECAP Each Fact, (A ‘Claim Transaction. Amount’) is

Components of a Dimensional Model - RECAP Each Fact, (A ‘Claim Transaction. Amount’) is defined by ONE Customer, ONE Claim Transaction Type, ONE Catastrophe, ONE Calendar (date), ONE Product, ONE Coverage. Other than the Claim Transaction all entities are Dimensions’ Grain here is at the transaction level – each ‘claim’. Without this we would have to, say, total the amounts across all claims for the one day. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Components of a Dimensional Model - RECAP With this model we can ‘drill down’

Components of a Dimensional Model - RECAP With this model we can ‘drill down’ to the most granular fact level (one customer, one claim etc OR by discounting one or more dimensions we can aggregate up to, say, a whole day or across all claims of a certain type Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

TECHNIQUE: THOMSEN DIAGRAM Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

TECHNIQUE: THOMSEN DIAGRAM Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Technique: Thomsen Diagrams n Purpose – Rich Documentation /Design Technique ─ particularly good beyond

Technique: Thomsen Diagrams n Purpose – Rich Documentation /Design Technique ─ particularly good beyond 3 dimensions where the cube analogy falls apart n Represents ‘Dimensions’ as line segment with Hierarchy or aggregation levels specifically noted n Measures / Facts are listed to the side Total Products Total All Years Product category Total Year Product Type Total Month Product Day (Date) Measures (Facts) * Claim Transaction Amount n Strengths ─ Scalability, Simplicity ─ Hierarchies, and their order, immediately visible and verifiable Dimension Fact n Weaknesses ─ 10+ Dimensions Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

PROCESS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

PROCESS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Inputs to Dimensional Modelling n Requirements documents n Source data models / database schemas

Inputs to Dimensional Modelling n Requirements documents n Source data models / database schemas n Metadata / data dictionary n Existing reports / analysis models n Consultation with business users n Consultation with technical users Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Outputs of Dimensional Modelling n Logical data model n Physical data model n Source-target

Outputs of Dimensional Modelling n Logical data model n Physical data model n Source-target mapping n Validated business rules / transformations Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Dimensional Modelling Steps 1. Understanding the Business Problem 2. Choose the Dimensions 3. Choose

Dimensional Modelling Steps 1. Understanding the Business Problem 2. Choose the Dimensions 3. Choose the Grain of the Fact Table 4. Choose the Measured Facts 5. Choose the Dimension Attributes 6. Deriving the Physical Model from Logical Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Example We are going to explore the fictional ‘ACME Bolt company’ and

Modelling Steps Example We are going to explore the fictional ‘ACME Bolt company’ and it’s Key performance Indicator: “Total Bolts Sold per Customer (TBSC)” This measure has been handed down from ACME’s foreign parent and everybody’s bonus is related to proving this value has risen over the latest periods. 1. Understanding the Business Problem What is our problem ? Getting our bonus of course but for this exercise lets just satisfy the request. We will at least need to discover what is meant by ‘Customer’ and ‘Bolts sold’ (does this mean ordered? Invoiced? Delivered? ) Interviews with the clients reveals that its not across the board, individual regions and branches will be judged also. We have to prove a general trend across 12 months. The user would also like to know something of the Customer’s category also so as to be able to manage trends during the year. Similarly the groupings of bolts. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Example (Cont) 2. Choose the Dimensions n These are readily apparent if

Modelling Steps Example (Cont) 2. Choose the Dimensions n These are readily apparent if using the Thomsen Diagram. n If not then these will need to be deduced from the Problem definition in step one. n Draw a small data model with the measure / fact in the centre and the other concepts around this – including an entity for each aggregation level. n A Star schema would collapse (denormalise) all these levels. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Example (Cont) 3. Choose the Grain of the Fact Table 4. Choose

Modelling Steps Example (Cont) 3. Choose the Grain of the Fact Table 4. Choose the Measured Facts n These steps go hand in hand. In our example just one fact has been requested – Number (count) of bolts sold. In the real world this may also include ‘profit’ but anything included has to be constrained by the exact same dimensions. n The Grain may be determined by the request in this case total bolts per day per customer product per store n Sometimes extra detail is included eg choose ‘day’ even if only month has been asked for. • n Sometimes the detail requested cannot be stored – whilst we aim for Transaction level some clients have hundreds of millions of low level transactions which it is just not economical to replicate and manage. • Internal Extra detail is more flexible for the future but costs more today to load and summarise Losing detail sacrifices flexibility but can reduce cost. © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Example (Cont) 5. Choose the Dimension Attributes n Dimension attributes describe the

Modelling Steps Example (Cont) 5. Choose the Dimension Attributes n Dimension attributes describe the business. They are used to filter and group in reports and queries. n Choose to decode any codes eg don’t just take a ‘region code’ – decode it as well and take ‘Region Name’ – these will become user selectable items. n Character fields are almost always attributes. n Err on the side of including too many attributes from the source data – the performance penalty is negligible and it is simple to hide any attributes which are later found to be useless. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Exercise n Lets Build the Thomsen Diagram for our ACME KPI. Total

Modelling Steps Exercise n Lets Build the Thomsen Diagram for our ACME KPI. Total Products n Step 1 is analysis. This has been completed and is summarised below n Step 2 is choosing the dimensions – I have added one to start us off. Product category n Step 3 is choose the Grain of the Fact Table Product (Bolt) n Step 4 is choose the measured fact Problem Restatement: “Total Bolts Sold per Customer (TBSC)” Additional management info requested. Internal n Geographic detail of Store, Branch Region n Product and category n Customer and the customer category hierarchy © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Modelling Steps Exercise ─ Here is a finished version ─ As discussed it can

Modelling Steps Exercise ─ Here is a finished version ─ As discussed it can also confirm that Branch is within Region (not the other way around) Total All Areas Total All Customers Region Customer Category Branch Customer Sub Category Product category * Bolts sold Total Month Store Customer Product (Bolt) Day (Date) Total Products All Years Total Year Measures (Facts) Dimension Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Exercise Star Model From the Thomsen Diagram we can create a data model… n

Exercise Star Model From the Thomsen Diagram we can create a data model… n n Because this is a star schema (i. e. totally denormalised) you can see how we cannot tell whether Branch or Region is higher. Remember the purpose of the model is communication (p. s. I know it not the purest data model) Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

SLIGHTLY ADVANCED Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

SLIGHTLY ADVANCED Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Slightly Advanced Topics n Date and Time n Time Variant n Surrogate keys n

Slightly Advanced Topics n Date and Time n Time Variant n Surrogate keys n Hierarchies n Aggregate fact tables Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Date and Time n Don’t confuse the two! n You will meet many situations

Date and Time n Don’t confuse the two! n You will meet many situations where it is stated that a dimension is ‘time’. Almost universally this is actually date. n Date and Time are ‘static’ reference dimensions and should be populated in advance – usually as part of initial build. n The DAK Data standards document has a sample schema for Calendar which includes extra data columns for ‘is last day of month’ etc. These can make later queries much easier. Sample spreadsheets to load also exist. n Do not be temped to combine into one dimension ─ At the grain of Date there would be 3650 rows to represent 10 years. ─ At the grain of Minute there are 1440 minutes in a day – so 1440 rows needed ─ Combined this would need 5, 256, 000 rows to represent all the minutes for 10 years. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Time Variant n Time Variance: “A characteristic of a data warehouse that defines the

Time Variant n Time Variance: “A characteristic of a data warehouse that defines the moment in time that the data or variant of the data is valid. If Order No. 123 has a value of $1, 500. 00 on Dec 1 and $1, 700 on Dec 10, Dec 1 and Dec 10 shows us the time variance of Order No. 123. ” n Many operational source systems only record one item of information and if that changes the new value is simply replaced. E. g. if you move house your doctor or movie rental company really doesn’t care where you used to live. n For big organisations this is sometimes overcome by using a separate data warehouse where each change is noted by boundary dates. I. e. a start and end date. n Adding these is not simple as it can subtly changes the overall granularity – if dates (not times) are used then only one fact value is possible per day. You now have to be extremely clear about what value is to be used. The last of the day? The maximum of the day? Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Natural and Surrogate Keys n Natural keys are the values usually referred to by

Natural and Surrogate Keys n Natural keys are the values usually referred to by people as the identifiers of entities (customer number, claim number, etc. ). ─ They are often the primary keys in source systems. ─ As a general rule they should not be used in a warehouse - but they might be in a dimensional model directly off a single source system. ─ The problem is what happens when a second data source populates the dimension? Now the jumble of numbers means nothing and in fact the same identifier could be used in different systems to refer to different things. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Natural and Surrogate Keys n A surrogate is something used ‘instead of’. A surrogate

Natural and Surrogate Keys n A surrogate is something used ‘instead of’. A surrogate is an artificial, numeric, key generated from a pool of numbers inside the warehouse. ─ Use these as Primary Keys for Dimensions. ─ Will facilitate efficient Fact to Dimension joins ─ Support Slowly changing dimension (next slide) n If using surrogates then bring in the source system natural key along with another field to say which source system this value came from – i. e. put the context back. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Hierarchies n Hierarchies are pervasive in the vast majority of organisations. Total Products n

Hierarchies n Hierarchies are pervasive in the vast majority of organisations. Total Products n Hierarchies are quite disorganised in the vast majority of organisations. ─ Multiple independent hierarchies often needed n Hierarchies within the dimensions are very important Product category Product (Bolt) ─ Within the proper tool they enable “drill up/ drill down” ─ e. g. day, week, month, quarter, year ─ e. g. Product, Product Category, Total Products n Details usually need to be explicitly stored. ─ E. g. Decode all codes. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Hierarchies – Simple, Static Hierarchies Simple, static hierarchies are best designed directly into the

Hierarchies – Simple, Static Hierarchies Simple, static hierarchies are best designed directly into the dimensions. Total Products n This is what was meant when we said we ‘denormalised’ for a Star Schema n Easiest to use Product category n Most efficient to query n e. g. Product, Product Category, Total Products Product (Bolt) n Example: Geography (Store, Branch, Region, [State, Country]) Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Hierarchies – Complex And/Or Dynamic Hierarchies If hierarchies are complex, if there are multiple

Hierarchies – Complex And/Or Dynamic Hierarchies If hierarchies are complex, if there are multiple hierarchies on a dimension or if the hierarchy changes often, it could be messy to design the hierarchy into the dimension. n Snowflake the dimension, creating one or more outboard hierarchy tables. n Changes to hierarchies do not affect the base dimension. n Multiple hierarchies can be represented with multiple tables or with a hierarchy ID column which must be filtered on in any query. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Hierarchies – Example Below is a common situation, caused where different ‘departments’ view the

Hierarchies – Example Below is a common situation, caused where different ‘departments’ view the summarised data differently. n In this case it is Critical that it is the SAME fact with the same granularity – in this case Store. n It is just the summaries beyond ‘Store’ that differ. n Watch for the same item being used in different contexts e. g. Region in this example is NOT the same thing. Internal © 2008 Bearing. Point, Inc. Total All Warehouses Region Warehouse Distribution Node Total All Areas Region Measures (Facts) * Bolts sold Branch Store INFORMATION MANAGEMENT

Aggregate Fact Tables n Fact tables are very large n Aggregates (pre-stored summaries) are

Aggregate Fact Tables n Fact tables are very large n Aggregates (pre-stored summaries) are the most effective way of improving data warehouse performance n An aggregate is a fact table records representing a summarisation of base level fact table records. n Can be Explicitly Designed and managed or many DBMS now have inbuilt Aggregations available. n Aggregate awareness – DBMS implicit and OLAP tool explicit n Each grain of aggregate should occupy its own fact table, and be supported by appropriate category dimension tables ─ What will that do to the number of tables? Can be an exponential blow out. ─ Complexity from end-users point of view? They can be forced to remember what summaries exist and what they are called. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Revised Star Model This is our data model revised to use surrogate keys, some

Revised Star Model This is our data model revised to use surrogate keys, some time variance and one split hierarchy. Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

QUESTIONS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

QUESTIONS Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT

Internal © 2008 Bearing. Point, Inc. INFORMATION MANAGEMENT