Star Schema Snow flake schema Fact constellation schema

  • Slides: 14
Download presentation
Star Schema

Star Schema

Snow flake schema

Snow flake schema

Fact constellation schema

Fact constellation schema

What is a Fact Table? • Fact tables contain the data corresponding to a

What is a Fact Table? • Fact tables contain the data corresponding to a particular business process. • Each row represents a single event associated with that process and contains the measurement data associated with that event. • The information contained within a fact table is typically numeric data.

What are Dimensions? • Dimensions describe the objects involved in a business intelligence effort.

What are Dimensions? • Dimensions describe the objects involved in a business intelligence effort. While facts correspond to events, dimensions correspond to people, items, or other objects. • Dimension tables contain details about each instance of an object.

Fact table Vs dimension table • Fact tables and dimension tables are related to

Fact table Vs dimension table • Fact tables and dimension tables are related to each other. • If we take example of retail model, the fact table for a customer transaction would likely contain a foreign key reference to the item dimension table, where the entry corresponds to a primary key in that table for a record describing the item purchased.

Dimensional Modeling - Grain • Grain: The grain describe the level of detail. •

Dimensional Modeling - Grain • Grain: The grain describe the level of detail. • It can be also seen as: • the unique key of a table of a SQL Statement of the unique identifier of a hierarchy level in a dimension. At the top level, there are two main options in choosing the level of granularity: • Unsummarized/Atomic (transaction level granularity): this is the highest level of granularity where each fact table row corresponds to a single transaction or line item • Summarized: transactions may be summarized by a subset of dimensions or dimensional attributes. In this case, each row in the fact table corresponds to multiple transactions

Dimensional Modeling - Grain • The most granular or atomic data (atomic as an

Dimensional Modeling - Grain • The most granular or atomic data (atomic as an indivisible unit of work) has the most dimensionality. Atomic data is highly dimensional. Preferably, you should develop dimensional models for the most atomic information captured by a Event. Atomic data is the most detailled information collected: such data cannot be subdivided further. Example: If a high grain is the month whereas a low or detail grain can be the day • A data warehouse almost always demands data to the lowest possible grain of each dimension not because queries want to see individual low level rows but because queries need to cut through the details in very precise ways. • The lower the level of granularity (or conversely, the higher the level of summarization), the less storage space required and the faster queries will be executed.

Additive, Semi-Additive, and Non. Additive Facts • The numeric measures in a fact table

Additive, Semi-Additive, and Non. Additive Facts • The numeric measures in a fact table fall into three categories. The most flexible and useful facts are fully additive; additive measures can be summed across any of the dimensions associated with the fact table. Semiadditive measures can be summed across some dimensions, but not all; balance amounts are common semi -additive facts because they are additive across all dimensions except time. Finally, some measures are completely non-additive, such as ratios. A good approach for non-additive facts is, where possible, to store the fully additive components of the non-additive measure and sum these components into the final answer set before calculating the final non-additive fact. This final calculation is often done in the BI layer or OLAP cube.

Slowly Changing Dimensions(SCD) • dimensions have been assumed to be independent of time. Unfortunately,

Slowly Changing Dimensions(SCD) • dimensions have been assumed to be independent of time. Unfortunately, this is not the case in the real world. • dimension table attributes are relatively static, they are not fixed forever. • Dimension attributes change, though rather slowly, over time. Dimensional designers must engage business representatives proactively to help determine the appropriate change-handling strategy.

Clickstream Data • Clickstream data is an information trail a user leaves behind while

Clickstream Data • Clickstream data is an information trail a user leaves behind while visiting a website. • It is typically captured in semi-structured website log files. • These website log files contain data elements such as a date and time stamp, the visitor's IP address, the destination URLs of the pages visited, and a user ID that uniquely identifies the website visitor.

Clickstream Data logs

Clickstream Data logs

Multivalued Dimensions • An open-ended many-valued attribute can be associated with a dimension row

Multivalued Dimensions • An open-ended many-valued attribute can be associated with a dimension row by using a bridge table to associate the many-valued attributes with the dimension. Example: In some financial services companies, the individual customer is identified and associated with each transaction. For example, credit card companies often issue unique card numbers to each cardholder. John and Mary Smith may have a joint credit card account, but the numbers on their respective pieces of plastic are unique. In this case there is no need for an account-to-customer bridge table because the atomic transaction facts are at the discrete customer grain. Account and customer would both be foreign keys in this fact table.

Dimension attributes. • Dimension tables are integral companions to a fact table. The dimension

Dimension attributes. • Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business. • In a well-designed dimensional model, dimension tables have many columns or attributes. These attributes describe the rows in the dimension table. • Dimension attributes serve as the primary source of query constraints, groupings, and report labels. In a query or report request, attributes are identified as the by words. • For example, when a user states that he or she wants to see dollar sales by week by brand, week and brand must be available as dimension attributes. • Dimension table attributes play a vital role in the data warehouse. Since they are the source of virtually all interesting constraints and report labels, they are key to making the data warehouse usable and understandable. • The data warehouse is only as good as the dimension attributes. The power of the data warehouse is directly proportional to the quality and depth of the dimension attributes.