Data Warehousing Data Warehousing Make the right decisions

  • Slides: 51
Download presentation
Data Warehousing

Data Warehousing

Data Warehousing • Make the right decisions for your organization – Rapid access to

Data Warehousing • Make the right decisions for your organization – Rapid access to all kinds of information – Research and analyze the past data – Identify and predict future trends • The construction of data warehouse – Involve data cleaning and data integration – Provide on-line analytical processing (OLAP) tools for the interactive analysis of data • W. H. Inmon – A data warehouse is a subject oriented, integrated, timedependent and non-volatile collection of data in support of management’s decision making process

Characteristics of Data Warehouse • Subject-oriented – Data warehouse is designed for decision support

Characteristics of Data Warehouse • Subject-oriented – Data warehouse is designed for decision support and around major subject, such as customer and sales – Not all information in the operational database is useful • Integrated – Integrate multiple heterogeneous sources and make it consistent – The data from different sources may use different names for the same entities

Characteristics of Data Warehouse • Time dependent – Record the information and the time

Characteristics of Data Warehouse • Time dependent – Record the information and the time when it was entered – Data mining can be done from the data in some period of time • Non-volatile – Data in a data warehouse is never updated

Data Warehousing • Data warehousing – The process of constructing and using data warehouse

Data Warehousing • Data warehousing – The process of constructing and using data warehouse • Two types of databases – Operational database • Large database in operation • Built for high speed and large number of users – Data warehouse • Designed for decision support • Contain vast amounts of historical data • Data mart – A department subset of the data warehouse that focuses on selected subjects, and its scope is department-wide

OLTP & OLAP System • OLTP (On-Line Transaction Processing) System – The major task

OLTP & OLAP System • OLTP (On-Line Transaction Processing) System – The major task of operational database is to perform on-line transaction and query processing • OLAP (On-Line Analytical Processing) System – Data warehouse system serves users on data analysis and decision making

Differences ~ OLTP & OLAP • Characteristic – OLTP: operational processing – OLAP: informational

Differences ~ OLTP & OLAP • Characteristic – OLTP: operational processing – OLAP: informational processing • Orientation – OLTP: transaction-oriented – OLAP: analysis-oriented • User – OLTP: customer, DBA – OLAP: manager, analyst • Function – OLTP: day-to-day operations – OLAP: information requirement, decision support

Differences ~ OLTP & OLAP • DB design – OLTP: ER based, application-oriented –

Differences ~ OLTP & OLAP • DB design – OLTP: ER based, application-oriented – OLAP: star/snowflake, subject-oriented • Data – OLTP: current; guaranteed up-to date – OLAP: historical • Unit of work – OLTP: short, simple query – OLAP: complex query • Access – OLTP: read/write – OLAP: mostly read • DB size – OLTP: 100 MB to GM – OLAP: 100 GB to TB

Differences ~ OLTP & OLAP

Differences ~ OLTP & OLAP

Data Warehousing Relational Data Model Relational Schema Multidimensional Data Model Star Schema or Snowflake

Data Warehousing Relational Data Model Relational Schema Multidimensional Data Model Star Schema or Snowflake Schema

Model & Schema for Relational Database Relational Data Model Relational Schema

Model & Schema for Relational Database Relational Data Model Relational Schema

Multidimensional Data Model • Example: All. Electronics creates a sales data warehouse in order

Multidimensional Data Model • Example: All. Electronics creates a sales data warehouse in order to keep records of the store’s sales – Fact Table • sales amount in dollars and number of units sold (measure) – Dimension Tables • time, item, branch, and location • Multidimensional data model views data in the form of a data cube

Two Dimensions • 2 -D view of sales data for item sold per quarter

Two Dimensions • 2 -D view of sales data for item sold per quarter in the city of Vancouver. The measure is dollars_sold (in thousands) Dimensions Measures

Three Dimensions • 3 -D view of sales data according to the dimensions time,

Three Dimensions • 3 -D view of sales data according to the dimensions time, item and location. The measure is dollars_sold (in thousands) Dimensions Measures

Three Dimensions • 3 -D data cube representation according to the dimensions time, item

Three Dimensions • 3 -D data cube representation according to the dimensions time, item and location. The measure is dollars_sold (in thousands)

Four Dimensions • 4 -D data cube of sales data according to the dimensions

Four Dimensions • 4 -D data cube of sales data according to the dimensions supplier, time, item and location. The measure is dollars_sold (in thousands)

Schemas for Multidimensional Data Model • Star Schema • Snowflake Schema • Fact Constellation

Schemas for Multidimensional Data Model • Star Schema • Snowflake Schema • Fact Constellation Schema

Star Schema

Star Schema

Snowflake Schema

Snowflake Schema

Snowflake Schema • Some dimension tables are normalized to reduce redundancies and save storage

Snowflake Schema • Some dimension tables are normalized to reduce redundancies and save storage space • Reduce the effectiveness of browsing since more join will be needed to execute a query • This saving of space is negligible in comparison to the magnitude of the fact table • Snowflake schema is not as popular as the start schema in data warehouse design

Fact Constellation Schema • Multiple fact tables share dimension tables

Fact Constellation Schema • Multiple fact tables share dimension tables

OLAP Technologies

OLAP Technologies

Concept Hierarchies

Concept Hierarchies

Three-Tier DW Architecture

Three-Tier DW Architecture

Case Study in Data Warehousing

Case Study in Data Warehousing

Design of Data Warehouse • How can I design a data warehouse ? –

Design of Data Warehouse • How can I design a data warehouse ? – Top-down approach – Bottom-up approach – Combination of both • In general, the warehouse design process consists of the following steps – – Choose a business process to model Choose the gain of the business process Choose the dimensions Choose the measures