Data Warehousing Data Warehousing Make the right decisions



















































- Slides: 51
Data Warehousing
Data Warehousing • Make the right decisions for your organization – Rapid access to all kinds of information – Research and analyze the past data – Identify and predict future trends • The construction of data warehouse – Involve data cleaning and data integration – Provide on-line analytical processing (OLAP) tools for the interactive analysis of data • W. H. Inmon – A data warehouse is a subject oriented, integrated, timedependent and non-volatile collection of data in support of management’s decision making process
Characteristics of Data Warehouse • Subject-oriented – Data warehouse is designed for decision support and around major subject, such as customer and sales – Not all information in the operational database is useful • Integrated – Integrate multiple heterogeneous sources and make it consistent – The data from different sources may use different names for the same entities
Characteristics of Data Warehouse • Time dependent – Record the information and the time when it was entered – Data mining can be done from the data in some period of time • Non-volatile – Data in a data warehouse is never updated
Data Warehousing • Data warehousing – The process of constructing and using data warehouse • Two types of databases – Operational database • Large database in operation • Built for high speed and large number of users – Data warehouse • Designed for decision support • Contain vast amounts of historical data • Data mart – A department subset of the data warehouse that focuses on selected subjects, and its scope is department-wide
OLTP & OLAP System • OLTP (On-Line Transaction Processing) System – The major task of operational database is to perform on-line transaction and query processing • OLAP (On-Line Analytical Processing) System – Data warehouse system serves users on data analysis and decision making
Differences ~ OLTP & OLAP • Characteristic – OLTP: operational processing – OLAP: informational processing • Orientation – OLTP: transaction-oriented – OLAP: analysis-oriented • User – OLTP: customer, DBA – OLAP: manager, analyst • Function – OLTP: day-to-day operations – OLAP: information requirement, decision support
Differences ~ OLTP & OLAP • DB design – OLTP: ER based, application-oriented – OLAP: star/snowflake, subject-oriented • Data – OLTP: current; guaranteed up-to date – OLAP: historical • Unit of work – OLTP: short, simple query – OLAP: complex query • Access – OLTP: read/write – OLAP: mostly read • DB size – OLTP: 100 MB to GM – OLAP: 100 GB to TB
Differences ~ OLTP & OLAP
Data Warehousing Relational Data Model Relational Schema Multidimensional Data Model Star Schema or Snowflake Schema
Model & Schema for Relational Database Relational Data Model Relational Schema
Multidimensional Data Model • Example: All. Electronics creates a sales data warehouse in order to keep records of the store’s sales – Fact Table • sales amount in dollars and number of units sold (measure) – Dimension Tables • time, item, branch, and location • Multidimensional data model views data in the form of a data cube
Two Dimensions • 2 -D view of sales data for item sold per quarter in the city of Vancouver. The measure is dollars_sold (in thousands) Dimensions Measures
Three Dimensions • 3 -D view of sales data according to the dimensions time, item and location. The measure is dollars_sold (in thousands) Dimensions Measures
Three Dimensions • 3 -D data cube representation according to the dimensions time, item and location. The measure is dollars_sold (in thousands)
Four Dimensions • 4 -D data cube of sales data according to the dimensions supplier, time, item and location. The measure is dollars_sold (in thousands)
Schemas for Multidimensional Data Model • Star Schema • Snowflake Schema • Fact Constellation Schema
Star Schema
Snowflake Schema
Snowflake Schema • Some dimension tables are normalized to reduce redundancies and save storage space • Reduce the effectiveness of browsing since more join will be needed to execute a query • This saving of space is negligible in comparison to the magnitude of the fact table • Snowflake schema is not as popular as the start schema in data warehouse design
Fact Constellation Schema • Multiple fact tables share dimension tables
OLAP Technologies
Concept Hierarchies
Three-Tier DW Architecture
Case Study in Data Warehousing
Design of Data Warehouse • How can I design a data warehouse ? – Top-down approach – Bottom-up approach – Combination of both • In general, the warehouse design process consists of the following steps – – Choose a business process to model Choose the gain of the business process Choose the dimensions Choose the measures