Data Warehousing COT 4810 432008 Data Warehousing o

  • Slides: 21
Download presentation
Data Warehousing COT 4810 4/3/2008

Data Warehousing COT 4810 4/3/2008

Data Warehousing o Overview n n o Methods n n o Purpose History Design

Data Warehousing o Overview n n o Methods n n o Purpose History Design Tools Other aspects of DW

The Purpose of Data Warehousing o o Atomic data has a value now Sometimes

The Purpose of Data Warehousing o o Atomic data has a value now Sometimes the value needs to be realized through analytics n n Reporting Data mining

The Purpose of Data Warehousing o o o Warehouses enable Decision Support Turn static

The Purpose of Data Warehousing o o o Warehouses enable Decision Support Turn static atomic data into valuable information Used to support a variety of other Business Intelligence (BI) platforms

The History of Data Warehousing o o Concept developed by IBM Researchers in 1980’s

The History of Data Warehousing o o Concept developed by IBM Researchers in 1980’s Really takes off in early-mid 1990 s n n 1991, Bill Inmon 1996, Ralph Kimball

What is a Data Warehouse? o Bill Inmon’s definition: “A data warehouse is a

What is a Data Warehouse? o Bill Inmon’s definition: “A data warehouse is a subject oriented, integrated, non-volatile, time variant collection of data [to support management decisions]. ”

What is a Data Warehouse? o Subject oriented n o Data is arranged by

What is a Data Warehouse? o Subject oriented n o Data is arranged by subject area instead of application Integrated n Data is collected and consistently stored from multiple sources

What is a Data Warehouse? o Non-volatile n Data is static o o It

What is a Data Warehouse? o Non-volatile n Data is static o o It doesn’t matter when the question is asked Time variant n Allows for analysis of data over time

OLTP and OLAP o Online Transaction Processing (OLTP) o Online Analytical Processing (OLAP)

OLTP and OLAP o Online Transaction Processing (OLTP) o Online Analytical Processing (OLAP)

Differences OLTP OLAP Designed for real-time business operations Designed for analysis of business measures

Differences OLTP OLAP Designed for real-time business operations Designed for analysis of business measures by category/attribute Processes a common set of transactions Optimized for bulk loads and large complex (sometimes unpredictable) queries Optimized to validate inserts/updates Assumes valid input (for the most part) Tuned for lots of concurrent users and connections Supports fewer users typically

OLTP and OLAP o They work well together… n OLAP supports offloaded OLTP data

OLTP and OLAP o They work well together… n OLAP supports offloaded OLTP data o n n No longer needs to accumulate in the application database (or on stored media) Less data, less index maintenance Allows OLTP to perform at peak transaction efficiency o Smaller OLTP database = better performance

What does it look like?

What does it look like?

Data Marts o A Data Mart hosts a specific subset of data o Multiple

Data Marts o A Data Mart hosts a specific subset of data o Multiple Data Marts can compose what BI/DSS views as the Data Warehouse

Inside the Warehouse o Data is captured from source from via an ETL process

Inside the Warehouse o Data is captured from source from via an ETL process 1. 2. 3. o Extract source data Transform source data Load into Warehouse Typically data is organized into Fact and Dimension tables

Fact Tables o Numerical data representing a specific activity n Details about the sale

Fact Tables o Numerical data representing a specific activity n Details about the sale of a product o o o How many How much When What store Fact tables exist at the center of the Star Schema

Dimension Tables o Characteristics of the facts that offer a business perspective n Details

Dimension Tables o Characteristics of the facts that offer a business perspective n Details about when a product was sold o o o Was it a weekday or weekend Was it a holiday First of the month, last of the month

Star Schema (ROLAP) http: //msdn 2. microsoft. com/en-us/library/aa 902672(SQL. 80). aspx

Star Schema (ROLAP) http: //msdn 2. microsoft. com/en-us/library/aa 902672(SQL. 80). aspx

Cubes (MOLAP) o Takes multiple dimensions and optimizes a view http: //www. selectorweb. com/images/olap_cube.

Cubes (MOLAP) o Takes multiple dimensions and optimizes a view http: //www. selectorweb. com/images/olap_cube. gif

Other Aspects of the Data Warehouse o o Data Mining User Interface design Requirements

Other Aspects of the Data Warehouse o o Data Mining User Interface design Requirements for Business Intelligence predictions or forecasts Data “Webhouse”

Questions? o o Why did Data Warehousing get a boost in the 1990 s?

Questions? o o Why did Data Warehousing get a boost in the 1990 s? What are the steps in the ETL process?

References o o o Kimball, Ralph. The Data Warehouse Toolkit. Wiley Publishing. 2 nd

References o o o Kimball, Ralph. The Data Warehouse Toolkit. Wiley Publishing. 2 nd Edition. April, 2002. Inmon, W. H. Building the Data Warehouse. Wiley Publishing. 4 th Edition. October, 2005. Dave Browning and J. Mundy. “Data Warehouse Design Considerations. ” Microsoft SQL Server Technical Articles. December, 2001.