Introduction to Data Warehousing Why Data Warehouse Scenario

  • Slides: 43
Download presentation
Introduction to Data Warehousing

Introduction to Data Warehousing

Why Data Warehouse?

Why Data Warehouse?

Scenario 1 ABC Pvt. Ltd is a company with branches at Mumbai, Delhi, Chennai

Scenario 1 ABC Pvt. Ltd is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.

Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Sales per item type per branch

Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Sales per item type per branch for first quarter. Chennai Banglore Sales Manager

Solution 1: ABC Pvt Ltd. Extract sales information from each database. Store the information

Solution 1: ABC Pvt Ltd. Extract sales information from each database. Store the information in a common repository at a single site.

Solution 1: ABC Pvt Ltd. Mumbai Report Delhi Data Warehouse Chennai Banglore Query &

Solution 1: ABC Pvt Ltd. Mumbai Report Delhi Data Warehouse Chennai Banglore Query & Analysis tools Sales Manager

Scenario 2 One Stop Shopping Super Market has huge operational database. Whenever Executives wants

Scenario 2 One Stop Shopping Super Market has huge operational database. Whenever Executives wants some report the OLTP system becomes slow and data entry operators have to wait for some time.

Scenario 2 : One Stop Shopping Data Entry Operator Report Wait Data Entry Operator

Scenario 2 : One Stop Shopping Data Entry Operator Report Wait Data Entry Operator Operational Database Management

Solution 2 Extract data needed for analysis from operational database. Store it in another

Solution 2 Extract data needed for analysis from operational database. Store it in another system, the data warehouse. Refresh warehouse at regular intervals so that it contains up to date information for analysis. Warehouse will contain data with historical perspective.

Solution 2 Data Entry Operator Report Transaction Data Entry Operator Operational database Extract data

Solution 2 Data Entry Operator Report Transaction Data Entry Operator Operational database Extract data Data Warehouse Manager

Scenario 3 Cakes & Cookies is a small, new company. The chairman of this

Scenario 3 Cakes & Cookies is a small, new company. The chairman of this company wants his company to grow. He needs information so that he can make correct decisions.

Solution 3 Improve the quality of data before loading it into the warehouse. Perform

Solution 3 Improve the quality of data before loading it into the warehouse. Perform data cleaning and transformation before loading the data. Use query analysis tools to support adhoc queries.

Solution 3 Expansio n sales Data Warehouse Query & Analysis tool Chairman time Improvemen

Solution 3 Expansio n sales Data Warehouse Query & Analysis tool Chairman time Improvemen t

Summing up? Why do you need a warehouse? Operational systems could not provide strategic

Summing up? Why do you need a warehouse? Operational systems could not provide strategic information Executive and managers need such strategic information for Making proper decision Formulating business strategies Establishing goals Setting objectives Monitoring results

Why operational data is not capable of producing valuable information? Data is spread across

Why operational data is not capable of producing valuable information? Data is spread across incompatible structures and systems Not only that, improvements in technology had made computing faster, cheaper and available

FAILURES OF PAST DECISIONSUPPORT SYSTEMS OLTP systems

FAILURES OF PAST DECISIONSUPPORT SYSTEMS OLTP systems

Decision support systems

Decision support systems

Operational and informational

Operational and informational

What is Data Warehouse? ? Is it the only viable solution

What is Data Warehouse? ? Is it the only viable solution

Business intelligence at DW

Business intelligence at DW

Functional definition of a DW The data warehouse is an informational environment that Provides

Functional definition of a DW The data warehouse is an informational environment that Provides an integrated and total view of the enterprise Makes the enterprise’s current and historical information easily available for decision making Makes decision-support transactions possible without hindering operational systems Renders the organization’s information consistent Presents a flexible and interactive source of strategic information

Questions? ? Describe five differences between operational systems and informational systems A data warehouse

Questions? ? Describe five differences between operational systems and informational systems A data warehouse in an environment, not a product. Discuss.

Building Blocks of a Datawarehouse

Building Blocks of a Datawarehouse

Inmons’s definition A data warehouse is - subject-oriented, - integrated, - time-variant, - nonvolatile

Inmons’s definition A data warehouse is - subject-oriented, - integrated, - time-variant, - nonvolatile collection of data in support of management’s decision making process.

Subject-oriented Data warehouse is organized around subjects such as sales, product, customer. It focuses

Subject-oriented Data warehouse is organized around subjects such as sales, product, customer. It focuses on modeling and analysis of data for decision makers. Excludes data not useful in decision support process.

Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied

Integration Data Warehouse is constructed by integrating multiple heterogeneous sources. Data Preprocessing are applied to ensure consistency. RDBMS Legacy System Flat File Data Warehouse Data Processing Data Transformation

Integration In terms of data. encoding structures. Measurement of attributes. physical attribute. of data

Integration In terms of data. encoding structures. Measurement of attributes. physical attribute. of data remarks naming conventions. Data type format

Time-variant Provides information from historical perspective, e. g. past 5 -10 years Every key

Time-variant Provides information from historical perspective, e. g. past 5 -10 years Every key structure contains either implicitly or explicitly an element of time, i. e. , every record has a timestamp. The time-variant nature in a DW Allows for analysis of the past Relates information to the present Enables forecasts for the future

Non-volatile Data once recorded cannot be updated. Data warehouse requires two operations in data

Non-volatile Data once recorded cannot be updated. Data warehouse requires two operations in data accessing Initial loading of data Incremental loading of data load access

Data Granularity In an operational system, data is usually kept at the lowest level

Data Granularity In an operational system, data is usually kept at the lowest level of detail. In a DW, data is summarized at different levels. Three data levels in a banking data warehouse Daily Detail Monthly Summary Quaterly Summary Account Activity Date Month Amount No. of transactions Deposit/ Withdrawals Deposits Beginning Balance Ending Balance

Operational v/s Information System Features Operational Information Characteristics Operational processing Informational processing Orientation Transaction

Operational v/s Information System Features Operational Information Characteristics Operational processing Informational processing Orientation Transaction Analysis User Clerk, DBA, database professional Knowledge workers Function Day to day operation Decision support Data Content Current Historical, archived, derived View Detailed, flat relational Summarized, multidimensional DB design Application oriented Subject oriented Unit of work Short , simple transaction Complex query Access Read/write Read only

Operational v/s Information System Features Focus Operational Data in Information out No. of records

Operational v/s Information System Features Focus Operational Data in Information out No. of records accessed tens/ hundreds millions Number of users thousands hundreds DB size 100 MB to GB 100 GB to TB Usage Predictable, repetitive Ad hoc, random, heuristic Response Time Sub-seconds Several seconds to minutes Priority High performance, high High flexibility, endavailability user autonomy Metric Transaction throughput Query throughput

Two approaches in designing a DW Top-down approach Bottom-up approach Enterprise view of data

Two approaches in designing a DW Top-down approach Bottom-up approach Enterprise view of data Narrow view of data Inherently architected Inherently incremental Single, central storage of data Faster implementation of manageable parts Centralized rules and control Each datamart is developed independently Takes longer time to build Comparatively less time than a DW Higher risk to failure Less risk of failure Needs higher level of crossfunctional skills Unmanageable interfaces

Bottom Up Approach

Bottom Up Approach

Top Down Approach

Top Down Approach

A Practical Approach-Kimball 1. 2. 3. 4. Plan and Define requirements Create a surrounding

A Practical Approach-Kimball 1. 2. 3. 4. Plan and Define requirements Create a surrounding architecture Conform and Standardize the data Content Implement Data Warehouse as series of supermart one at a time.

An Incremental Approach Sales Distribution Product Glossary Marketing Customer Common Business Metrics. Accounts Common

An Incremental Approach Sales Distribution Product Glossary Marketing Customer Common Business Metrics. Accounts Common Business Rules Common Business Dimensions Finance Inventory Vendors Common Logical Subject Area ERD Individual Architected Data Marts

The Eventual Result Distribution Sales Product Architected Enterprise Foundation Marketing Finance Customer Inventory Accounts

The Eventual Result Distribution Sales Product Architected Enterprise Foundation Marketing Finance Customer Inventory Accounts Vendors Enterprise Data Warehouse

Data Warehouse: Holds multiple subject areas Holds very detailed information Works to integrate all

Data Warehouse: Holds multiple subject areas Holds very detailed information Works to integrate all data sources Does not necessarily use a dimensional model but feeds dimensional models. Data Mart Often holds only one subject area- for example, Finance, or Sales May hold more summarised data (although many hold full detail) Concentrates on integrating information from a given subject area or set of source systems Is built focused on a dimensional model using a star schema.

Data Warehouse verses data marts

Data Warehouse verses data marts