An Introduction to Data Warehousing 1 Business Intelligence

An Introduction to Data Warehousing 1

Business Intelligence l Now, if the Estimates made before a Battle indicate Victory, it is because careful calculations show that your conditions are more favorable than those of your enemy; if they indicate defeat , it is because careful calculations show that the favorable conditions for a Battle are fewer. With more careful calculations one can win ; with less one cannot. How much chance of Victory has one who makes no calculations at all !! --- Sun Tzu , The Art of War l Business these days are , war minus shooting. -Anonymous

Course Roadmap • Introduction to Datawarehousing • Difference between Operational System and Data. Warehouse • Emergence of Decision Support Systems • Data. Warehouse Theoretical Architecture • Data. Warehouse Technical Architecture • Data. Warehouse Bus Architecture • Data Modelling concepts • E-R Modelling for OLTP System • Dimensional Modelling for a Datawarehouse • Scheme generation for Datawarehouse • Star Scheme Design • Snowflake Scheme Design • Key aspects in designing the Dimensional Model • Granularity with respect to the Fact Table in the Schemas • Conformed Facts, Dimensions

Course Roadmap • Fact less Fact Tables, Aggregate Fact Tables • Out Trigger Entities in the Schemas • Types of Relationships to be maintained between Facts and Dimensions • Dependencies while generating Physical Scheme for a Data. Warehouse • Case Study of design of Data. Warehouse for an existing ERmodel

Objectives l At the end of this session, you will know : – What is Data Warehousing – The evolution of Data Warehousing – Need for Data Warehousing – OLTP Vs Warehouse Applications – Data marts Vs Data Warehouses – Operational Data Stores – Overview of Warehouse Architecture

Objectives At the end of this lesson, you will know : – Data Warehouse Architectures – Components of Data Warehousing Architecture – An overview of each of the components – Considerations for Data Warehouse Design – Common mistakes in Warehouse designs – An overview of Warehouse on the web

What is a Data. Warehouse ?

What is a Data Warehouse ? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. - WH Inmon - Regarded As Father Of Data Warehousing

Subject-Oriented- Characteristics of a Data Warehouse Operational Leads Quotes Data Warehouse Prospects Customers Orders Regions Products Time Focus is on Subject Areas rather than Applications

Integrated - Characteristics of a Data Warehouse Appl A - m, f Appl B - 1, 0 Appl C - male, female Appl A - balance dec fixed (13, 2) Appl B - balance pic 9(9)V 99 Appl C - balance pic S 9(7)V 99 comp-3 m, f balance dec fixed (13, 2) Appl A - bal-on-hand Appl B - current-balance Appl C - cash-on-hand Current balance Appl A - date (julian) Appl B - date (yymmdd) Appl C - date (absolute) date (julian) Integrated View Is The Essence Of A Data Warehouse

Non-volatile - Characteristics of a Data Warehouse insert change Data Warehouse Operational delete insert load replace change Data Warehouse Is Relatively Static In Nature read only access

Time Variant - Characteristics of a Data Warehouse Operational Current Value data • time horizon : 60 -90 days Data Warehouse Snapshot data • time horizon : 5 -10 years • data warehouse stores historical data Data Warehouse Typically Spans Across Time

Alternate Definitions A collection of integrated, subject oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time - Imhoff

Alternate Definitions Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End user orientated data access and reporting tools let user get at the data for decision support - Babcock

Evolution of Data Warehousing 1960 - 1985 : MIS Era • Unfriendly • Slow • Dependent on IS programmers • Inflexible • Analysis limited to defined reports Focus on Reporting

Evolution of Data Warehousing 1985 - 1990 : Querying Era Queries that are formulated by the user on the spur of the moment • Adhoc, unstructured access to corporate data • SQL as interface not scalable • Cannot handle complex analysis Focus on Online Querying

Evolution of Data Warehousing 1990 - 20 xx : Analysis Era • Trend Analysis • What If ? • Cross Dimensional Comparisons • Statistical profiles • Automated pattern and rule discovery Focus on Online Analysis

Need for Data Warehousing l Better business intelligence for end-users l Reduction in time to locate, access, and analyze information l Consolidation of disparate information sources l Strategic advantage over competitors l Faster time-to-market for products and services l Replacement of older, less-responsive decision support systems l Reduction in demand on IS to generate reports

Business Queries Typical Business Queries l Which product generated maximum revenue over last two quarters in a chosen geographical region, city wise, relative to the previous version of product, compared with the plan l What percent of customer procures product A with B in a chosen region, broken down by city, season, and income group

OLTP Systems Vs Data Warehouse Remember Between OLTP and Data Warehouse systems users are different data content is different, data structures are different hardware is different Understanding The Differences Is The Key

OLTP Vs Warehouse

OLTP Vs Warehouse

OLTP Vs Warehouse

Processing Power Capacity Planning Time of day Processing Load Peaks During the Beginning and End of Day

Examples Of Some Applications Manufacturers Retailers l Target Marketing l Market Segmentation l Budgeting l Credit Rating Agencies l Financial Reporting and Consolidation l Market Basket Analysis - POS Analysis l Fraud Management l Profitability Management l Event tracking Customers

Do we need a separate database ? l OLTP and data warehousing require two very differently configured systems l Isolation of Production System from Business Intelligence System l Significant and highly variable resource demands of the data warehouse l Cost of disk space no longer a concern l Production systems not designed for query processing

Data Marts l l l Enterprise wide data warehousing projects have a very large cycle time Getting consensus between multiple parties may also be difficult Departments may not be satisfied with priority accorded to them Sometimes individual departmental needs may be strong enough to warrant a local implementation Application/database distribution is also an important factor

Data Marts Subject or Application Oriented Business View of Warehouse » Finance, Manufacturing, Sales etc. » Smaller amount of data used for Analytic Processing » Address a single business process A Logical Subset of The Complete Data Warehouse

Data Warehouse and Data Mart

Data Warehouse and Data Mart

Warehouse or Mart First ?

Different kinds of Information Needs l l l Current Recent Historical Is this medicine available in stock What are the tests this patient has completed so far Has the incidence of Tuberculosis increased in last 5 years in Southern region

Operational Data Store - Definition Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer Data from multiple sources is integrated for a subject A subject oriented, integrated, volatile, current valued data store containing only corporate Identical queries may give different results at different times. Supports analysis requiring current data detailed data Data stored only for current period. Old Data is either archived or moved to Data Warehouse

Operational Data Store l Increasingly becoming integrated with the data warehouse l Are nothing but more responsive real time data warehouses l Data Mining has anyway forced Data Warehouses to store transactional level data

OLTP Vs ODS Vs DWH

OLTP Vs ODS Vs DWH

OLTP Vs ODS Vs DWH

Typical Data Warehouse Architecture Data Marts Select EIS /DSS Metadata Query Tools Extract Transform Integrate Maintain Data Warehouse OLAP/ROLAP Web Browsers Operational Systems/Data Preparation Middleware/ API Data Mining Multi-tiered Data Warehouse without ODS

Typical Data Warehouse Architecture Data Marts Metadata Select Extract Transform Integrate ODS Transform Data Warehouse Load Maintain Operational Systems/Data Preparation Multi-tiered Data Warehouse with ODS

Benefits of DWH These capabilities empower the corporate. . . 4 To formulate effective business, marketing and sales strategies. 4 To precisely target promotional activity. 4 To discover and penetrate new markets. 4 To successfully compete in the marketplace from a position of informed strength. 4 To build predictive rather than retrospective models.

Warehouse Architecture - 1 EIS /DSS Metadata Query Tools Select Extract Transform Integrate Data Warehouse OLAP/ROLAP Maintain Web Browsers Operational Systems/Data Preparation Middleware/ API Data Mining Enterprise Data Warehouse

Warehouse Architecture - 2 Metadata EIS /DSS Data Mart Select Metadata Query Tools Extract Transform Data Mart Integrate Maintain Operational Systems/Data OLAP/ROLAP Metadata Data Mart Data Preparation Web Browsers Middleware/ API Data Mining Single Department Data Mart

Warehouse Architecture - 3 Data Marts EIS /DSS Metadata Query Tools Select Extract Transform Data Warehouse Integrate OLAP/ROLAP Maintain Web Browsers Operational Systems/Data Preparation Operational Data Store Middleware/ API Multi-tiered Data Warehouse Data Mining

Data Warehouse Architectures There are three schools of thought about DW architectures – One supports Dimensional Modeling all through (Ralph Kimball) – Second supports ER for Data Warehouse and Star Schemas for Data Marts – Third supports ER model for DW (NCR)

Kimball’s View Operational Systems Staging Area Presentation Server Each Star is a Data Mart and has both summary and detail data LAN Data Warehouse Server Processes • Extract • Scrubbing • Transformation • Load Jobs • Aggregation Jobs • Replication • Monitoring • Management • Meta Data Repository • Meta Data Population • Meta Data Maintenance DW is sum total of all Data Marts DW Bus using Conformed Dimensions Multiple Data Marts With Conformed Dimensions

Inmon’s View Operational Systems Staging Area Data Warehouse Data Marts LAN Data Warehouse Server Processes • Extract • Scrubbing • Transformation • Load Jobs • Aggregation Jobs • Replication • Monitoring • Management • Meta Data Repository • Meta Data Population • Meta Data Maintenance Detail Data in ER format Summarized Data in Star formats Data Warehouse (ER) Feeding Multiple Data Marts (Star Schema)

Components of a Data Warehouse Architecture l Source Databases l Data extraction/transformation/load (ETL) tool l Data warehouse maintenance and administration tools l Data modeling tool or interface to external data models l Warehouse databases l End-user data access and analysis tools

Components of a Data Warehouse Architecture Data Cleansing Tools Source Databases Data Modeling Tool ETL Tool Central Metadata Central Warehouse (RDBMS) ROLAP Engine Data Access and Analysis Tools -Managed Query RDBMS -Desktop OLAP -ROLAP -MOLAP Local meta data Warehouse Admin Tool - Data Mining MDDB Architected Datamarts Warehouse Databases Data Warehouse Is Not Just About Data. . . But Tools Too

Source Databases - Characteristics l Legacy, relational, text or external sources l Designed for high-speed transaction processing l Real-time, current, volatile data l Fast response for larger numbers of concurrent users l Many short transactions l Update-intensive; modifications by row l Inquiry-oriented; access by keys l High integrity, security, recoverability l Source data is often inconsistent and poorly modeled

Data Cleaning Tools l To clean data at the source l Clean up source data in-place on the host l Business rule discovery tools which analyse the source data and write cleaning rules based on lexical analysis and AI techniques l Poorly integrated with data warehousing tools l ETL tools have limited yet adequate data cleansing functionality
- Slides: 50