Data Warehousing Data warehousing provides architectures tools for

  • Slides: 25
Download presentation
Data Warehousing Data warehousing provides architectures & tools for business executives to systematically organize,

Data Warehousing Data warehousing provides architectures & tools for business executives to systematically organize, understand & use their data to make strategic decisions. A data warehouse is a collection of integrated databases designed to support a DSS.

Key Features: Ø Subject Oriented: Data warehouse are designed to help you analyze data.

Key Features: Ø Subject Oriented: Data warehouse are designed to help you analyze data. For e. g. To learn more about your company’s sales data, you can build a warehouse that concentrate on sales. ØIntegrated: Integration is closely related to subject orientation. A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as database, flat files &online transaction records. They must resolve problems like naming conflicts & inconsistencies.

ØTime Variant: Data are stored to provide information from historical perspective(e. g. The past

ØTime Variant: Data are stored to provide information from historical perspective(e. g. The past 5 -10 years). It varies from time to time. ØNon Volatile: This means, once entered into warehouse, data should not change. ØSummarized : Operational data are mapped into a decision -usable format. ØMetadata: Data about data are stored. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use.

Need for Data Warehousing ØIndustry has huge amount of operational data. ØKnowledge worker wants

Need for Data Warehousing ØIndustry has huge amount of operational data. ØKnowledge worker wants to turn this data into useful information. ØThis information is used by them to support strategic decision making.

Comparison between OLTP & OLAP systems. S. No. OLTP OLAP 1. Characteristics Operational Processing

Comparison between OLTP & OLAP systems. S. No. OLTP OLAP 1. Characteristics Operational Processing Informational Processing 2. Orientation Transaction Analysis 3. User Clerk , DBA, Database proff. Knowledge worker(e. g. Manager, etc) 4. Function Day to Day operations. Decision support 5. Data Current, up-to-date Historical, maintained over time. 6. DB Size 100 MB to GB 100 GB to TB

6. Priority High performance, flexibilty High availability 7. Unit of Work Short, Simple transaction

6. Priority High performance, flexibilty High availability 7. Unit of Work Short, Simple transaction Complex query 8. Access Read/Write Mostly Read 9. Focus Data in Info out 10. Example Purchasing, banking, etc Budgeting, sales forcasting, etc

ADVANTAGES & DISADVANTAGES OF DATA WAREHOUSES: ADVANTAGES: ØHigh investment: Implementation of DW by an

ADVANTAGES & DISADVANTAGES OF DATA WAREHOUSES: ADVANTAGES: ØHigh investment: Implementation of DW by an organisation requires a huge investment from Rs. 10 lac to Rs. 50 lac. ØIncreased productivity of corporate decision-makers: DW improves the productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data. By transforming data into meaningful information a DW allows business managers to perform more accurate & consistent analysis. ØMost Cost-effective decision making: DW helps to reduce overall cost of the product by reducing the number of channels. ØDW provides retrieval of data without slowing down operational system. ØDW facilitates DSS applications such as trend reports, exception reports & reports that show actual performance versus goals.

DISADVANTAGES: ØUnderestimation of resources of data loading: Sometimes we underestimate the time reqd. To

DISADVANTAGES: ØUnderestimation of resources of data loading: Sometimes we underestimate the time reqd. To extract, clean & load the data into DW. It may take significant proportion of the total development time, although some tools are there which are used to reduce the time & effort spent on this process. ØIncreased end user demands: After satisfying the demands, the user request’s increases. This is b’coz of Increasing awareness of the users on capability & value of DW. ØData Homogenization: The concept of DW deals with similarity of data formats b/w different data sources. Thus, results in to lose of some imp. Value of the data. ØHigh Maintenance: DW is usually not static and hav high cost. : DW are high maintenance systems. Any reorganisation of the business processes & the source systems may affect the DW & result in high maintenance cost.

The three fundamental components that are supported by DW are: Ø Load Manager. ØWarehouse

The three fundamental components that are supported by DW are: Ø Load Manager. ØWarehouse Manager. ØData Access Manager. LOAD MANAGER: The components of DW is responsible for collection of data from operational systems & convert them into usable form for the user. This component is responsible for importing & exporting data from operational systems. It performs the following task: • Identification of Data. • Validation of Data about accuracy. • Extraction of Data from original source • Cleansing of data by eliminating meaningless values & making iit usable. • Data formatting. • Data standardisation by getting them into consistent from.

Warehouse Manager: The warehouse manager is the centre of DW system & is the

Warehouse Manager: The warehouse manager is the centre of DW system & is the DW itself. It is a large, physical database that holds a vast amount of information from a wide variety of sources. The data within DW is organised such that it becomes easy to find, use & update frequently from its sources. Query Manager: Query Manager component provides the end-users with access to the stored warehouse info. Through the use of specialised end-user tools. Tools like Query & reporting, OLAP, graphical & geographical info. Systems.

3 -TIER DATAWAREHOUSE ARCHITECTURE: - Data warehouse adopt a three tier architecture, these are:

3 -TIER DATAWAREHOUSE ARCHITECTURE: - Data warehouse adopt a three tier architecture, these are: 1. Bottom Tier(Data warehouse server) 2. Middle Tier(OLAP server) 3. Top Tier(Front end tools).

BOTTOM TIER: (how data is extracted from sources) ØIt is a warehouse database server

BOTTOM TIER: (how data is extracted from sources) ØIt is a warehouse database server ØData is fed using Back end tools and utilities. ØData extracted using programs called gateways(ODBC, JDBC) ØIt also contains Meta data repository. MIDDLE TIER: The middle tier is an OLAP server that is typically implemented using either ØA relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on multidimensional data to standard relational operations; ØA multidimensional OLAP (MOLAP) model, that is, a specialpurpose server that directly implements multidimensional data and operations.

TOP TIER The top tier is a front-end client layer, which contains query and

TOP TIER The top tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools.

DATAWAREHOUSE BACK-ENDTOOLS AND UTILITIES Data warehouse systems use back-end tools and utilities to populate

DATAWAREHOUSE BACK-ENDTOOLS AND UTILITIES Data warehouse systems use back-end tools and utilities to populate and refresh their data. These tools and utilities include the following functions: Data extraction which typically gathers data from multiple, heterogeneous, and external sources. Data cleaning which detects errors in the data and rectifies them when possible. Data transformation which converts data from legacy or host format to warehouse format Load which sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions Refresh which propagates the updates from the data sources to the warehouse

Online Analysis Processing(OLAP): It enables analysts, managers and executives to gain insight information data

Online Analysis Processing(OLAP): It enables analysts, managers and executives to gain insight information data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Data Warehouse Region Product Time

Multidimensional Data �Sales volume as a function of product, month, and region gi on

Multidimensional Data �Sales volume as a function of product, month, and region gi on Dimensions: Product, Location, Time Hierarchical summarization paths Re Industry Region Year Product Category Country Quarter Product City Office Month Day Week

OLAP Operations Drill down: It navigates from less detailed data to more detail data.

OLAP Operations Drill down: It navigates from less detailed data to more detail data. It is reverse of roll up. This adds more detail to given data. Product Category e. g Electrical Appliance Region Sub Category e. g Kitchen Product e. g Toaster Time

OLAP Operations: Roll Up: Performs Aggregation on data cube, either by climbing up or

OLAP Operations: Roll Up: Performs Aggregation on data cube, either by climbing up or by dimension reduction. Product Region Sub Category e. g Kitchen Product e. g Toaster Time

OLAP Operations: Region Slice and Dice: The slice operation performs a selection on one

OLAP Operations: Region Slice and Dice: The slice operation performs a selection on one dimension of a given cube, resulting in sub cube. The dice defines a sub cube by performing a selection on two or more dimension. Product=Toaster Time

OLAP Operations Pivot: is a visualization operation that rotates the data axes in view

OLAP Operations Pivot: is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of data. Product Time Region

DATA MART is generalised term used to describe DW environments that are somehow smaller

DATA MART is generalised term used to describe DW environments that are somehow smaller than others. Data Mart term often used to describe small, single purpose mini DW. “A subset of DW that support the requirements of a particular department or business function” is known as DATA MART. ” It is normally in the form of summary data relating to a particular department or business function. A DW may be constructed as a collection of a subset of DATA MART’s. Usually implemented on low cost departmental servers that are UNIX, Windows/NT based. Depending on sources of data, DM’s can be categorised as: Independent DM’s: sourced from data captured from one or more operational systems or external information providers. Dependent DM’s: sourced directly from enterprise Data warehouse.

Issues associated with development and management of data marts: ØData Mart Functionality: successfully provide

Issues associated with development and management of data marts: ØData Mart Functionality: successfully provide analysis using OLAP and other data mining tools. ØData mart size: User expects faster response time from data marts than from data warehouses. ØData mart load performances: A data mart has to balance 2 critical components ie. End user response time And data loading performance. ØUsers access to data in multiple data marts. ØData mart internet/intranet access: this technology offers users low cost access to data marts and DW’s using web browsers. ØData mart installation: ‘Data mart in a box’ & many products are available that can provide a low cost data marts for an organisation.

Data Warehousing Tools: ØData Warehouse ØSQL Server 2000 DTS ØOracle 8 i Warehouse Builder

Data Warehousing Tools: ØData Warehouse ØSQL Server 2000 DTS ØOracle 8 i Warehouse Builder ØOLAP tools ØSQL Server Analysis Services ØOracle Express Server ØReporting tools ØMS Excel Pivot Chart ØVB Applications