DW Definition Decision Making Perspective Technical Perspectives Strategies
























- Slides: 24
DW Definition Decision Making Perspective Technical Perspectives Strategies for Building the Warehouse
Decision Making Perspective • • • “… the data, processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective decision making. ” - P. Uhrowczik, Trends in DW, (B&A -28). – Not just a storehouse of data • but a process, an architecture, an environment and infrastructure.
Complete, Timely, and Accurate Complete Information All information, information processing rules and decision rules are known to decision maker Timely information Information is not available during decision making Information can be collected but too much delay/cost Accurate Information Errors during information collection Biases in survey instruments Approximations The correctness of a decision is often known only after it is taken!!
Understandable What is the information about average marks computed over all students all passed students classes of students: course wise, section wise Units: currency, thousands/lakhs/crores Visualization Charts: bar, histogram, graph, list, table , …. . Dashboards
Exercise The law says that income tax must be deducted at source from the salary of employees. What information is needed to decide whether tax is to be deducted and how much tax is to be deducted. Is the information complete, timely, accurate? An employee joins sometime during the financial year. How to compute tax to be deducted at source? How will you make the delivered income tax information understandable?
Technical Perspective I A Data Warehouse is a á subject-oriented á integrated á time-variant á non-volatile collection of data for supporting manager decisions. How to get this collection of data? Ideal as a centrally managed, distributed resource Specialized, replicated database optimized for decisional needs What is meant by “supporting manager decisions”?
Subject-Orientation Operational systems are application oriented financial control materials management HR management They support transaction processing in an application Subject areas might be customers, products, orders, bill of materials enterprise, department, unit Each type of company has its own unique set of subjects
Integrated Data obtained from several, separate sources: standardise unit differences height measured in cm, inch, feet, metres money in dollars, rupees, pounds sterling data representation differences the sex of a person as m/f, 0/1. male/female various coding schemes may be adopted for data type differences the enrollment number in 5 digits, 4 digits
Non-volatile No updates, only periodic refreshment with a new snapshot Unlike operational systems where current value is represented by modify insert delete store
Non-Volatile Data warehouse contents are not lost Let DW 0 be the data warehouse when initially published Let RT be the refresh time for some data in DW 0 Then DWRT is the new DW at RT. DW 0 DWRT 2 RT may be a long time: weeks, months Meanwhile OLTP data sources are changing to reflect each transaction carried out
Time Variant All data in the data warehouse is associated with time stamp time data date hour month Time horizon: the parameters of time represented in the system For data warehouses it is 5 -10 years For operational systems 2 -3 months Data warehouses keep history, operational systems keep current value
Exercise For the income tax problem, identify the four components of the technical definition of the data warehouse What is the subject? What is non-volatile? What is varying with time? What data is getting integrated?
Exercise What aspects of a data warehouse does this definition bring out? What is the contribution of subject? Non-volatile? Varying with time? Integrated? What does the definition not tell us? Structure? Behaviour? Functional aspect?
Technical Perspective II Accepts the broad definition of the earlier perspective but elaborates it: A DW is a copy of transaction data specifically structured for query and analysis. • A new physical structure is built. • Collection of data is well defined. • Properties of the data in the DW are not interesting. • Analysis is for evaluating or analyzing the business process or for meeting a business goal. What is meant by support manager’s decision needs? What are the structural, behavioural, and functional properties of the system?
Achieving Subject Orientation Issues Lead Time for development Subject oriented Circumscribes the domain being considered Model of data Same for both Data mart and warehouse Both multidimensional (Kimball) Logically/physically separate marts possible Different models (Inmon) ER (3 NF relations) for enterprise wide data warehouse Multidimensional for data marts Physically separate marts Subject oriented data marts • Dependent on collection Collection of Data (Single Version of the Truth) • When subject is the entire enterprise then data warehouse
Achieving Subject Orientation: Independent Marts Issues Subject oriented Circumscribes the domain being considered Low Lead Time for development Disparate Platforms No common data model; inconsistencies Consolidation as a separate process DM 1 DM 2 DMn Subject oriented data marts Independent of one another No single version of the truth Conformed dimensions/facts Enterprise DW is achieved by conforming as the need arises
Technical Perspective III A Data Warehouse is a á decision - oriented á integrated á time-variant á non-volatile collection of data Continues to be A centrally managed, distributed resource Specialized, replicated database optimized for decisional needs How do we get the collection of data? What is supported by “decision-oriented”?
Decision - Orientation Subject-oriented data warehouses support information analysis for decision making in the subject-area builds subject silos hinders cross subject, enterprise decisions raises problems of redundancy and inconsistency Decision – orientation: data warehouse per decision builds decision-silos builds complete, cross subject/enterprise wide systems Shows redundancy & inconsistency problems
Achieving Decision Orientation: bottom-up Decision oriented Circumscribes the domain being considered Issues Low Lead Time for development Single Platform Evolving common model: no inconsistencies Consolidation as an integrated process DM 1 DM 2 DM 3 Integrated DMn Decision oriented DW fragments Independent of one another Build by integration Enterprise DW is achieved by integration
Achieving Decision Orientation Assume N existing DW fragments and K DW fragments are to be newly developed. The N requirements granules are unrelated to one another. Requirement granules of the K DW fragments must be compared with one another and with N existing ones. Three Possible Strategies Strategy I: Treat N+K as a pool of DW fragments. Number of combinations = (N+K)C 2 = ((N+K) * (N+K-1))/2 Drawback: Comparing requirements granules of the N DW fragments is a waste.
Achieving Decision Orienation Strategy II: Treat N and K as separate. Compare K requirement granules with each other: KC 2 comparisons. Compare the K DW fragments with each of the N yielding N*K comparisons. Note: No comparison of the N with each other Number of combinations = KC 2 + N*K = (K * (K-1))/2 +N*K Drawback: As a result of KC 2 comparisons, some related DW fragments could have been identified and number to be compared with N is less than K.
Achieving Decision Orienation Strategy III: Assume L remain from comparing K among themselves Number of combinations = KC 2 + L*K = (K * (K-1))/2 +L*N Comparison process is complex: Polynomial time complexity. To speed up delivery of products Make K=1. Therefore KC 2 is undefined L becomes less than or equal to 1 since one DW fragment has to be built. When L =1 Number of combinations = N The expression for the number of DW fragments obtained is Number of DW fragments = (N-M) +1
Achieving Decision Orientation Consolidation Process Compare requirements granule of the new DW fragment with N existing ones Requirements granules of M ≤ N fragments can be merged with the new one If no matching requirements granules then M = 0 and (N+1) DW fragments The new one is unrelated to any of the existing ones. If M = N, that is all the DW fragments match with the new one, All can be combined in one logical data model: we get exactly one DW fragment. Downside K = 1 imposes a constraint on the number of new DW fragments that can be concurrently developed. Delay in taking up the next DW fragment is till the time the comparison is over.
Achieving Decision Orienation