DW A Tactical View Developed by Dr Eddie

DW: A Tactical View Developed by: Dr Eddie Ip Modified by: Dr Arif Ansari 1

Outline • • Brief review Evolution of DW Architectural model of DW Putting a DW in place 2

Review: DW • Snapshot: DW takes a snap-shot of operation & stores it away update Jan 01 update Dec 00 Nov 00 Current 1/18/01 Update snapshot 3

Review: DW • Read-only • For analytical processing • For company-wide usage (Rising Tide strategy) 4

Review: Operational goals of DW • Provide business users with access to data • Provide one version of the truth • Record the past accurately • Slice and dice efficiently thro’ data • Provide a unified view of customers 5

Evolution of DW as a decision support tool • Long history of analytic system • 1970 DSS = decision support systems • 1980 EIS = executive information systems – Reporting system for senior managers 6

EIS example • Frito-Lay 7

Pre-DW DSS • Lack DB component • Extract programs subset files (predetermined criteria) – Problems: reliable joins, synchronization, credibility, consistency • Process controlled by IT, not business users • Silos (one OLTP, one DSS) 8

Modern DSS Example • Stock Screen – Used by security analysts to inform & support business decision 9

Modern DSS Example • A simple web-based mutual fund screen – Charles Schwab http: //www. schwab. com/Schwab. NOW/navigatio n/main. Frame. Set/0, 4528, 527, 00. html 10

Java-based DSS • Java-based DSS (Comshare) • http: //decisionweb. comshare. com/deciweb /tr/trmain 2. asp? App=corporate%20 review 11

DW as a DSS • OLTP bring data in • DSS draw information out Data DW Information to support decision 12

Basic Architectural model • An architecture is a set of rules or structures providing a framework for the overall design of a system. – Poe, Building a DW for Decision Support 13

Architectural Model • Data flow through the DW pipe of 6 layers – – – Transaction layer Propagation layer Database layer Middleware layer Application layer Presentation layer 14

OD 1 Six-layer DW Architecture AP 1 OD 2 AP 2 OD 3 AP 3 OD 4 AP 4 OD 5 Transaction Propagation Database Middleware Application 15 Presentation

Transaction layer – Validate = is data up-to-date and accurate? ID invalid, missing, duplicate values • even name may have different format • First/Last, Last/Middle/First, …. • Define default for missing value – Map source to target = which field in operation data to which field in DW 16

Transaction layer • Data – Flat file (text or ASCII file) – RDBMS file – Proprietary file (. dat) 17

Transaction layer (Tips to handle data) • RDBMS files > SQL / extraction tools with GUI (often available in RDBMS) • One Big flat file – Large volume (e. g. , click stream, call transaction) • Read into scalable RDBMS • Scripting or C/C++ program – Small volume • Statistical tools (SPSS, SAS, JMP, Splus) • Specialized tools such as spreadsheet, OLAP tools • Desktop RDBMS such as MS Access 18

Propagation layer – Most common option is asynchronous propagation (versus synchronous: changes occurring in transaction system synchronously transported to DW) – Transfer and load • Time for loading depends on a factors such as number of indexes 19

Database layer • DW is not just the database layer! – Integrated & transformed BEFORE – Analysis & presentation AFTER • Parallel processing capability often required – Symmetric Multi-processor(SMP) – Massively parallel processing (MPP) • Enhancement http: //houseandhome. msn. com/pickaplace/nf_Overview. aspx 20

Middleware layer • Software that make access to database layer and application layer possible (= the pipes) • LAN, WAN • Client server model : logical and physical separation of data from the application • Thin client model : Internet/Intranet 21

Application layer • Decision support applications – – – Presentational systems (pre-defined reports) Interrogative systems (SQL) Simulation systems (What-if analysis) Functional systems (e. g. segmentation analysis) Automatic systems (e. g. credit approval expert system) • Constrained by organizational design – System geographical oriented? Product oriented? 22

Presentation layer • Interface – Command-line (e. g. unix) – Menu-driven (little interaction) – Query language (e. g. SQL) – Graphical (windows, pull-down menu, buttons, interactive) – Groupware (Lotus Notes, Window based products) 23

Presentation layer – Multmedia (sound , video, animation) – Hypertext (html, xml) 24

Presentation layer • Security – Methods to enforce security • Views = seeing selected rows or columns • Access control = DBA Grant/Revoke user’s privilege’s access to data files • Admin control – Accesses based on roles (group of users) – Encryption – Firewall • Audit = track trail 25

Putting a DW in place • • Plan and launch Collect business requirements Implementation Maintenance, support, and evolution 26

Architecture Data Modeling Plan Collect Business Requirements Selecting technology Data staging Evolution Analytical Applications 27

Plan and launch • Sponsorship – ROI and risk – Expenses: hardware & network upgrade (25 -60 %), software (10 -25%), services (10 -35%), internal staff (5 -15%) – 3 -6 m ~ 1 -3 m – Figures from Humphrey, Hawkins, and Dy (1999) – Cooperation across departments (enterprise DW) 28

Plan>Implement • Business requirement collection – A DW aims at solving business, not technical problems. • Architecture & design • Infrastructure = technologies, platforms & components necessary to support the chosen architecture • Data • Analytical applications 29

Plan>Implement • Gather business requirements – Interview key end users • • Executives Managers Analysts Front-line workers 30

Implementation • Design – What to include – How are data formatted and stored – Physical and logical models – Meta-data management • Other implementation issues (e. g. , data model) discussed later in course 31

Other Broad Issues • How do I measure results? (If you’re not measuring it, you’re not managing it) • Build or buy? • Data mart vs Enterprise DW • Ownership • What can be outsourced? 32

Types of DW • Plain-vanilla/ generic DW • Data mart = specialized DW built for a community of users (e. g. , finance) • Enterprise DW = DW built for entire company • Operational data store (ODS) = data store to see instant snapshot, e. g, customer call center • (reference: Westerman, p. 10) 33

Types of DW • Federated DW – Hybrid of Top down/Bottom up approach – Top down= built Enterprise DW – Bottom Up= Each unit/company built its own DW or data mart – FDW: set of common rules (captured as meta data) for each quasi independent DW http: //www. dmreview. com/master. cfm? Nav. ID= 198&Ed. ID=1953 34

Some Trends • An encompassing enterprise DW is a myth – Federated DW, data mart • Quick ROI (small victories) often useful to sustain DW • Big, fat, expensive apps that required long development time are replaced by smaller, more flexible, less expensive turnkey apps • Mid-size & smaller companies are beginning to build DW (Declining h/w cost + Microsoft factor) 35

The Modern Data Warehouse • A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting • Note that the data warehouse contains a copy of the transactions. These are not updated or changed later by the transaction system. • Also note that this data is specially structured, and may have been transformed when it was placed in the warehouse 36

Data Warehouse Roles and Structures The DW has the following primary functions: • It is a direct reflection of the business rules of the enterprise. • It is the collection point for strategic information. • It is the historical store of strategic information. • It is the source of information later delivered to data marts. • It is the source of stable data regardless of how the business processes may change. 37

Position of the Data Warehouse Within the Organization 38

Data Marts • A data mart is a smaller, more focused data warehouse. It reflects the business rules of a specific business unit. • The data mart does not need to cleanse its data because that was done when it went into the warehouse. • It is a set of tables for direct access by users. • These tables are designed for aggregation. • It typically is not a source for traditional statistical analysis. 39

Position of the Data Mart Within the Organization Data Delivery Data Mart Decision Support Information 40

What Can a Data Warehouse Do? Some of the benefits of a DW are: • Immediate information delivery • Data integration from across and even outside the organization • Future vision from historical trends • Tools for looking at data in new ways • Freedom from IS department resource limitations (you don’t need programmers to use a data warehouse) 41

Examples of Common DW Applications Sales Analysis Determine real-time product sales to make vital pricing and distribution decisions. Analyze historical product sales to determine success or failure attributes. Evaluate successful products and determine key success factors. Use corporate data to understand the margin as well as the revenue implications of a decision. Rapidly identify a preferred customer segments based on revenue and margin. Quickly isolate past preferred customers who no longer buy. Identify daily what product is in the manufacturing and distribution pipeline. Instantly determine which salespeople are performing, on both a revenue and margin basis, and which are behind. Financial Analysis Compare actual to budgets on an annual, monthly and month-to-date basis. Review past cash flow trends and forecast future needs. Identify and analyze key expense generators. Instantly generate a current set of key financial ratios and indicators. Receive near-real-time, interactive financial statements. Human Resource Analysis Evaluate trends in benefit program use. Identify the wage and benefits costs to determine company-wide variation. 42

What Does All This Mean? • On a daily basis, organizations turn to their data warehouses to answer a limitless variety of questions. • Nothing is free, however, and these benefits do come with a cost. • The value of a data warehouse is a result of the new and changed business processes it enables. • There are limitations, though. A DW cannot correct problems with the data, although it may help to clearly identify them. 43

Comparison of Typical DW Costs and Benefits Costs Hardware, software, development personnel and consultant costs. Operational costs like ongoing systems maintenance. Benefits Added Revenue Will the new (business objective) process generate new customers (what is the estimated value? ) Will the new (business objective) process increase the buying propensity of existing customers (by how much? ) Is the new process necessary to ensure that the competition doesn't offer a demanded service that you can't match? Reduced costs What costs of current systems will be eliminated? 44

The Cost of Warehousing Data • Expenditures can be categorized as one-time initial costs or as recurring, ongoing costs. • The initial costs can further be identified as for hardware or software. • Expenditures can also be categorized as capital costs (associated with acquisition of the warehouse) or as operational costs (associated with running and maintaining the warehouse) 45

Expenditures Associated with Building a DW Recurring Costs Capital Hardware maintenance Software maintenance Terminal analysis Middleware One-Time Costs Hardware Disk CPU analysis Network Terminal analysis Software DBMS Terminal Middleware Network Log utility Processin g Metadata Infrastruct ure Operational Ongoing refreshment Integration transformation Data model maintenance Record identification maintenance Metadata infrastructure maintenance Archival of data Data aging within the DW Integration/transformation processing specification Metadata infrastructure population System of record definition Data dictionary language definition Network transfer definition CASE/Repository interface Initial data warehouse population Data model definition Database design definition 46

Cost Are Highly Variable • A company that spends less money for their data warehouse is often happier with it. • The main justification for the development expense is that a DW reduces the cost of accessing the information owned by the organization. • Since information has to be retrieved just once (when it is placed in the warehouse), DW users see a lower cost on each report generated. 47

Typical Multidatabase Report and Screen Generation Data download and transformation contribute to retrieval costs for every report or screen generated Source System A Source System B Source System C Source System D 48

Typical DW Report and Screen Generation Data upload and transformation costs occur just once. Retrieval costs are lower. Source System A Source System B Organizational Data Warehouse Source System C Source System D 49

Summary • At a tactical level, DW can be viewed as a DSS • 6 -layer architecture model of DW • Collecting business requirements is an important first step in building a DW • General DW proliferation : without the analytics, you’re at a competitive disadvantage 50

Glossary • • • DW : Data Warehousing BI: Business Intelligence DM: Data Mining DBMS: Database management System RDBMS: Relational Database Management System • GUI: Graphical User Interface • OLAP: Online Analytical Processing 51

Glossary • • OLTP: Online Analytical Processing DSS: Decision Support System EIS : Executive Information System ODS: Operational Data Store 52
- Slides: 52