University of Manitoba Asper School of Business 3500
University of Manitoba Asper School of Business 3500 DBMS Bob Travica Business Analytics and Decision Making OLTP, OLAP & SAP Chapter 9 & SAP Materials Updated 2020
D B S Y S T E M S OLTP vs. OLAP Online Transaction Processing (OLTP) = relational database systems Online Analytical Processing (OLAP) 2 of 26
D B S Y S T E M S OLAP via Data Warehousing MIS 3500 Predefined reports Interactive data analysis Operations’ data Periodical transfers Online Transaction Processing (OLTP): Querying Databases with 3 NF tables Flat files Online Analytical Processing (OLAP); Data warehousing; Data Mining. Usually de-normalized data. 3 of 26
D B S Y S T E M S OLTP & OLAP in Enterprise Systems ² Enterprise Systems (Enterprise Resource Planning Systems) support both. ² Example: An SAP-based system can be a TPS, MIS and DSS for the entire organization. DSS capability draws on data warehousing & cubing. ² Other major software players also offer OLAP: Microsoft, IBM, Oracle… 4 of 26
D B S Y S T E M S Date Warehousing Goals Data warehouse (DW): ² Integrates data from different sources to get a larger picture of business ² Yields multidimensional view of data by creating data cubes ² Allows for statistical analysis on large data sets (test hypotheses on relationships between pieces of data) ² Allows for discovering new relationships by querying cubes or by applying data mining software. 5 of 60
D B S Y S T E M S Extraction, Transformation, and Loading • Preparations performed on data – ETL process Transform/Transport Customers Extract Convert “Client” to “Customer” Load Apply standard product numbers Convert currencies Fix region codes Transaction data from diverse systems. Data warehouse: All data must be consistent. 6 of 26
D B Three-Dimensional View of Data: Cube • Created in a datawarehouse e S Y S T E M S s le a S t Da 3 2 P 1 P 2 P 3 Sales at Location 1 Months in year P 4 L 3 L 2 L 1 Product Logic similar to crosstab query And pivot table. 7 of 26
D B S Y S T E M S Data Hierarchy Year Levels Quarter Roll-up To get higher-level totals Month Week Drill-down To get lower-level details Day 8 of 26
D B Datawarehouse Tables: Star Design Dimension S Y S T E M S Product. ID Price Design is: - Hierarchical (dimension tables have no direct association) - De-normalized (fact table): Price & Quantity inputted to Fact table; Calculated fact Dimension Location. ID Detail Dimension Fact Table * Revenue=Price*Quantity per customer, product, period Sale. Date Quantity Discount Store. ID Inputted from Product, Sale & Customer; most dimensions replicated in Fact table. Revenue broken down by product, sales location, and desired time period (time column/s – day of year, or even smaller; basis for rollup). ² New keys, often combined, usually used in the fact table (e. g. , Sale. Tbl#-Row#). 9 of 26
D B S Y S T E M S Data Warehouse for Tyson Foods Dimension tables (truncated) provide inputs for “facts” (calculated attributes) in Fact table. 10 of 26
D B Datawarehouse Tables: Snowflake Design Product S Y S T E M S Item. ID Description Price Category Design is: - Network-like (dimension tables can connect directly) - Still partly normalized (Sale. Customer-City) OLAPItems Merch. Tbl. Row Sale. Tbl. Row Price Quantity Fact Table • • Sale. ID Sale. Date Customer. ID Discount Sales. Tax Dimension Tables City. ID Zip. Code City State Customer. ID Phone First. Name Last. Name Address Zip. Code City. ID Advantage: Design of Fact table simpler (Customer, City out); Faster processing with a reduced DW schema. 11 of 26
SAP Datawarehouse D B Datawarehouse Cube Details S Y S T E M S Can also be Dimensions More on SAP Datawarehouse 12 of 26
D B S Y S T E M S Multidimensional View of Data – Precursors to DW: Excel Pivot Table Dimension: Last Name, ID Facts (Key Figures, Measures) Dimensions: Quarter, Month Can place data in rows or columns. By grouping months, can instantly get quarterly or monthly totals. 13 of 26
D B Microsoft Platform On Azure Cloud S Y S T E M S 14 of 26
D B S Y S T E M S SQL 99: Multidimensional Data Views SELECT Category, Month, Sum, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm FROM … GROUP BY CUBE (Category, Month. . . ) Category Bird … Bird Cat … Cat (null) … (null) Month Sum Gc Gm 1 2 135. 00 45. 00 0 0 (null) 1 2 32. 00 607. 50 396. 00 113. 85 0 1 0 0 0 (null) 1 2 3 1293. 30 1358. 8 1508. 94 2362. 68 1 0 0 1 1 1 (null) 8451. 79 1 1 More… 15 of 26
D B S Y S T E M S SQL GROUPING SETS - Hiding Details SELECT Category, Month, Sum FROM … GROUP BY GROUPING SETS ( ROLLUP (Category), ROLLUP (Month), () ) Category Month Bird (null) Cat (null) … (null) 1 (null) 2 (null) 3 … (null) Amount 607. 50 1293. 30 1358. 8 1508. 94 2362. 68 More… 8451. 79 16 of 26
D B S Y S T E M S SQL: RANK Functions ² Calculates and rank orders; useful for analysis SELECT Employee, Sales. Value RANK() OVER (ORDER BY Sales. Value DESC) AS rank DENSE_RANK() OVER (ORDER BY Sales. Value DESC) AS dense FROM Sales ORDER BY Sales. Value DESC, Employee; Employee Sales. Value rank dense Jones 18, 000 1 1 Smith 16, 000 2 2 Blau 16, 000 2 2 Whitt 14, 000 4 3 DENSE_RANK does not skip numbers • Therefore, advances in SQL motivate DBMS vendors to support OLAP and data warehousing. 17 of 26
D B S Y S T E M S Data Mining ² Goal: To discover unknown relationships in the data that can be used to make better decisions. ² Exploratory analysis. ² A bottom-up approach that scans the data to find relationships ² Some statistical routines, but they are not sufficient ª Statistics relies on averages ª Sometimes the important data lies in more detailed pairs ² Supervised by developer vs. unsupervised (self-organizing artificial neural networks) 18 of 26
D B S Y S T E M S 19 of 26
D B S Y S T E M S Common Techniques for Data Mining ² 1. Classification/Prediction ² 2. Association Rules/Market Basket Analysis ² 3. Clustering ² Some based heavily on classical statistics (1), others use specialized mining software (2), yet others combine statistics with specialized mining software (3). ² Currently, these techniques are considered part of data analytics. 20 of 26
D B S Y S T E M S 1. Classification (Prediction) ² Purpose: “Classify” things that are causes and those that are effects. ² Examples ª Which borrowers/loans are most likely to be successful? ª Which customers are most likely to want a new item? ª Which companies are likely to file bankruptcy? ª Which workers are likely to quit in the next six months? ª Which startup companies are likely to succeed? ª Which tax returns are fraudulent? 21 of 26
D B S Y S T E M S Classification Process ² ² ² Clearly identify the outcome/dependent variable. Identify potential variables that might affect the outcome. Use sample data to test and validate the model. Run the data through the model. Regression/correlation analysis, decision trees & tables (below). Income Credit History Job Stability Credit Success 50000 Good Yes 75000 Mixed Bad No 22 of 26
D B S Y S T E M S 2. Association/Market Basket ² Purpose: Determine what events or items go together/co-occur. ² Examples: ª What items are customers likely to buy together? (Business use: Consider putting the two together to increase cross-selling. ) 23 of 26
D B S Y S T E M S Association Challenges ² If an item is rarely purchased, any other item bought with it seems important. So combine items into categories. ² Some relationships are obvious. ª Burger and fries. ² Some relationships are puzzling/meaningless/misleading. ª Hardware store found that toilet rings sell well only when a new store first opened. But what does it mean? ² Caution applies to data analytics: mere relationships without a background knowledge can be misleading. 24 of 26
D B S Y S T E M S 3. Cluster Analysis ² Purpose: Determine groups of people or some entities. ² Examples ª Are there groups of customers? (If so, we could target them; market segmentation) ª Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations. ) ª Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people. ) Large intercluster distance Small intra-cluster distance 25 of 26
D B S Y S T E M S Summary: From Data Warehousing/Mining to Data Analytics EXPLANATORY ANALYTICS PREDICTIVE ANALYTICS 26 of 26
- Slides: 26