Data Warehouses and Data Mining INFSY 445 Fall
- Slides: 29
Data Warehouses and Data Mining INFSY 445 Fall 2005
Data Warehouses n Relational database systems designed to store large amounts of data n Structure cases problems when complex queries are run n Performance problems n Add indexes to speed retrieval, slows transaction processing
Data Warehouses n Conflict resulted in: 4 Online transaction processing (OLTP) 4 Online analytical processing (OLAP) i. e. , data warehouse
Data Warehouses n Heavily indexed for data retrieval & analysis n Correlations and statistical analysis
Sequential Files n Sequential files 4 Time consuming to search 4 Processing intense to add new row 4 Deleting row done by marking record and removing it during reorganization ID Last. Name First. Name Date. Hired 1 Reeves Keith 1/29/98 2 Gibson Bill 3/31/98 3 Reasoner Katy 2/17/98 4 Hopkins Alan 2/8/98 5 James Leisha 1/6/98 6 Eaton Anissa 8/23/98 7 Farris Dustin 3/28/98 8 Carpenter Carlos 12/29/98 9 O'Connor Jessica 7/23/98 Howard 7/13/98 10 Shields
Binary Search n Like looking through a dictionary or phone book 4 Start at middle 4 Decide which half to continue with 4 Binary search continues to divide data in half until desired record is found
Pointers n Data not stored in tables actually n Data is stored on drive (or RAM) 4 Operating System allocates space 4 Provides location/address • Physical address • Virtual address (VSAM) – Imaginary drive values mapped to physical locations. • Relative address – Distance from start of file
Pointers ID 1 2 3 4 5 6 7 8 9 10 Pointer A 11 A 22 A 32 A 47 A 58 A 63 A 67 A 78 A 83 Last. Name Pointer Carpenter A 67 Eaton A 58 Farris A 63 Gibson A 22 Hopkins A 42 James A 47 O'Connor A 78 Reasoner A 32 Reeves A 11 Shields A 83 Address A 11 A 22 A 32 A 47 A 58 A 63 A 67 A 78 A 83 ID 1 2 3 4 5 6 7 8 9 10 Last. Name Reeves Gibson Reasoner Hopkins James Eaton Farris Carpenter O'Connor Shields First. Name Date. Hired Keith 1/29/98 Bill 3/31/98 Katy 2/17/98 Alan 2/8/98 Leisha 1/6/98 Anissa 8/23/98 Dustin 3/28/98 Carlos 12/29/98 Jessica 7/23/98 Howard 7/13/98 Indexed for ID and Last. Name
Pointers n Indexes allow search on 4 Primary key 4 Build index on any column plan to search • CREATE INDEX command 4 Improve performance n Problems 4 Maintain indexes as rows are added and deleted 4 Process slows performance
Indexes n Index columns used to join tables n Test application with large amounts of data and heavy usage n May need a performance analyzer tool for DBMS 4 Monitor usage 4 Identify bottlenecks 4 Suggest columns to index
Data Warehouses n Create new database for online analytical processing n Data Warehouse 4 Holds copy of transaction data 4 Dedicated to managerial queries 4 Data has many sources 4 Multiple indexes built 4 Data may be de-normalized to improve retrieval performance
Data Warehouse Predefined reports Interactive data analysis Operations data Daily data transfer OLTP Database 3 NF tables Flat files Data warehouse Star configuration
Data Warehouse n Tools available to search warehouse 4 Basic SQL queries 4 Extract data to spreadsheets n Data Mining 4 Automated tools which search for hidden patterns in data 4 Statistical methods like regression analysis 4 Pattern recognition like neural networks
Data Warehouse n Goals 4 Regularly collect , clean, and store data 4 Improve performance of OLAP queries • Denormalizing data n Challenges 4 Set up transfer system to collect and clean data 4 Design storage structure to maximize performance 4 Create data analysis tools to statistically analyze data (Data Mining)
Extraction, Transformation, and Transportation (ETT) Customers Convert Client to Customer Apply standard product numbers Convert currencies Fix region codes Transaction data from diverse systems. Data warehouse: All data must be consistent.
Data Warehouse n Vendors have tools to help automate collection of data n Problem is extracting data without interfering with OLTP operations
OLTP v. OLAP
OLAP n Generally depict data as multidimensional cube n OLAP tools provide interactive capabilities to allow managers to look at data from any perspective 4 Drill Down – looking at details 4 Roll Up – summarized data like averages and totals
Multidimensional Cube y r go e t a C Customer Location Time Sale Date
OLAP Database Design n Measure – numeric measurement of an attribute such as sales value or quantity n Fact Table 4 Measures come from Fact Table 4 Generally a detail table in a database n Sides of Cube 4 Dimension – each attribute chosen from other tables 4 Dimension tables must be related to Fact Table
OLAP Database Design n Two OLAP designs 4 All of the dimension tables are joined directly to the fact table – Star Design 4 At least one dimension table is joined through a second table before being joined to the fact table – Snowflake Design
Star Design Dimension Tables Products Sales Date Fact Table Sales Quantity Amount=Sale. Price*Quantity Customer Location
Snowflake Design Merchandise Sale Item. ID Description Quantity. On. Hand List. Price Category Sale. ID Sale. Date Employee. ID Customer. ID Sales. Tax OLAPItems Sale. ID Item. ID Quantity Sale. Price Amount City. ID Zip. Code City State Customer. ID Phone First. Name Last. Name Address Zip. Code City. ID Dimension tables can join to other dimension tables.
OLAP Data Browsing
OLAP in SQL 99 standard added features to compute basic OLAP-type results n Analytic Functions 4 Standard deviation 4 Variance 4 Covariance 4 Correlation 4 Linear regression
Data Mining Goal: To discover unknown relationships in the data that can be used to make better decisions. Transactions and operations Reports Specific ad hoc questions Queries Aggregate, compare, drill down OLAP Databases Unknown relationships Data Mining
Common Techniques n Classification/Prediction/Regression n Association Rules/Market Basket Analysis n Clustering 4 Data points 4 Hierarchies n Neural Networks
Common Techniques n Deviation Detection n Sequential Analysis 4 Time series events 4 Websites n Textual Analysis n Spatial/Geographic Analysis
Homework n Find at least two commercial OLAP tools and compare the features.
- Dr ocker
- Data warehouses generalize and consolidate data in space.
- Eck
- Chapter 13. business intelligence and data warehouses
- Introduction to databases and data warehouses
- What are the three layers of data warehouse architecture
- Udp to data warehouses
- Mining multimedia databases
- What is the difference between e-workforce and e-commerce?
- Difference between strip mining and open pit mining
- Text and web mining
- Local guide program
- Local guide program
- Strip mining vs open pit mining
- Strip mining vs open pit mining
- What is data mining and data warehousing
- Olap crm
- Olap
- Introduction to data mining and data warehousing
- Is 445 uiuc
- Ntp carga mental
- Kj 445
- Artaxerxes decree 444
- Artaxerxes decree 445
- Integrity constraints types
- Chapter 15 assessment biology answers page 445
- Uiuc ece 445
- Dpr 445 del 2000 slide
- Bisc 445
- Ame 445