Chapter 15 Data Warehousing OLAP and Data Mining
- Slides: 38
Chapter 15 Data Warehousing, OLAP, and Data Mining
Introduction • • Data, data…everywhere! Information…that’s another story! Especially, the right information @ the right time! Data warehousing’s goal is to make the right information available @ the right time • Data warehousing is a data store (eg. , a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes 2
Introduction • Data warehouses are natural allies for data mining (work together well) • Data mining can help fulfill some of the goal of data warehouses – right information @ the right time • Relational database management systems (RDBMS), such as Oracle, DB 2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing 3
Definitions of a Data Warehouse “A subject-oriented, integrated, time-variant and 1. non-volatile collection of data in support of management's decision making process” - W. H. Inmon 2. “A copy of transaction data, specifically structured for query and analysis” - Ralph Kimball 4
Data Warehouse • For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) • DW allows an organization (enterprise) to remember what it has noticed about its data • Data Mining techniques make use of the data in a Data Warehouse 5
Data Warehouse Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: • “Farmers” – they know • “Explorers” - unpredictable Etc… Copied, organized summarized Data Warehouse Data Mining 6
Data Warehouse q A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining q Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system q Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse 7
Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise. 8
Data Warehouse to Data Mart Data Warehouse Data Mart Decision Support Information 9
Generic Architecture of Data (synonym) Transaction data 10
Transaction (Operational) Data • Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc… • Transactions are the base level of data – the raw material for understanding customer behavior • Unfortunately, operational systems change due to changing business needs • Fortunately, operational systems can usually be changed to support changing business needs • Data warehousing strategies need to be aware of operational system changes 11
Operational Summary Data Summaries are for a specific time period and utilize the transaction data for that time period Other Examples? ? ? 12
Decision Support Summary Data • The data that are used to help make decisions about the business – Financial Data, such as: • Income Statements (Profit & Loss) • Balance Sheets (Assets – Liabilities = Net Worth) – Sales summaries – Other examples? ? ? • Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing? ? ? ) • Generally, it is a bad idea to use the same system for analytic and operational purposes 13
Database Schema • Database schema defines the structure of data, not the values of the data (e. g. , first name, last name = structure; Ron Norman = values of the data) • In RDBMS: – Columns = fields = attributes (A, B, C) – Rows = records = tuples (1 -7) 14
Logical & Physical Database Schema • Describes data in a way that is familiar to business users • Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it 15
Metadata • General definition: Data about data !!! – Examples: • A library’s card catalog (metadata) describes publications (data) • A file system maintains permissions (metadata) about files (data) • A form of system documentation including: – – – Values legally allowed in a field (e. g. , AZ, CA, OR, UT, WA, etc. ) Description of the contents of each field (e. g. , start date) Date when data were loaded Indication of currency of the data (last updated) Mappings between systems (e. g. , A. this = B. that) • Invaluable, otherwise have to research to find it 16
Business Rules • Highest level of abstraction from operational (transaction) data • Describes why relationships exist and how they are applied • Examples: – Need to have 3 forms of ID for credit – Only allow a maximum daily withdrawal of $200 – After the 3 rd log-in attempt, lock the log-in screen – Accept no bills larger than $20 – Others? ? ? 17
General Architecture for Data Warehousing • Source systems • Extraction, (Clean), Transformation, & Load (ETL) • Central repository • Metadata repository • Data marts • Operational feedback • End users (business) 18
Where does OLAP fit in? 19
OLAP Overview • Interactive, exploratory analysis of multidimensional data to discover patterns 20
OLAP Architecture 21
Server Options • Single processor • Symmetric multiprocessor (SMP) • Massively parallel processor (MPP) 22
OLAP Server Options • ROLAP (Relational) • MOLAP (Multidimensional) • HOLAP (Hybrid) 23
OLAP – Online Analytical Processing • A definition: • Data representation is in the form of a CUBE • OLAP goes beyond SQL with its analysis capabilities • Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography 24
OLAP Cube - 1 25
OLAP Cube - 2 26
OLAP Cube - 3 • Star Structure (quite common) 27
OLAP Cube - 4 The Cube 28
OLAP Cube - 5 Three. Dimensional Cube Display 29
OLAP Cube - 6 Six. Dimensional Cube 30
Rotation (Pivot Table) 31
Drill Down 32
OLAP Examples • http: //perso. wanadoo. fr/bernard. lupin/english/example. htm • Excel Pivot Table example (similar to OLAP cube) 33
Sample of OLAP products Just a snippet from http: //www. olapreport. com/Products. Index. htm ; not an endorsement 34
Data Mining versus OLAP 35
Data Mining versus OLAP • OLAP - Online Analytical Processing – Provides you with a very good view of what is happening, but can not predict what will happen in the future or why it is happening 36
Results of Data Mining Include: • Forecasting what may happen in the future • Classifying people or things into groups by recognizing patterns • Clustering people or things into groups based on their attributes • Associating what events are likely to occur together • Sequencing what events are likely to lead to later events 37
End of Chapter 15 38
- Olap
- Chicago time
- An overview of data warehousing and olap technology
- What is data mining and data warehousing
- Crm data warehouse models
- Introduction to data mining and data warehousing
- Olap vs oltp in data mining
- Mining complex types of data in data mining
- Mining multimedia databases in data mining
- Difference between strip mining and open pit mining
- Difference between text mining and web mining
- Hive provides data warehousing layer to data over hadoop
- Best practices data warehousing
- Data warehouse and olap technology
- Data warehouse and olap technology
- Strip mining vs open pit mining
- Strip mining vs open pit mining
- Starnet query model in data warehouse
- Olap architecture diagram
- 3 layers of data warehouse architecture
- Difference between operational and informational data
- Coffing data warehousing
- Data warehouse component
- How to plan a data warehouse project
- 1keydata data warehousing
- Principles of data warehouse
- Introduction of data warehouse
- Concept hierarchy in data warehousing
- Cs 2032
- Basic concept of data warehousing
- Data warehousing components
- Data warehouse terminology
- Healthcare data warehousing
- Kimball bus architecture
- Olap facts and dimensions
- Oltp and olap in sql
- Dmql
- Data reduction in data mining
- What is missing data in data mining