Chapter 15 Data Warehousing OLAP and Data Mining






































- Slides: 38
 
	Chapter 15 Data Warehousing, OLAP, and Data Mining
 
	Introduction • • Data, data…everywhere! Information…that’s another story! Especially, the right information @ the right time! Data warehousing’s goal is to make the right information available @ the right time • Data warehousing is a data store (eg. , a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes 2
 
	Introduction • Data warehouses are natural allies for data mining (work together well) • Data mining can help fulfill some of the goal of data warehouses – right information @ the right time • Relational database management systems (RDBMS), such as Oracle, DB 2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing 3
 
	Definitions of a Data Warehouse “A subject-oriented, integrated, time-variant and 1. non-volatile collection of data in support of management's decision making process” - W. H. Inmon 2. “A copy of transaction data, specifically structured for query and analysis” - Ralph Kimball 4
 
	Data Warehouse • For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) • DW allows an organization (enterprise) to remember what it has noticed about its data • Data Mining techniques make use of the data in a Data Warehouse 5
 
	Data Warehouse Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: • “Farmers” – they know • “Explorers” - unpredictable Etc… Copied, organized summarized Data Warehouse Data Mining 6
 
	Data Warehouse q A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining q Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system q Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse 7
 
	Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise. 8
 
	Data Warehouse to Data Mart Data Warehouse Data Mart Decision Support Information 9
 
	Generic Architecture of Data (synonym) Transaction data 10
 
	Transaction (Operational) Data • Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc… • Transactions are the base level of data – the raw material for understanding customer behavior • Unfortunately, operational systems change due to changing business needs • Fortunately, operational systems can usually be changed to support changing business needs • Data warehousing strategies need to be aware of operational system changes 11
 
	Operational Summary Data Summaries are for a specific time period and utilize the transaction data for that time period Other Examples? ? ? 12
 
	Decision Support Summary Data • The data that are used to help make decisions about the business – Financial Data, such as: • Income Statements (Profit & Loss) • Balance Sheets (Assets – Liabilities = Net Worth) – Sales summaries – Other examples? ? ? • Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing? ? ? ) • Generally, it is a bad idea to use the same system for analytic and operational purposes 13
 
	Database Schema • Database schema defines the structure of data, not the values of the data (e. g. , first name, last name = structure; Ron Norman = values of the data) • In RDBMS: – Columns = fields = attributes (A, B, C) – Rows = records = tuples (1 -7) 14
 
	Logical & Physical Database Schema • Describes data in a way that is familiar to business users • Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it 15
 
	Metadata • General definition: Data about data !!! – Examples: • A library’s card catalog (metadata) describes publications (data) • A file system maintains permissions (metadata) about files (data) • A form of system documentation including: – – – Values legally allowed in a field (e. g. , AZ, CA, OR, UT, WA, etc. ) Description of the contents of each field (e. g. , start date) Date when data were loaded Indication of currency of the data (last updated) Mappings between systems (e. g. , A. this = B. that) • Invaluable, otherwise have to research to find it 16
 
	Business Rules • Highest level of abstraction from operational (transaction) data • Describes why relationships exist and how they are applied • Examples: – Need to have 3 forms of ID for credit – Only allow a maximum daily withdrawal of $200 – After the 3 rd log-in attempt, lock the log-in screen – Accept no bills larger than $20 – Others? ? ? 17
 
	General Architecture for Data Warehousing • Source systems • Extraction, (Clean), Transformation, & Load (ETL) • Central repository • Metadata repository • Data marts • Operational feedback • End users (business) 18
 
	Where does OLAP fit in? 19
 
	OLAP Overview • Interactive, exploratory analysis of multidimensional data to discover patterns 20
 
	OLAP Architecture 21
 
	Server Options • Single processor • Symmetric multiprocessor (SMP) • Massively parallel processor (MPP) 22
 
	OLAP Server Options • ROLAP (Relational) • MOLAP (Multidimensional) • HOLAP (Hybrid) 23
 
	OLAP – Online Analytical Processing • A definition: • Data representation is in the form of a CUBE • OLAP goes beyond SQL with its analysis capabilities • Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography 24
 
	OLAP Cube - 1 25
 
	OLAP Cube - 2 26
 
	OLAP Cube - 3 • Star Structure (quite common) 27
 
	OLAP Cube - 4 The Cube 28
 
	OLAP Cube - 5 Three. Dimensional Cube Display 29
 
	OLAP Cube - 6 Six. Dimensional Cube 30
 
	Rotation (Pivot Table) 31
 
	Drill Down 32
 
	OLAP Examples • http: //perso. wanadoo. fr/bernard. lupin/english/example. htm • Excel Pivot Table example (similar to OLAP cube) 33
 
	Sample of OLAP products Just a snippet from http: //www. olapreport. com/Products. Index. htm ; not an endorsement 34
 
	Data Mining versus OLAP 35
 
	Data Mining versus OLAP • OLAP - Online Analytical Processing – Provides you with a very good view of what is happening, but can not predict what will happen in the future or why it is happening 36
 
	Results of Data Mining Include: • Forecasting what may happen in the future • Classifying people or things into groups by recognizing patterns • Clustering people or things into groups based on their attributes • Associating what events are likely to occur together • Sequencing what events are likely to lead to later events 37
 
	End of Chapter 15 38
