Chapter 15 Data Warehousing OLAP and Data Mining

  • Slides: 38
Download presentation
Chapter 15 Data Warehousing, OLAP, and Data Mining

Chapter 15 Data Warehousing, OLAP, and Data Mining

Introduction • • Data, data…everywhere! Information…that’s another story! Especially, the right information @ the

Introduction • • Data, data…everywhere! Information…that’s another story! Especially, the right information @ the right time! Data warehousing’s goal is to make the right information available @ the right time • Data warehousing is a data store (eg. , a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes 2

Introduction • Data warehouses are natural allies for data mining (work together well) •

Introduction • Data warehouses are natural allies for data mining (work together well) • Data mining can help fulfill some of the goal of data warehouses – right information @ the right time • Relational database management systems (RDBMS), such as Oracle, DB 2, Sybase, Informix, Focus, SQL Server, etc. are often used for data warehousing 3

Definitions of a Data Warehouse “A subject-oriented, integrated, time-variant and 1. non-volatile collection of

Definitions of a Data Warehouse “A subject-oriented, integrated, time-variant and 1. non-volatile collection of data in support of management's decision making process” - W. H. Inmon 2. “A copy of transaction data, specifically structured for query and analysis” - Ralph Kimball 4

Data Warehouse • For organizational learning to take place, data from many sources must

Data Warehouse • For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) • DW allows an organization (enterprise) to remember what it has noticed about its data • Data Mining techniques make use of the data in a Data Warehouse 5

Data Warehouse Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: • “Farmers” –

Data Warehouse Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: • “Farmers” – they know • “Explorers” - unpredictable Etc… Copied, organized summarized Data Warehouse Data Mining 6

Data Warehouse q A data warehouse is a copy of transaction data specifically structured

Data Warehouse q A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining q Note that the data warehouse contains a copy of the transactions which are not updated or changed later by the transaction system q Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse 7

Data Mart • A Data Mart is a smaller, more focused Data Warehouse –

Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise. 8

Data Warehouse to Data Mart Data Warehouse Data Mart Decision Support Information 9

Data Warehouse to Data Mart Data Warehouse Data Mart Decision Support Information 9

Generic Architecture of Data (synonym) Transaction data 10

Generic Architecture of Data (synonym) Transaction data 10

Transaction (Operational) Data • Operational (production) systems create (massive number of) transactions, such as

Transaction (Operational) Data • Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc… • Transactions are the base level of data – the raw material for understanding customer behavior • Unfortunately, operational systems change due to changing business needs • Fortunately, operational systems can usually be changed to support changing business needs • Data warehousing strategies need to be aware of operational system changes 11

Operational Summary Data Summaries are for a specific time period and utilize the transaction

Operational Summary Data Summaries are for a specific time period and utilize the transaction data for that time period Other Examples? ? ? 12

Decision Support Summary Data • The data that are used to help make decisions

Decision Support Summary Data • The data that are used to help make decisions about the business – Financial Data, such as: • Income Statements (Profit & Loss) • Balance Sheets (Assets – Liabilities = Net Worth) – Sales summaries – Other examples? ? ? • Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing? ? ? ) • Generally, it is a bad idea to use the same system for analytic and operational purposes 13

Database Schema • Database schema defines the structure of data, not the values of

Database Schema • Database schema defines the structure of data, not the values of the data (e. g. , first name, last name = structure; Ron Norman = values of the data) • In RDBMS: – Columns = fields = attributes (A, B, C) – Rows = records = tuples (1 -7) 14

Logical & Physical Database Schema • Describes data in a way that is familiar

Logical & Physical Database Schema • Describes data in a way that is familiar to business users • Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it 15

Metadata • General definition: Data about data !!! – Examples: • A library’s card

Metadata • General definition: Data about data !!! – Examples: • A library’s card catalog (metadata) describes publications (data) • A file system maintains permissions (metadata) about files (data) • A form of system documentation including: – – – Values legally allowed in a field (e. g. , AZ, CA, OR, UT, WA, etc. ) Description of the contents of each field (e. g. , start date) Date when data were loaded Indication of currency of the data (last updated) Mappings between systems (e. g. , A. this = B. that) • Invaluable, otherwise have to research to find it 16

Business Rules • Highest level of abstraction from operational (transaction) data • Describes why

Business Rules • Highest level of abstraction from operational (transaction) data • Describes why relationships exist and how they are applied • Examples: – Need to have 3 forms of ID for credit – Only allow a maximum daily withdrawal of $200 – After the 3 rd log-in attempt, lock the log-in screen – Accept no bills larger than $20 – Others? ? ? 17

General Architecture for Data Warehousing • Source systems • Extraction, (Clean), Transformation, & Load

General Architecture for Data Warehousing • Source systems • Extraction, (Clean), Transformation, & Load (ETL) • Central repository • Metadata repository • Data marts • Operational feedback • End users (business) 18

Where does OLAP fit in? 19

Where does OLAP fit in? 19

OLAP Overview • Interactive, exploratory analysis of multidimensional data to discover patterns 20

OLAP Overview • Interactive, exploratory analysis of multidimensional data to discover patterns 20

OLAP Architecture 21

OLAP Architecture 21

Server Options • Single processor • Symmetric multiprocessor (SMP) • Massively parallel processor (MPP)

Server Options • Single processor • Symmetric multiprocessor (SMP) • Massively parallel processor (MPP) 22

OLAP Server Options • ROLAP (Relational) • MOLAP (Multidimensional) • HOLAP (Hybrid) 23

OLAP Server Options • ROLAP (Relational) • MOLAP (Multidimensional) • HOLAP (Hybrid) 23

OLAP – Online Analytical Processing • A definition: • Data representation is in the

OLAP – Online Analytical Processing • A definition: • Data representation is in the form of a CUBE • OLAP goes beyond SQL with its analysis capabilities • Key feature of OLAP: Relevant multi-dimensional views such as products, time, geography 24

OLAP Cube - 1 25

OLAP Cube - 1 25

OLAP Cube - 2 26

OLAP Cube - 2 26

OLAP Cube - 3 • Star Structure (quite common) 27

OLAP Cube - 3 • Star Structure (quite common) 27

OLAP Cube - 4 The Cube 28

OLAP Cube - 4 The Cube 28

OLAP Cube - 5 Three. Dimensional Cube Display 29

OLAP Cube - 5 Three. Dimensional Cube Display 29

OLAP Cube - 6 Six. Dimensional Cube 30

OLAP Cube - 6 Six. Dimensional Cube 30

Rotation (Pivot Table) 31

Rotation (Pivot Table) 31

Drill Down 32

Drill Down 32

OLAP Examples • http: //perso. wanadoo. fr/bernard. lupin/english/example. htm • Excel Pivot Table example

OLAP Examples • http: //perso. wanadoo. fr/bernard. lupin/english/example. htm • Excel Pivot Table example (similar to OLAP cube) 33

Sample of OLAP products Just a snippet from http: //www. olapreport. com/Products. Index. htm

Sample of OLAP products Just a snippet from http: //www. olapreport. com/Products. Index. htm ; not an endorsement 34

Data Mining versus OLAP 35

Data Mining versus OLAP 35

Data Mining versus OLAP • OLAP - Online Analytical Processing – Provides you with

Data Mining versus OLAP • OLAP - Online Analytical Processing – Provides you with a very good view of what is happening, but can not predict what will happen in the future or why it is happening 36

Results of Data Mining Include: • Forecasting what may happen in the future •

Results of Data Mining Include: • Forecasting what may happen in the future • Classifying people or things into groups by recognizing patterns • Clustering people or things into groups based on their attributes • Associating what events are likely to occur together • Sequencing what events are likely to lead to later events 37

End of Chapter 15 38

End of Chapter 15 38