Data Warehousing and OLAP Hector GarciaMolina Stanford University
- Slides: 58
Data Warehousing and OLAP Hector Garcia-Molina Stanford University 1
Warehousing Growing industry: $8 billion in 1998 l Range from desktop to huge: l u Walmart: 900 -CPU, 2, 700 disk, 23 TB Teradata system l Lots of buzzwords, hype u slice 2 & dice, rollup, MOLAP, pivot, . . . Hector Garcia Molina: Data Warehousing and OLAP
Outline What is a data warehouse? l Why a warehouse? l Models & operations l Implementing a warehouse l Future directions l 3 Hector Garcia Molina: Data Warehousing and OLAP
What is a Warehouse? l Collection of diverse data u subject oriented u aimed at executive, decision maker u often a copy of operational data u with value-added data (e. g. , summaries, history) u integrated u time-varying u non-volatile more 4 Hector Garcia Molina: Data Warehousing and OLAP
What is a Warehouse? l Collection of tools u gathering data u cleansing, integrating, . . . u querying, reporting, analysis u data mining u monitoring, administering warehouse 5 Hector Garcia Molina: Data Warehousing and OLAP
Warehouse Architecture Client Query & Analysis Metadata Warehouse Integration Source 6 Source Hector Garcia Molina: Data Warehousing and OLAP
Why a Warehouse? l Two Approaches: u Query-Driven (Lazy) u Warehouse (Eager) ? Source 7 Source Hector Garcia Molina: Data Warehousing and OLAP
Query-Driven Approach Client Mediator Wrapper Source 8 Wrapper Source Hector Garcia Molina: Data Warehousing and OLAP
Advantages of Warehousing High query performance l Queries not visible outside warehouse l Local processing at sources unaffected l Can operate when sources unavailable l Can query data not stored in a DBMS l Extra information at warehouse l u Modify, summarize (store aggregates) u Add historical information 9 Hector Garcia Molina: Data Warehousing and OLAP
Advantages of Query-Driven l No need to copy data u less storage u no need to purchase data More up-to-date data l Query needs can be unknown l Only query interface needed at sources l May be less draining on sources l 10 Hector Garcia Molina: Data Warehousing and OLAP
OLTP vs. OLAP l OLTP: On Line Transaction Processing u Describes l OLAP: On Line Analytical Processing u Describes 11 processing at operational sites processing at warehouse Hector Garcia Molina: Data Warehousing and OLAP
OLTP vs. OLAP OLTP l l l l 12 Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical OLAP l l l Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users Hector Garcia Molina: Data Warehousing and OLAP
Data Marts Smaller warehouses l Spans part of organization l u e. g. , l Do not require enterprise-wide consensus u but 13 marketing (customers, products, sales) long term integration problems? Hector Garcia Molina: Data Warehousing and OLAP
Warehouse Models & Operators l Data Models u relations u stars & snowflakes u cubes l Operators u slice & dice u roll-up, drill down u pivoting u other 14 Hector Garcia Molina: Data Warehousing and OLAP
Star 15 Hector Garcia Molina: Data Warehousing and OLAP
Star Schema 16 Hector Garcia Molina: Data Warehousing and OLAP
Terms Fact table l Dimension tables l Measures l 17 Hector Garcia Molina: Data Warehousing and OLAP
Dimension Hierarchies s. Type store city region èsnowflake schema èconstellations 18 Hector Garcia Molina: Data Warehousing and OLAP
Cube Fact table view: Multi-dimensional cube: dimensions = 2 19 Hector Garcia Molina: Data Warehousing and OLAP
3 -D Cube Fact table view: Multi-dimensional cube: day 2 day 1 dimensions = 3 20 Hector Garcia Molina: Data Warehousing and OLAP
ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing l MOLAP: Multi-Dimensional On-Line Analytical Processing l 21 Hector Garcia Molina: Data Warehousing and OLAP
Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 81 22 Hector Garcia Molina: Data Warehousing and OLAP
Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date 23 Hector Garcia Molina: Data Warehousing and OLAP
Another Example • Add up amounts by day, product • In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prod. Id rollup drill-down 24 Hector Garcia Molina: Data Warehousing and OLAP
Aggregates Operators: sum, count, max, min, median, ave l “Having” clause l Using dimension hierarchy l u average by region (within store) u maximum by month (within date) 25 Hector Garcia Molina: Data Warehousing and OLAP
Cube Aggregation Example: computing sums. . . day 2 day 1 129 rollup drill-down 26 Hector Garcia Molina: Data Warehousing and OLAP
Cube Operators day 2 . . . day 1 sale(c 1, *, *) 129 sale(c 2, p 2, *) 27 sale(*, *, *) Hector Garcia Molina: Data Warehousing and OLAP
Extended Cube * day 2 day 1 28 sale(*, p 2, *) Hector Garcia Molina: Data Warehousing and OLAP
Aggregation Using Hierarchies day 2 day 1 customer region country (customer c 1 in Region A; customers c 2, c 3 in Region B) 29 Hector Garcia Molina: Data Warehousing and OLAP
Pivoting Fact table view: Multi-dimensional cube: day 2 day 1 30 Hector Garcia Molina: Data Warehousing and OLAP
Implementing a Warehouse Monitoring: Sending data from sources l Integrating: Loading, cleansing, . . . l Processing: Query processing, indexing, . . . l Managing: Metadata, Design, . . . l 31 Hector Garcia Molina: Data Warehousing and OLAP
Monitoring Source Types: relational, flat file, IMS, VSAM, IDMS, WWW, news-wire, … l Incremental vs. Refresh l new 32 Hector Garcia Molina: Data Warehousing and OLAP
Periodic snapshots l Database triggers l Log shipping l Data shipping (replication service) l Transaction shipping l Polling (queries to source) l Screen scraping l Application level monitoring l 33 è Advantages & Disadvantages!! Monitoring Techniques Hector Garcia Molina: Data Warehousing and OLAP
Monitoring Issues l Frequency u periodic: daily, weekly, … u triggered: on “big” change, lots of changes, . . . l Data transformation u convert data to uniformat u remove & add fields (e. g. , add date to get history) Standards (e. g. , ODBC) l Gateways l 34 Hector Garcia Molina: Data Warehousing and OLAP
Integration Data Cleaning l Data Loading l Derived Data l Client Query & Analysis Metadata Warehouse Integration Source 35 Source Hector Garcia Molina: Data Warehousing and OLAP
Data Cleaning l l l Migration (e. g. , yen ðdollars) Scrubbing: use domain-specific knowledge (e. g. , social security numbers) Fusion (e. g. , mail list, customer merging) billing DB service DB l 36 customer 1(Joe) merged_customer(Joe) customer 2(Joe) Auditing: discover rules & relationships (like data mining) Hector Garcia Molina: Data Warehousing and OLAP
Loading Data Incremental vs. refresh l Off-line vs. on-line l Frequency of loading l u At l 37 night, 1 x a week/month, continuously Parallel/Partitioned load Hector Garcia Molina: Data Warehousing and OLAP
Derived Data l Derived Warehouse Data u indexes u aggregates u materialized views (next slide) When to update derived data? l Incremental vs. refresh l 38 Hector Garcia Molina: Data Warehousing and OLAP
Materialized Views l Define new warehouse relations using SQL expressions does not exist at any source 39 Hector Garcia Molina: Data Warehousing and OLAP
Processing ROLAP servers vs. MOLAP servers l Index Structures l What to Materialize? l Algorithms l Client Query & Analysis Metadata Warehouse Integration Source 40 Source Hector Garcia Molina: Data Warehousing and OLAP
ROLAP Server l Relational OLAP Server tools utilities ROLAP server Special indices, tuning; Schema is “denormalized” relational DBMS 41 Hector Garcia Molina: Data Warehousing and OLAP
MOLAP Server Multi-Dimensional OLAP Server Ci ty l M. D. tools Product B A milk soda eggs soap 1 utilities 42 multidimensional server Sales 2 3 4 Date could also sit on relational DBMS Hector Garcia Molina: Data Warehousing and OLAP
Index Structures l Traditional Access Methods u B-trees, l hash tables, R-trees, grids, … Popular in Warehouses u inverted lists u bit map indexes u join indexes u text indexes 43 Hector Garcia Molina: Data Warehousing and OLAP
. . . Inverted Lists age index 44 inverted lists data records Hector Garcia Molina: Data Warehousing and OLAP
Using Inverted Lists l Query: u Get people with age = 20 and name = “fred” List for age = 20: r 4, r 18, r 34, r 35 l List for name = “fred”: r 18, r 52 l Answer is intersection: r 18 l 45 Hector Garcia Molina: Data Warehousing and OLAP
. . . Bit Maps age index 46 bit maps data records Hector Garcia Molina: Data Warehousing and OLAP
Using Bit Maps l Query: u Get people with age = 20 and name = “fred” List for age = 20: 1101100000 l List for name = “fred”: 0100000001 l Answer is intersection: 0100000 l Good if domain cardinality small l Bit vectors can be compressed l 47 Hector Garcia Molina: Data Warehousing and OLAP
Join • “Combine” SALE, PRODUCT relations • In SQL: SELECT * FROM SALE, PRODUCT 48 Hector Garcia Molina: Data Warehousing and OLAP
Join Indexes join index 49 Hector Garcia Molina: Data Warehousing and OLAP
What to Materialize? Store in warehouse results useful for common queries l Example: total sales l day 2 . . . day 1 129 materialize 50 Hector Garcia Molina: Data Warehousing and OLAP
Materialization Factors Type/frequency of queries l Query response time l Storage cost l Update cost l 51 Hector Garcia Molina: Data Warehousing and OLAP
Cube Aggregates Lattice 129 city, product day 2 day 1 52 all product date city, date product, date city, product, date use greedy algorithm to decide what to materialize Hector Garcia Molina: Data Warehousing and OLAP
Dimension Hierarchies all state city 53 Hector Garcia Molina: Data Warehousing and OLAP
Dimension Hierarchies all city, product city, date city, product, date state, date state, product, date not all arcs shown. . . 54 Hector Garcia Molina: Data Warehousing and OLAP
Interesting Hierarchy all years weeks quarters months conceptual dimension table days 55 Hector Garcia Molina: Data Warehousing and OLAP
Design What data is needed? l Where does it come from? l How to clean data? l How to represent in warehouse (schema)? l What to summarize? l What to materialize? l What to index? l 56 Hector Garcia Molina: Data Warehousing and OLAP
Tools l Development u l Planning & Analysis u l measure traffic (sources, warehouse, clients) Workflow Management u 57 performance monitoring, usage patterns, exception reporting System & Network Management u l what-if scenarios (schema changes, refresh rates), capacity planning Warehouse Management u l design & edit: schemas, views, scripts, rules, queries, reports “reliable scripts” for cleaning & analyzing data Hector Garcia Molina: Data Warehousing and OLAP
Current State of Industry l Extraction and integration done off-line u Usually l in large, time-consuming, batches Everything copied at warehouse u Not selective about what is stored u Query benefit vs storage & update cost l Query optimization aimed at OLTP u High throughput instead of fast response u Process whole query before displaying anything 58 Hector Garcia Molina: Data Warehousing and OLAP
- Data warehouse and olap technology for data mining
- Chicago time
- An overview of data warehousing and olap technology
- What is data mining and data warehousing
- Crm data warehouse models
- Introduction to data mining and data warehousing
- Javachive
- Oracle data warehouse best practices
- Data warehouse and olap technology
- Data warehouse and olap technology
- Data warehousing and online analytical processing
- Rolap
- Rolap in data warehouse
- Olap vs oltp in data mining
- Introduction to data warehousing
- Greenplum data warehousing
- Data warehouse component
- How to plan a data warehouse project
- 1keydata data warehousing
- Principles of data warehousing
- Introduction to data warehousing
- Concept hierarchy in data warehousing
- Cs 2032
- Data warehouse basic concepts
- Inmon cif
- Data warehouse terminology
- Healthcare data warehouse model
- Front room vs backroom data warehousing
- Hecubas husband
- Olap facts and dimensions
- Oltp and olap in sql
- Slicing and dicing in olap
- Stanford network registration
- Stanford university continuing education
- Stanford university
- Steve jobs commencement address
- Cryptography stanford
- Dr ash from stanford university
- Silicon valley stanford university
- Stanford university
- Dan schonfeld
- Stanford university philosophy department
- Prince hector
- Hector soto rodriguez
- Hernandez garces hector ruben
- Amor de temporada letra
- Hector garcia-molina junghoo cho
- Hector godinez jimenez
- Hector garcia-molina junghoo cho
- Hector sky food stuff trading llc
- H h munro biography
- Que es etimologia de una palabra
- Italo svevo mappa concettuale
- Kurt schmitz
- A small child slides down the four frictionless slides
- Khdudcm extended
- King henry chocolate milk story
- Theme of mrs packletide's tiger
- Hector absi