An overview of Data Warehousing and OLAP Technology
- Slides: 21
An overview of Data Warehousing and OLAP Technology VLDB Conference 1996 Surajit Chaudhuri Umeshwar Dayal Microsoft Reserch, Redmond HP Labs, Palo Alto Presenter : Ha Dong hun
Contents l What is a Data Warehouse ? l OLTP vs OLAP l Back End Tools l Conceptual Model and Front End Tools l Data Warehousing Architecture l Database Design Methodology Database Lab. 2/21
What is a Data Warehouse ? (1/5) l A data warehouse is a “subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of management’s decision-making process. ” - W. H. Inmon l In simplest terms Data Warehouse can be defined as collection of Data marts l A data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker to make better and faster decisions Database Lab. 3/21
What is a Data Warehouse ? (2/5) l Characteristic ¡Subject Oriented l Organized around major subjects, such as customer, product, sales l Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing l Provide a simple and concise view around particular subject issues by excluding useless data in the decision support process ¡Integrated ¡Time Variant ¡Nonvolatile Database Lab. 4/21
What is a Data Warehouse ? (3/5) l Characteristic ¡Subject Oriented ¡ Integrated l Constructed by integrating multiple, heterogeneous data sources (e. g. , relational database, flat files, on-line transaction records) l Data cleaning and data integration techniques are applied e. g. , Hotel price : currency, tax, breakfast cost, etc. l When data is moved to the data warehouse, it is converted ¡Time Variant ¡Nonvolatile Database Lab. 5/21
What is a Data Warehouse ? (4/5) l Characteristic ¡Subject Oriented ¡Integrated ¡Time Variant l The time horizon for the data warehouse is significantly longer than that of operational systems • Operational database : current value data • Data warehouse data : provide information from a historical perspective (e. g. , past 5 ~ 10 years) l Every key structure in the data warehouse contain an element of time, explicitly or implicitly. ¡Nonvolatile Database Lab. 6/21
What is a Data Warehouse ? (5/5) l Characteristic ¡Subject Oriented ¡Integrated ¡Time Variant ¡Nonvolatile l A physically separate store of data transformed from the operational environment l Operational update of data does not occur in the data warehouse environment • Does not require transaction processing, recovery and concurrency control mechanisms • Requires only two operations in data processing : initial loading of data and access of data Database Lab. 7/21
OLTP vs. OLAP (1/2) l OLTP (On-line transaction processing) ¡Major task of traditional relational DBMS ¡Day-to-day operations : banking, payroll, accounting, etc. l OLAP (On-line analytical processing) ¡Major task of data warehouse system ¡Data analysis and decision making Database Lab. 8/21
OLTP vs. OLAP (2/2) Features OLTP OLAP Characteristic Operational processing Informational processing User DBA, db professional Knowledge worker Function Day-to-day operations Long-term informational requirements, decision support Data Current Historical View Detailed, flat relational Summarized, multidimensional Access Read/write Mostly read Focus Data in Information out DB size 100 MB ~ GB 100 GB ~ TB Priority High performance High Flexibility, end-user autonomy Metric Transaction throughput Query throughput, response time Database Lab. 9/21
Back End Tools l Back End Tools ¡ Data extraction l From external sources is usually implemented via gateways and standard interfaces. ¡ Data Cleaning l Detect errors in the data and rectify them when possible ¡ Load l Sort, summarize, consolidate, compute views, check integrity and build indices and partitions ¡ Refresh l Propagate the updates from the data sources to the warehouse Database Lab. 10/21
Conceptual Model 1 2 3 4 sum Pr od TV PC MP 3 sum Total annual sales of TV in KOREA U. S. A JAPAN Country uc t Date sum ALL Database Lab. 11/21
es ) Front End Tools (1/4) Korea 2000 USA Q 1 3000 Lo ca tio n( c Seoul 1560 Busan 440 New york 2270 Chicago Time(quarters) L (c oc Time(quarters) oun atio tr n ie s) ¡ Rollup (Drill-up) ¡ Drill-down (Roll-down) ¡ Slice and dice iti l Front End Tools Q 1 730 Q 2 Q 3 PC TV MP 3 Product Q 2 Q 3 PC TV MP 3 Roll-up on location from cities to countries Product Database Lab. 12/21
es ) Front End Tools (2/4) Time(months) Lo ca t io n (c iti Seoul Busan New york Chicago Jan. 130 Feb. 340 Mar. 260 PC Lo ca tio n( c Seoul 1560 Busan 440 New york 2270 Chicago Time(quarters) es ) ¡ Roll-up (Drill-up) ¡ Drill-down (Roll-down) ¡ Slice and dice iti l Front End Tools Q 1 730 Q 2 Q 3 PC TV MP 3 Product TV MP 3 Drill down on time from quarters to months Product Database Lab. 13/21
es ) Front End Tools (3/4) Seoul Lo ca tio n( c Seoul 1560 Busan 440 New york 2270 Chicago Time(quarters) Location(cities) ¡ Rollup (Drill-up) ¡ Drill-down (Roll-down) ¡ Slice and dice iti l Front End Tools Q 1 730 440 610 PC TV MP 3 Q 2 Q 3 Product Busan New york Slice for time Chicago 730 PC 440 610 TV MP 3 = “Q 1” Product Database Lab. 14/21
es ) Front End Tools (4/4) Seoul 1560 Chicago Q 1 Lo ca tio n( c Seoul 1560 Busan 440 New york 2270 Chicago Time(quarters) Lo (c cat Time iti io es n (quarters) ) ¡ Rollup (Drill-up) ¡ Drill-down (Roll-down) ¡ Slice and dice iti l Front End Tools Q 1 730 Q 2 Q 3 PC TV MP 3 Product 730 Q 2 PC MP 3 Product Database Lab. Dice for (Location =“Seoul” or “Chicago”) and (time=“Q 1” or “Q 2”) and (product=“PC” or “MP 3”) 15/21
Data Warehousing Architecture External Sources Operational Databases Monitoring & Administering Tier 1 : Data Warehouse Server Metadata Repository Extract Clean Transform Load Refresh Data Warehouse Tier 2 : OLAP Server Serve Tier 3 : Clients Analysis Query Reporting Data mining Data Marts Data Sources Database Lab. Data Storage OLAP Engine Front-End Tools 16/21
Design Methodology - Star Schema Time dimension table Time_key day_week month quarter year Sales Fact Table Branch dimension table Branch_key Branch_name Branch_type Time_key Product_key Location_key Units_sold Dollars_sold Avg_sales Database Lab. Product dimension table Product_key Product_name brand type supplier_type Location dimension table Location_key street city province country 17/21
Design Methodology - Snowflake Schema Time dimension table Time_key day_week month quarter year Sales Fact Table Branch dimension table Branch_key Branch_name Branch_type Time_key Product_key Location_key Units_sold Dollars_sold Avg_sales Database Lab. Product dimension table Supplier dimension table Product_key Product_name brand type supplier_key Supplier_type Location dimension table Location_key street City_key City dimension table City_key city province country 18/21
Design Methodology - Fact Constellation Time dimension table Time_key day_week month quarter year Branch dimension table Branch_key Branch_name Branch_type Sales fact table Time_key Product_key Branch_key Product dimension table Product_key Product_name brand type supplier_type Shipping fact table Time_key Product_key From_location Units_sold To_location Dollars_sold Location dimension table Shipper_key name Type Location_key Shipper_key Location_key Avg_sales Shipper dimension table Units_shipped Location_key street city province country Database Lab. 19/21
Summary l Data warehouse ¡A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process l Architecture of Date warehouse ¡Consisting of Warehouse servers, front end and back end tools l OLAP operations : roll-up, drill-down, slice, dice l Multidimensional model of Data warehouse ¡ Multidimensional data cube ¡ Star Schema ¡ Snowflake Schema Database Lab. 20/21
Q&A Database Lab. 21/21
- An overview of data warehousing and olap technology
- An overview of data warehousing and olap technology
- Olap
- Data warehouse and olap technology
- Data warehouse and olap technology
- What is kdd process in data mining
- Mining fraud
- Introduction to data warehousing and data mining
- Hive provides data warehousing layer to data over hadoop
- Data warehousing best practices
- Data quality and data cleaning an overview
- Data quality and data cleaning an overview
- Data quality and data cleaning an overview
- Olap meaning
- Types of olap servers in data warehouse
- Independent data mart architecture
- Olap vs oltp in data mining
- Operational and informational data store in data warehouse
- Coffing data warehousing
- Data warehousing components
- Data warehouse project charter
- Human thought process