Data Warehouse Design Xintao Wu University of North





























- Slides: 29
Data Warehouse Design Xintao Wu University of North Carolina at Charlotte Nov 10, 2008
Organization • Concepts n Data Warehousing Concepts (Ch 1) • Logical Design n Logical design in data warehouse (Ch 2) • Physical Design n n n n Physical design in data warehouses (Ch 3) Hardware and I/O considerations Parallelism and partitioning in data warehouses Indexes (Ch 6) Integrity constraints (Ch 7) Basic Materialized views (Ch 8) Advanced materialized views Dimensions (Ch 10) 2
Organization • Managing DW environment n n n n Overview of extraction, transformation, and loading Extraction Transportation Loading and transformation Maintaining the DW Change data capture SQLAccess advisor • DW performance n n n n Query rewrite Schema modeling techniques SQL for aggregation in DW SQL for analysis and reporting SQL for modeling OLAP and data mining Using parallel execution 3
What is DW 4
5
6
7
Logical vs. physical design • In the logical design, you look at the logical relationships among the objects. • In the physical design, you look at the most effective way of storing and retrieving the objects as well as handling them from a transportation and backup/recovery perspective. • Your logical design should result in n n a set of entities and attributes corresponding to fact tables and dimension tables A model of operational data from your source into subject-oriented informaiton in your target data warehouse schema. 8
9
10
11
12
Physical Design • • • Logical design can use pen/paper/oracle warehouse builder/oracle designer while physical design is the creation of database with SQL Physical design decisions are mainly driven by query performance and database maaintenance aspects. You need to create n n n Tablespaces Tables and partitioned tables Views u n Integrity constraints u n A schema object that defines hierarchical relationships between columns or column sets. Indexes and partitioned indexes u n In OLTP, they prevent the insertion of invalid data while in DW, they are only used for query rewrite. Dimensions u n A view takes the output of a query and treats it as a table. Views do not require any space in the database Bitmap indexes vs. B-tree indexes. Bitmap indexes are efficient for set-oriented operations. Materialized views u Query results that have been stored in advance. 13
Partition and parallel execution • Range partitioning • Hash partitioning • List partitioning • Composite partitioning 14
Bitmap index 15
16
One dimension table columns joins one fact table 17
18
extension 19
20
Integrity constraints • Unique constraints • NOT NULL constraints • FOREIGN KEY constraints 21
22
Basic materialized views 23
Materialized views with aggregates 24
25
Dimension 26
27
28
29