Virtual University of Pakistan Data Warehousing Lecture8 Denormalization
- Slides: 13
Virtual University of Pakistan Data Warehousing Lecture-8 De-normalization Techniques Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www. nu. edu. pk/cairindex. asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan@cluxing. com Ahsan Abdullah 1
De-normalization Techniques 2 Ahsan Abdullah
Splitting Tables Table Col. A Col. B Table_v 1 Col. A Col. C Table_v 2 Col. B Col. A Col. C Vertical Split Table_h 1 Col. A Col. B Table_h 2 Col. C Col. A Col. B Col. C 3 Horizontal split Ahsan Abdullah
Splitting Tables: Horizontal splitting… Breaks a table into multiple tables based upon common column values. Example: Campus specific queries. GOAL § Spreading rows for exploiting parallelism. § Grouping data to avoid unnecessary query load in WHERE clause. 4 Ahsan Abdullah
Splitting Tables: Horizontal splitting ADVANTAGE § Enhance security of data. § Organizing tables differently for different queries. § Graceful degradation of database in case of table damage. § Fewer rows result in flatter B-trees and fast data retrieval. 5 Ahsan Abdullah
Splitting Tables: Vertical Splitting § Infrequently accessed columns become extra “baggage” thus degrading performance. §Very useful for rarely accessed large text columns with large headers. § Header size is reduced, allowing more rows per block, thus reducing I/O. §Splitting and distributing into separate files with repeating primary key. § For an end user, the split appears as a single table through a view. Ahsan Abdullah 6
Pre-joining … § Identify frequent joins and append the tables together in the physical data model. § Generally used for 1: M such as masterdetail. RI is assumed to exist. § Additional space is required as the master information is repeated in the new header table. 7 Ahsan Abdullah
Pre-Joining… Master normalized Sale_ID Sale_date Sale_person denormalized 1 M Tx_ID Sale_ID Item_Qty Sale_Rs Tx_ID Sale_date Sale_person Item_ID Item_Qty Sale_Rs Detail 8 Ahsan Abdullah
Pre-Joining: Typical Scenario Typical of Market basket query Join ALWAYS required Tables could be millions of rows Squeeze Master into Detail Repetition of facts. How much? Detail 3 -4 times of master 9 Ahsan Abdullah
Adding Redundant Columns… Table_1 Col. A Table_1’ Col. B Col. A Col. B Table_2 Col. A Col. C Col. D Col. C Table_2 … Col. Z Col. A Col. C Col. D … Col. Z 10 Ahsan Abdullah
Adding Redundant Columns… Columns can also be moved, instead of making them redundant. Very similar to pre-joining as discussed earlier. EXAMPLE Frequent referencing of code in one table and corresponding description in another table. § A join is required. § To eliminate the join, a redundant attribute added in the target entity which is functionally independent of the primary key. 11 Ahsan Abdullah
Redundant Columns: Surprise Note that: § Actually increases in storage space, and increase in update overhead. § Keeping the actual table intact and unchanged helps enforce RI constraint. § Age old debate of RI ON or OFF. 12 Ahsan Abdullah
Derived Attributes: Example Business Data Model #SID Do. B Degree Course Grade Credits Do. B: Date of Birth DWH Data Model #SID Do. B Degree Course Grade Credits GP Age Derived attributes § Calculated once § Used Frequently Age is also a derived attribute, calculated as Current_Date – Do. B (calculated periodically). GP (Grade Point) column in the data warehouse data model is included as a derived value. The formula for calculating this field is Grade*Credits. 13 Ahsan Abdullah
- Modern database design
- Denormalized data example
- Denormalization produces a lower normal form
- What is kdd process in data mining
- Hive provides data warehousing layer to data over hadoop.
- Datamart olap
- Data warehousing olap and data mining
- Data warehouse modeling best practices
- Introduction to data mining and data warehousing
- Introduction to data warehouse
- Coffing data warehousing
- Data warehouse component
- Data warehouse project plan
- 1keydata data warehousing