Virtual University of Pakistan Data Warehousing Lecture8 Denormalization

  • Slides: 13
Download presentation
Virtual University of Pakistan Data Warehousing Lecture-8 De-normalization Techniques Ahsan Abdullah Assoc. Prof. &

Virtual University of Pakistan Data Warehousing Lecture-8 De-normalization Techniques Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www. nu. edu. pk/cairindex. asp National University of Computers & Emerging Sciences, Islamabad Email: ahsan@cluxing. com Ahsan Abdullah 1

De-normalization Techniques 2 Ahsan Abdullah

De-normalization Techniques 2 Ahsan Abdullah

Splitting Tables Table Col. A Col. B Table_v 1 Col. A Col. C Table_v

Splitting Tables Table Col. A Col. B Table_v 1 Col. A Col. C Table_v 2 Col. B Col. A Col. C Vertical Split Table_h 1 Col. A Col. B Table_h 2 Col. C Col. A Col. B Col. C 3 Horizontal split Ahsan Abdullah

Splitting Tables: Horizontal splitting… Breaks a table into multiple tables based upon common column

Splitting Tables: Horizontal splitting… Breaks a table into multiple tables based upon common column values. Example: Campus specific queries. GOAL § Spreading rows for exploiting parallelism. § Grouping data to avoid unnecessary query load in WHERE clause. 4 Ahsan Abdullah

Splitting Tables: Horizontal splitting ADVANTAGE § Enhance security of data. § Organizing tables differently

Splitting Tables: Horizontal splitting ADVANTAGE § Enhance security of data. § Organizing tables differently for different queries. § Graceful degradation of database in case of table damage. § Fewer rows result in flatter B-trees and fast data retrieval. 5 Ahsan Abdullah

Splitting Tables: Vertical Splitting § Infrequently accessed columns become extra “baggage” thus degrading performance.

Splitting Tables: Vertical Splitting § Infrequently accessed columns become extra “baggage” thus degrading performance. §Very useful for rarely accessed large text columns with large headers. § Header size is reduced, allowing more rows per block, thus reducing I/O. §Splitting and distributing into separate files with repeating primary key. § For an end user, the split appears as a single table through a view. Ahsan Abdullah 6

Pre-joining … § Identify frequent joins and append the tables together in the physical

Pre-joining … § Identify frequent joins and append the tables together in the physical data model. § Generally used for 1: M such as masterdetail. RI is assumed to exist. § Additional space is required as the master information is repeated in the new header table. 7 Ahsan Abdullah

Pre-Joining… Master normalized Sale_ID Sale_date Sale_person denormalized 1 M Tx_ID Sale_ID Item_Qty Sale_Rs Tx_ID

Pre-Joining… Master normalized Sale_ID Sale_date Sale_person denormalized 1 M Tx_ID Sale_ID Item_Qty Sale_Rs Tx_ID Sale_date Sale_person Item_ID Item_Qty Sale_Rs Detail 8 Ahsan Abdullah

Pre-Joining: Typical Scenario Typical of Market basket query Join ALWAYS required Tables could be

Pre-Joining: Typical Scenario Typical of Market basket query Join ALWAYS required Tables could be millions of rows Squeeze Master into Detail Repetition of facts. How much? Detail 3 -4 times of master 9 Ahsan Abdullah

Adding Redundant Columns… Table_1 Col. A Table_1’ Col. B Col. A Col. B Table_2

Adding Redundant Columns… Table_1 Col. A Table_1’ Col. B Col. A Col. B Table_2 Col. A Col. C Col. D Col. C Table_2 … Col. Z Col. A Col. C Col. D … Col. Z 10 Ahsan Abdullah

Adding Redundant Columns… Columns can also be moved, instead of making them redundant. Very

Adding Redundant Columns… Columns can also be moved, instead of making them redundant. Very similar to pre-joining as discussed earlier. EXAMPLE Frequent referencing of code in one table and corresponding description in another table. § A join is required. § To eliminate the join, a redundant attribute added in the target entity which is functionally independent of the primary key. 11 Ahsan Abdullah

Redundant Columns: Surprise Note that: § Actually increases in storage space, and increase in

Redundant Columns: Surprise Note that: § Actually increases in storage space, and increase in update overhead. § Keeping the actual table intact and unchanged helps enforce RI constraint. § Age old debate of RI ON or OFF. 12 Ahsan Abdullah

Derived Attributes: Example Business Data Model #SID Do. B Degree Course Grade Credits Do.

Derived Attributes: Example Business Data Model #SID Do. B Degree Course Grade Credits Do. B: Date of Birth DWH Data Model #SID Do. B Degree Course Grade Credits GP Age Derived attributes § Calculated once § Used Frequently Age is also a derived attribute, calculated as Current_Date – Do. B (calculated periodically). GP (Grade Point) column in the data warehouse data model is included as a derived value. The formula for calculating this field is Grade*Credits. 13 Ahsan Abdullah