Normalisation Relation ABCDEF Relation 1 AB 1 NF

  • Slides: 25
Download presentation
Normalisation Relation ABCDEF Relation 1 AB 1 NF? ? 3 N F? F N

Normalisation Relation ABCDEF Relation 1 AB 1 NF? ? 3 N F? F N 2 Help me Codd!! Reading: Connolly and Begg 13 & 14 (4 th ed), Relation 2 A* C D E* Relation 3 EF

Normalisation From this… …to this In 3+ easy(? ) steps

Normalisation From this… …to this In 3+ easy(? ) steps

What is normalisation? l A method for database design – – – l Takes

What is normalisation? l A method for database design – – – l Takes a set of attributes and derives the relational model – l By separating out the required tables Completely different approach to ERM – l Theory examines how “good” is a schema? Transform non-normalised schemas Minimise storage But should get the same result A minimum of 3 steps are used: l l For each stage, the normal form gets stronger (i. e. removes redundancy) so less open to update anomalies All based on functional dependencies

Functional Dependency l l Underpins normalisation process If every value of column A uniquely

Functional Dependency l l Underpins normalisation process If every value of column A uniquely determines the value in column B, then – – l B is functionally dependent on A (B depends on A) A determines B, or, formally, A B (A is called the determinant) For example, – – Emp. ID Age, Dept (A B, C) Employee ID, Project Role (X, Y Z) Note multiple attributes are often involved Emp. ID Project Age Dept Dsize Budget Role

Rules for functional dependency l A B does NOT automatically mean B A –

Rules for functional dependency l A B does NOT automatically mean B A – E. g. student ID name but not name ID l Transitive dependency: If A B and B C then A C l Many other rules – – l E. g. if X, Y Z but X Z also In this case Z is partially dependent on X, Y “Transitive” and “partial” dependency are two key concepts of the normalisation process

A Question for you! Emp. ID Project A B C D Emp. ID Project

A Question for you! Emp. ID Project A B C D Emp. ID Project E 1 E 2 P 2 P 1 P 2 Age 33 33 34 34 Dept Dsize Budget Role D 2 D 5 10 10 10 20 100 200 100 Analyst Prog. Analyst Which functional dependency is violated by the data?

Unnormalised Form l Relation contains: – non-atomic attribute values ID Employee 1 Grey 2

Unnormalised Form l Relation contains: – non-atomic attribute values ID Employee 1 Grey 2 Brown 3 White 4 Black Salary 31000 35000 55000 47000 Project A B, C A, C Violation of 1 NF non-atomic values

First Normal Form ID Employee Salary 1 Grey 31000 A 2 Brown 35000 3

First Normal Form ID Employee Salary 1 Grey 31000 A 2 Brown 35000 3 White 55000 4 Black 47000 redundancy ID Employee Salary 1 Grey 31000 2 Brown 35000 3 White 55000 4 Black 47000 Project 10 B C A C Budget Permits only single (atomic) attribute values 5 5 5 10 5 Repeating ID (fk) 1 2 2 3 3 3 4 4 Project A B C A C Budget 10 5 5 5 10 5 Remove Repeating Group along with primary key from other Table

Second Normal Form l l Full Functional Dependency (FFD) X Y is FFD –

Second Normal Form l l Full Functional Dependency (FFD) X Y is FFD – l X Y is partially dependent – l if removal of attribute from X leaves the dependency intact 2 NF test – l if removal of any attribute from X removes the dependency involves testing for partial dependency on the PK (therefore PK MUST be composite to test for 2 NF) Relation R is in 2 NF if: – every non-primary-key attribute in R is FFD on the primary key of R

Emp. ID Project l l Age Dept Dsize Budget Role So which FD’s are

Emp. ID Project l l Age Dept Dsize Budget Role So which FD’s are violating 2 NF? “Second Normalised” by: – removing non-primary-key attributes and forming a FFD on appropriate part of primary key {Emp. ID , Age, Dept , Dsize} {Project , Budget} {Emp. ID*, Project*, Role} 2 NF

Third Normal Form l l Remove Transitive Dependency Conditions – A non-primary-key attribute Z

Third Normal Form l l Remove Transitive Dependency Conditions – A non-primary-key attribute Z is transitively dependent on primary key X if: l X Y; Y Z (Y attribute provides the transition to the PK) A [Emp. ID* Project* B [Emp. ID Age Dept Role] Dsize] Budget] C [Project D None of the above Which of the above could have transitive dependency?

Here is an un-normalised Table Ord# 1 1 2 2 2 3 Date Cust#

Here is an un-normalised Table Ord# 1 1 2 2 2 3 Date Cust# 12/1/01 1 13/1/01 2 13/1/01 1 Name Jones Black Jones Prod# Desc 1 Disk 2 CD 3 Mouse Qty 3 5 1 1 Supplier X Y X X Tel 101 223 101

Normalise it to 1 NF Ord# Date Cust# Name Prod# Desc Qty 1 1

Normalise it to 1 NF Ord# Date Cust# Name Prod# Desc Qty 1 1 2 2 2 3 Jones Black Jones 12/1/01 13/1/01 Ord# Date 1 2 3 12/1/01 13/1/01 1 1 2 2 2 1 Cust# 1 2 1 Name Jones Black Jones 1 2 3 3 Disk CD Mouse Supplier Tel 3 5 1 1 X Y X X 101 223 101 fk Ord# Prod# Desc Qty Supplier Tel 1 1 2 2 2 3 3 5 1 1 1 2 3 3 Disk CD Mouse X Y X X 101 223 101

Ord# Date 1 2 3 12/1/01 13/1/01 Cust# 1 2 1 Name Ord# Prod#

Ord# Date 1 2 3 12/1/01 13/1/01 Cust# 1 2 1 Name Ord# Prod# Desc Qty Supplier Tel Jones Black Jones 1 1 2 2 2 3 3 5 1 1 Already in 2 NF Prod# Desc 1 2 3 3 Disk CD Mouse X Y X X 101 223 101 Supplier Tel Disk X CD Y Mouse X 101 223 101 Now we normalise this to 2 NF remembering to test on the PK for any partial dependency Ord# Prod# Qty 1 1 2 2 2 3 1 2 3 3 fk fk 3 5 1 1

So, any transitive dependency? Ord# Date 1 2 3 12/1/01 13/1/01 Cust# 1 2

So, any transitive dependency? Ord# Date 1 2 3 12/1/01 13/1/01 Cust# 1 2 1 Name Jones Black Jones Prod# Desc Ord# Prod# Qty 1 1 2 2 2 3 1 2 3 3 fk fk 3 5 1 1 1 2 3 Supplier Tel Disk X CD Y Mouse X 101 223 101

Yes! But not in all ……………. Ord# Date 1 2 3 Cust# 12/1/01 13/1/01

Yes! But not in all ……………. Ord# Date 1 2 3 Cust# 12/1/01 13/1/01 1 2 1 Name Prod# Desc Jones Black Jones 1 2 3 Supplier Tel Disk X CD Y Mouse X 101 223 101 Ord# Prod# Qty Cust# Name 1 2 Jones Black Ord# Date 1 2 3 12/1/01 13/1/01 1 1 2 2 2 3 Cust# (fk) 1 2 1 2 3 3 3 5 1 1 1 OK! 1 Supplier Tel X Y Prod# Desc 1 2 3 101 223 Supplier (fk) Disk X CD Y Mouse X

Final Decomposition Ord#{fk} Prod#{fk} Qty 1 1 2 2 2 3 1 2 3

Final Decomposition Ord#{fk} Prod#{fk} Qty 1 1 2 2 2 3 1 2 3 3 3 5 1 1 Ord# Date 1 2 3 12/1/01 13/1/01 1 2 3 Supplier (fk) Disk X CD Y Mouse X 1 2 1 Cust# Name 1 2 Prod# Desc Cust# (fk) Jones Black Supplier Tel X Y 101 223 Now in 3 NF

The underlying E-R Model …. . Ord# Date Cust# Name Prod# Desc Qty 1

The underlying E-R Model …. . Ord# Date Cust# Name Prod# Desc Qty 1 1 2 2 2 3 Jones Black Jones 12/1/01 13/1/01 1 1 2 2 2 1 1 2 3 3 Disk CD Mouse 3 5 1 1 Supplier Tel X Y X X 101 223 101 makes Customer Order 1. . 1 0. . * has How many 0. . * despatches tables would Product Supplier you get from 1. . * 1. . 1 mapping?

So Normalisation to 3 NF is Normal!! l l l Remember, 2 NF and

So Normalisation to 3 NF is Normal!! l l l Remember, 2 NF and 3 NF disallow partial and transitive dependencies respectively on the PK, otherwise they are open to update anomalies But …. . even at 3 NF, a relation may be open to update anomalies on rare occasions due to redundancy too So we look briefly at these – – Boyce-Codd 4 NF

Boyce-Codd NF l l Is a stronger normalised form then 3 NF Definition: A

Boyce-Codd NF l l Is a stronger normalised form then 3 NF Definition: A relation is in BCNF, if and only if, every determinant is a candidate key And remember that a candidate key is any key that could become the PK of the relation (i. e. there may be competition for it!) Potential to violate BCNF comes from: – – A relation containing at least 2 composite candidate keys Or candidate keys overlapping (i. e. they have at least one attribute in common)

BCNF Example l l l Consider the candidate keys for: client. No interview. Date

BCNF Example l l l Consider the candidate keys for: client. No interview. Date interview. Time staff. No room. No CR 76 13/5/08 10. 30 SG 5 G 101 CR 56 13/5/08 12. 00 SG 5 G 101 CR 74 13/5/08 12. 00 SG 37 G 102 CR 56 1/7/08 10. 30 SG 5 G 102 FD 1 {PK}: client. No, interview. Date interview. Time, staff. No, room. No FD 2 {CK}: staff. No, interview. Date, interview. Time client. No FD 3 {CK}: room. No, interview. Date, interview. Time staff. No, client. No FD 4: staff. No, interview. Date room. No PK is primary key and CK is candidate key. But what about FD 4? It is not a CK Adapted from Connolly and Begg, 2005, 4 th ed. Page 420

So new decomposition? client. No interview. Date* interview. Time staff. No* CR 76 13/5/08

So new decomposition? client. No interview. Date* interview. Time staff. No* CR 76 13/5/08 10. 30 SG 5 CR 56 13/5/08 12. 00 SG 5 CR 74 13/5/08 12. 00 SG 37 CR 56 1/7/08 10. 30 SG 5 interview. Date staff. No room. No 13/5/08 SG 5 G 101 13/5/08 SG 37 G 102 1/7/08 SG 5 G 102 So duplication in the room number is now eradicated

4 NF l l Comes from 2 multivalued attributes in a relation E. g.

4 NF l l Comes from 2 multivalued attributes in a relation E. g. for each value of A there is a set of values for B and a set for C, while B and C remain independent of each other Branch. No staff. Name[1. . *] owner. Name[ 1. . *] So if you model your databases from ERM’s this type of dependency should not arise.

Example of 4 NF branch. No staff. Name owner. Name C 003 Anne Carol

Example of 4 NF branch. No staff. Name owner. Name C 003 Anne Carol C 003 David Carol C 003 Anne Tina C 003 David Tina branch. No* staff. Name C 003 Anne C 003 David branch. No* owner. Name C 003 Carol C 003 Tina Note: if step 9 applied to multi-valued attributes then we should map this correctly and avoid such redundancy as the two tables on the right would be the result of the mapping! Adapted from Connolly and Begg, 2005, 4 th ed. Page 428

Normal Form Summary l l A Relation’s degree of normalisation Stronger in format at

Normal Form Summary l l A Relation’s degree of normalisation Stronger in format at each stage – l First Normal Form (1 NF) – – l – The relation has no transitive dependencies Boyce-Codd – l The relation has no partial dependencies All non-key attributes are fully functionally dependent on the PK 3 rd Normal Form (3 NF) – l The relation has no non-atomic values Or the relation has “no repeating group” 2 nd Normal Form (2 NF) – l less vulnerable to update anomalies Every determinant is a candidate key 4 NF – no multi-valued dependencies