Functional Dependencies and Normalization Instructor Mohamed Eltabakh meltabakhcs
Functional Dependencies and Normalization Instructor: Mohamed Eltabakh meltabakh@cs. wpi. edu 1
What to Cover l Functional Dependencies (FDs) l Closure of Functional Dependencies l Lossy & Lossless Decomposition l Normalization 2
Decomposing Relations s. Number Student. Prof s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FDs: p. Number p. Name Student s. Number s 1 s 2 s. Name Dave Greg Lossless p. Number p 1 p 2 Student s. Number S 1 S 2 s. Name Dave Greg p. Number p 1 p 2 Lossy p. Name MM MM Professor p. Number p 1 p 2 p. Name MM MM 3
Lossless vs. Lossy Decomposition l Assume R is divided into R 1 and R 2 l Lossless Decomposition l l R 1 natural join R 2 should create exactly R Lossy Decomposition l R 1 natural join R 2 adds more records (or deletes records) from R 4
Lossless Decomposition s. Number Student. Prof s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FDs: p. Number p. Name Student s. Number s 1 s 2 s. Name Dave Greg Lossless p. Number p 1 p 2 Professor p. Number p 1 p 2 p. Name MM MM Student & Professor are lossless decomposition of Student. Prof (Student ⋈ Professor = Student. Prof) 5
Lossy Decomposition s. Number Student. Prof s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FDs: p. Number p. Name Student s. Number S 1 S 2 s. Name Dave Greg Lossy p. Name MM MM Professor p. Number p 1 p 2 p. Name MM MM Student & Professor are lossy decomposition of Student. Prof (Student ⋈ Professor != Student. Prof) 6
Goal: Ensure Lossless Decomposition l How to ensure lossless decomposition? l Answer: l The common columns must be candidate key in one of the two relations 7
Back to our example s. Number Student. Prof s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM p. Number is candidate key FDs: p. Number p. Name Student s. Number s 1 s 2 s. Name Dave Greg Lossless p. Number p 1 p 2 Student s. Number S 1 S 2 s. Name Dave Greg p. Number p 1 p 2 Lossy p. Name MM MM Professor p. Number p 1 p 2 p. Name is not candidate key p. Name MM MM 8
What to Cover l Functional Dependencies (FDs) l Closure of Functional Dependencies l Lossy & Lossless Decomposition l Normalization 9
Normalization 10
Normalization l First Normal Form (1 NF) l Boyce-Codd Normal Form (BCNF) l Third Normal Form (3 NF) l Canonical Cover of FDs 11
Normalization l Set of rules to avoid “bad” schema design l Decide whether a particular relation R is in “good” form l l Several levels of normalization l l l If not, decompose R to be in a “good” form First Normal Form (1 NF) BCNF Third Normal Form (3 NF) Fourth Normal Form (4 NF) If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized 12
First Normal Form (1 NF) l Attribute domain is atomic if its elements are considered to be indivisible units (primitive attributes) l Examples of non-atomic domains are multi-valued and composite attributes l A relational schema R is in first normal form (1 NF) if the domains of all attributes of R are atomic We assume all relations are in 1 NF 13
First Normal Form (1 NF): Example Since all attributes are primitive It is in 1 NF 14
Boyce-Codd Normal Form (BCNF): Definition A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form α→β where α ⊆ R and β ⊆ R, then at least one of the following holds: § § α → β is trivial (i. e. , β⊆α) α is a superkey for R Remember: Candidate keys are also superkeys 15
BCNF: Example Student s. Number s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 ER s 3 Mike p 1 MM Student Info Professor Info Is relation Student in BCNF given p. Number p. Name l l It is not trivial FD p. Number is not a key in Student relation NO How to fix it and make it in BCNF? ? ? 16
Decomposing a Schema into BCNF l If R is not in BCNF because of non-trivial dependency α → β, then decompose R l R is decomposed into two relations l l R 1 = (α U β ) -- α is super key in R 1 R 2 = (R- (β - α)) -- R 2. α is foreign keys to R 1. α 17
Example of BCNF Decomposition Student. Prof s. Number s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FDs: p. Number p. Name Student Professor s. Number s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FOREIGN KEY: Student (PNum) references Professor (PNum) 18
What is Nice about this Decomposing ? ? ? l R is decomposed into two relations l l R 1 = (α U β ) -- α is super key in R 1 R 2 = (R- (β - α)) -- R 2. α is foreign keys to R 1. α This decomposition is lossless (Because R 1 and R 2 can be joined based on α, and α is unique in R 1) l When you join R 1 and R 2 on α, you get R back without lose of information 19
Student. Prof = Student ⋈ Professor Student. Prof s. Number s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM FDs: p. Number p. Name Student Professor s. Number s. Name p. Number p. Name s 1 Dave p 1 MM s 2 Greg p 2 MM BCNF decomposition rule create lossless decomposition 20
Multi-Step Decomposition l Relation R and functional dependency F l l R = (customer_name, loan_number, branch_name, branch_city, assets, amount ) F = {branch_name assets branch_city, loan_number amount branch_name} l Is R in BCNF ? ? NO l Based on branch_name assets branch_city l l R 1 = (branch_name, assets, branch_city) R 2 = (customer_name, loan_number, branch_name, amount) l Are R 1 and R 2 in BCNF ? l Divide R 2 based on loan_number amount branch_name l l R 2 is not R 3 = (loan_number, amount, branch_name) R 4 = (customer_name, loan_number) Final Schema has R 1, R 3, R 4 21
What is NOT Nice about BCNF Before decomposition, we had set of functional dependencies FDs (Say F) After decomposition, do we still have the same set of FDs or we lost something ? ? 22
What is NOT Nice about BCNF l Dependency Preservation l After the decomposition, all FDs in F+ should be preserved l BCNF does not guarantee dependency preservation l Can we always find a decomposition that is both BCNF and preserving dependencies? l l No…This decomposition may not exist That is why we study a weaker normal form called (third normal form – 3 NF) 23
Dependency Preserving Assume R is decomposed to R 1 and R 2 Dependencies of R 1 and R 2 include: l Local dependencies α → β l l All columns of α and β must be in a single relation Global Dependencies l Use transitivity property to form more FDs across R 1 and R 2 relations Yes Dependency preserving Does these dependencies match the ones in R ? Not dependency preserving 24
Example of Lost FD l Assume relation R(C, S, J, D, T, Q, V) l C is key, JT C and SD T l l (C is key) -- Good for BCNF (JT is key) -- Good for BCNF (SD is not a key) –Bad for BCNF Decomposition: l l C CSJDTQV JT CSJDTQV SD T R 1(C, S, J, D, Q, V) and R 2(S, D, T) Lossless & in BCNF Does C CSJDTQV still exist? l Yes: C CSJDQV (local), SD T (local), C CSJDQVT (global) 25
Example of Lost FD (Cont’d) l Assume relation R(C, S, J, D, T, Q, V) l C is key, JT C and SD T l l (C is key) -- Good for BCNF (JT is key) -- Good for BCNF (SD is not a key) –Bad for BCNF Decomposition: l l C CSJDTQV JT CSJDTQV SD T R 1(C, S, J, D, Q, V) and R 2(S, D, T) Lossless & in BCNF Does SD T still exist? l Yes: SD T (local) 26
Example of Lost FD (Cont’d) l Assume relation R(C, S, J, D, T, Q, V) l C is key, JT C and SD T l l (C is key) -- Good for BCNF (JT is key) -- Good for BCNF (SD is not a key) –Bad for BCNF Decomposition: l l C CSJDTQV JT CSJDTQV SD T R 1(C, S, J, D, Q, V) and R 2(S, D, T) Lossless & in BCNF Does JT CSJDTQV still exist? l No this one is lost (no way from the local FDs to get this one) 27
Dependency Preservation Test l Assume R is decomposed into R 1 and R 2 local dependencies in R 1 l The closure of FDs in R is F+ l The FDs in R 1 and R 2 are FR 1 and FR 2, respectively l Then dependencies are preserved if: l F+ = (FR 1 union FR 2)+ local dependencies in R 2 28
Back to Our Example l Assume relation R(C, S, J, D, T, Q, V) l C is key, JT C and SD T l l l l (C is key) -- Good for BCNF (JT is key) -- Good for BCNF (SD is not a key) –Bad for BCNF Decomposition: l l C CSJDTQV JT CSJDTQV SD T R 1(C, S, J, D, Q, V) and R 2(S, D, T) F+ = {C CSJDTQV, JT CSJDTQV, SD T} FR 1 = {C CSJDQV} local for R 1 JT C is still missing FR 2 = {SD T} local for R 2 FR 1 U FR 2 = {C CSJDQV, SD T} (FR 1 U FR 2)+ = {C CSJDQV, SD T, C T} 29
Dependency Preservation BCNF does not necessarily preserve FDs. But 3 NF is guaranteed to be able to preserve FDs. 30
Normalization l First Normal Form (1 NF) l Boyce-Codd Normal Form (BCNF) l Third Normal Form (3 NF) l Canonical Cover of FDs 31
Third Normal Form: Motivation l There are some situations where l l BCNF is not dependency preserving Solution: Define a weaker normal form, called Third Normal Form (3 NF) l l Allows some redundancy (we will see examples later) But all FDs are preserved There is always a lossless, dependencypreserving decomposition in 3 NF 32
Normal Form : 3 NF Relation R is in 3 NF if, for every FD in F+ α β, where α ⊆ R and β ⊆ R, at least one of the following holds: § α → β is trivial (i. e. , β⊆α) § α is a superkey for R § Each attribute in β-α is part of a candidate key (prime attribute) L. H. S is superkey OR R. H. S consists of prime attributes 33
Testing for 3 NF l Use attribute closure to check for each dependency α → β, if α is a superkey l If α is not a superkey, we have to verify if each attribute in (β - α) is contained in a candidate key of R 34
3 NF: Example Lot (ID, county, lot. Num, area, price, tax. Rate) Primary key: ID Candidate key: <county, lot. Num> FDs: l county tax. Rate area price Is relation Lot in 3 NF ? NO Decomposition based on county tax. Rate Lot (ID, county, lot. Num, area, price) County (county, tax. Rate) l Are relations Lot and County in 3 NF ? Lot is not 35
3 NF: Example (Cont’d) Lot (ID, county, lot. Num, area, price) County (county, tax. Rate) Candidate key for Lot: <county, lot. Num> FDs: county tax. Rate area price Decompose Lot based on area price Lot (ID, county, lot. Num, area) County (county, tax. Rate) Area (area, price) l Is every relation in 3 NF ? YES 36
Comparison between 3 NF & BCNF ? l If R is in BCNF, obviously R is in 3 NF l If R is in 3 NF, R may not be in BCNF l 3 NF allows some redundancy and is weaker than BCNF l 3 NF is a compromise to use when BCNF with good constraint enforcement is not achievable l Important: Lossless, dependency-preserving decomposition of R into a collection of 3 NF relations always possible ! 37
Normalization l First Normal Form (1 NF) l Boyce-Codd Normal Form (BCNF) l Third Normal Form (3 NF) l Canonical Cover of FDs 38
Canonical Cover of FDs 39
Canonical Cover of FDs l Canonical Cover (Minimal Cover) = G l l l Is the smallest set of FDs that produce the same F+ There are no extra attributes in the L. H. S or R. H. S of and dependency in G Given set of FDs (F) with functional closure F+ l Canonical cover of F is the minimal subset of FDs (G), where G+ = F + Every FD in the canonical cover is needed, otherwise some dependencies are lost 40
Example : Canonical Cover l Given F: l l A B, ABCD E, EF GH, ACDF EG Then the canonical cover G: l A B, ACD E, EF GH The smallest set (minimal) of FDs that can generate F+ 41
Computing the Canonical Cover l Given a set of functional dependencies F, how to compute the canonical cover G 42
Example : Canonical Cover (Lets Check L. H. S) l l l Given F = {A B, ABCD E, EF G, EF H, ACDF EG} Union Step: {A B, ABCD E, EF GH, ACDF EG} Test ABCD E l Check A: l l {BCD}+ = {BCD} A cannot be deleted Check B: l {ACD}+ = {A B C D E} Then B can be deleted l Now the set is: {A B, ACD E, EF GH, ACDF EG} l Test ACD E l Check C: l l {AD}+ = {ABD} C cannot be deleted Check D: l {AC}+ = {ABC} D cannot be deleted 43
Example: Canonical Cover (Lets Check L. H. S-Cont’d) l Now the set is: {A B, ACD E, EF GH, ACDF EG} l Test EF GH l Check E: l l E cannot be deleted Check F: l l {F}+ = {F} {E}+ = {E} F cannot be deleted Test ACDF EG l None of the H. L. S can be deleted 44
Example: Canonical Cover (Lets Check R. H. S) l Now the set is: {A B, ACD E, EF GH, ACDF EG} l Test EF GH l Check G: l l Check H: l l {EF}+ = {E F G} H cannot be deleted Test ACDF EG l Check E: l l {EF}+ = {E F H} G cannot be deleted {ACDF}+ = {A B C D F E G} E can be deleted Now the set is: {A B, ACD E, EF GH, ACDF G} 45
Example: Canonical Cover (Lets Check R. H. S-Cont’d) l Now the set is: {A B, ACD E, EF GH, ACDF G} l Test ACDF G l Check G: l {ACDF}+ = {A B C D F E G} G can be deleted Now the set is: {A B, ACD E, EF GH} The canonical cover is: {A B, ACD E, EF GH} 46
Canonical Cover l Used to find the smallest (minimal) set of FDs that have the same closure as the original set. l Used in the decomposition of relations to be in 3 NF l The resulting decomposition is lossless and dependency preserving 47
Done with Normalization l First Normal Form (1 NF) l Boyce-Codd Normal Form (BCNF) l Third Normal Form (3 NF) l Canonical Cover of FDs 48
Questions ? 49
What You Learned l Data Models l l Entity-Relationship Model & ERD Relational Model l Conversion between the data models l Relational Algebra & Operators l Structured Query Language SQL l l DML: Data Manipulation Language DDL: Data Definition Language 50
What You Learned (Cont’d) l Advanced SQL l l Triggers, Views, Cursors, Stored Procedures and Functions PL/SQL l Functional Dependencies l Normalization Rules 51
In Advanced Courses Things get more interesting l Indexing Techniques l Transaction Management l Query Optimization l Handling of Big Data l And many more … 52
- Slides: 52