CSE 480 Database Systems Lecture 18 Normal Forms
CSE 480: Database Systems Lecture 18: Normal Forms and Normalization 1
Functional Dependencies A functional dependency (FD) takes the form of X Y, where X and Y are subsets of attributes in a relation What does X Y mean? Values of attributes X determines the values of attributes Y; Values of attributes Y depends on the values of attributes X; Suppose t 1 and t 2 are two tuples in the relation. If t 1 and t 2 have the same values for attribute set X, then their values for attribute set Y must be identical to each other in these two tuples 2
Functional Dependencies EMP_PRJ(Ssn, Pnumber, Hours, Ename, Plocation) {Ssn} {Ename} is a FD Ename depends on Ssn {Pnumber} {Pname, Plocation} is a FD Pname and Plocation depends on Pnumber Two rows with the same Pnumber must have the same values of Pname and Plocation {Plocation} {Pnumber} is not a FD {Ename, Plocation} {Pnumber} is not a FD 3
Functional Dependencies l Graphical Representation of FDs: l FD 1: {SSN, Pnumber} {Hours} l FD 2: {SSN} {Ename} l FD 3: {PNumber} {PName, PLocation} 4
Functional Dependencies l A relation may contain many functional dependencies – How to derive all of them? l Given a set of functional dependencies of a relation R: = {AC B, A C, D A} – Does entail AD BC (i. e. , is AD BC also a FD of R)? 5
Inference Rules (Example) Given = {AC B, A C, D A } Does entail AD BC? 1. D A (given in ) 2. AD A (augmenting (1) with A) 3. A C (given in ) 4. A AC (augmenting (3) with A) 5. AC B (given in ) 6. AC BC (augmenting (5) with C) 7. A BC (transitive between (4) and (6)) 8. AD BC (transitive between (2) and (7)) 6
Normal Forms and Normalization l Functional dependencies can help us analyze whether a relational schema is “good” or “bad” l In relational model, we don’t say that a schema is good/bad. We say it is in 1 NF, 2 NF, 3 NF, etc – Properties u The higher the NF, the stricter the conditions placed on the schema u A higher NF relation is also in lower NF but not vice-versa – A 3 NF relation is in 2 NF and 1 NF (but not in 4 NF, 5 NF) l Normalization: – The process of decomposing "bad" (lower normal form) relations by breaking up their attributes into smaller relations 7
First Normal Form l A schema is in 1 NF if it permits only atomic (indivisible) attribute values l 1 NF disallows – composite attributes – multivalued attributes l The relational model itself prohibits relations that contain composite and multivalued attributes – Therefore, all the schemas in relational model are at least in 1 NF 8
Example Relation is not in 1 NF because it has a multivalued attribute (Dlocations) 9
Normalization into 1 NF l 3 strategies for normalization: – Place the “offending” attributes in a separate relation u DEPARTMENT(Dname, Dnumber, Dmgr_ssn) u DEPTLOCATIONS(Dnumber, Dlocation) – Change Dlocations into Dlocation and modify the primary key u DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocation) – If the maximum number of locations per department is 3: u DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dloc 1, Dloc 2, Dloc 3) 10
Is 1 NF Sufficient? l l Key of the relation is the combination of (Dnumber, Dlocation) Relation is in 1 NF, but there are redundancies: – Two rows with the same Dnumber must have the same Dname and Dmgr_ssn (even though their Dlocations are different) 11
2 NF (Motivating Example) l Functional dependencies – {Dnumber, Dlocation} {Dname, Dmgr_ssn} (from primary key) – {Dnumber} {Dname, Dmgr_ssn} l Consequence: two tuples with same Dnumber but different Dlocation will have same Dname and Dmgr_ssn, which leads to redundancy! l If {Dnumber} {Dname, Dmgr_ssn} is not a FD, then there won’t be a redundancy problem 12
2 NF (Motivating Example) {Dnumber, Dlocation} {Dname, Dmgr_ssn} (from primary key) {Dnumber} {Dname, Dmgr_ssn} l This example suggests that if X Y is a FD, where X is the key, you can’t have X’ Y also a FD of the same table (where X’ is a subset of X), otherwise, there’ll be redundancies in the table – We say that X Y must be a full FD 13
Full versus Partial Dependencies l X Y is a full FD if removal of any attribute from X means the FD does not hold any more l X Y is a partial FD if there is a FD X’ Y where X’ is a subset of X l Example: – {Dnumber, Dlocation} {Dname, Dmgr_ssn} is a partial FD because {Dnumber} {Dname, Dmgr_ssn} is also a FD of the schema 14
Prime versus Non. Prime Attributes l Prime attribute: – an attribute that is a member of the candidate key K – Example (from previous slide): Dnumber, Dlocation l Nonprime attribute: – an attribute that is not a member of any candidate key. – Example (from previous slide): Dname, Dmgr_ssn 15
2 NF Definition l l A relation schema R is in second normal form (2 NF) if every nonprime attribute A in R is fully functionally dependent on the key of R Since {Dnumber, Dlocation} is the key – {Dnumber, Dlocation} {Dname, Dmgr_ssn} is FD of the schema – But {Dnumber} {Dname, Dmgr_ssn} is also a FD of the schema u The non-prime attributes are not fully functionally dependent on the key u So schema is not in 2 NF 16
Example l FDs: – {SSN, Pnumber} {Hours, Ename, Plocation}, – {SSN} {Ename}, – {Pnumber} {Pname, Plocation} 17
Example – {SSN, PNUMBER} HOURS is a full FD since neither SSN HOURS nor PNUMBER HOURS hold – But {SSN, PNUMBER} ENAME is a partial dependency since SSN ENAME also holds 18
2 NF – – l l Is {SSN, PNUMBER} {Hours} a full FD? Yes {Ename} a full FD? No {Plocation} a full FD? No Conclusion: The EMP_PROJ relation is not in 2 NF normalization: take the “offending” FDs and create separate relations 19
Normalizing into 2 NF {SSN, Pnumber} {Hours}, {SSN} {Ename}, {Pnumber} {Pname, Plocation} 20
Is 2 NF sufficient? l l Key is SSN FDs: – {SSN} {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn} – {Dnumber} {Dname, Dmgr_ssn} l Is the table in 2 NF? – Yes because every non-prime attribute is fully FD on the key 21
Is 2 NF sufficient? l Are there still redundancies in the relation? Yes – Two tuples with the same Dnumber have the same Dname and Dmgr_ssn l What is the “offending” FD that causes redundancy? 22
Is 2 NF sufficient? l Functional dependencies: – {SSN} {Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn} – {Dnumber} {Dname, Dmgr_ssn} l Since Dnumber is not a key, you can have two rows with the same Dnumber. Hence their Dname and Dmgr_ssn must be the same => redundancy! 23
3 NF l A relation schema R is in third normal form (3 NF) if – It is in 2 NF and – There is no non-prime attribute in R that is transitively dependent on the primary key u In X Y and Y Z are FDs, with X as the primary key, we consider Z to be transitively dependent on X only if Y is not a candidate key. If Y is a candidate key, then we do not consider this as a transitive dependency problem 24
Example of 3 NF l FDs: – SSN Ename, Bdate, Address, Dnumber – SSN Dnumber – Dnumber Dname, Dmgr_ssn l Dname is transitively dependent on the primary key SSN because SSN Dnumber and Dnumber Dname are FDs of the relation – Therefore the relation is not in 3 NF 25
Third Normal Form l Another way to check whether a relation is in 3 NF (without checking for partial and transitive dependencies): – A relation schema R is in 3 NF if whenever a nontrivial FD X A holds, either u X is a superkey of R or u A is a prime attribute of R 26
3 NF l FDs: – SSN Ename, Bdate, Address – SSN Dnumber – Dnumber Dname, Dmgr_ssn u l Transitive dependency But Dnumber is not superkey and Dname, Dmgr_ssn are not prime attributes Therefore the relation is not in 3 NF 27
Normalizing into 3 NF Take the “offending” FDs and create separate relations 28
Is 3 NF enough to remove redundancy? Assume every instructor teaches only 1 course Key is (Student, Course) l FDs: – {Student, Course} Instructor – Instructor Course l No transitive dependency because Course is not a prime attribute Relation is in 3 NF (but there is still redundancy) 29
BCNF (Boyce-Codd Normal Form) l A relation schema R is in BCNF if whenever an FD X A holds in R, then X must be a superkey of R l FDs: – {Student, Course} Instructor – Instructor Course l Relation is not in BCNF because Instructor is not a superkey 30
Achieving BCNF by Decomposition STUD_COURSE l COURSE_INSTRUCT STUD_COURSE – Key is {Student, Course} l COURSE_INSTRUCT – Key is {Instructor} – FD: Instructor Course l Loses the FD: {Student, Course} Instructor – But no redundancy 31
Decomposition 1 l Problem: decomposition does not result in lossless join (i. e. , does not have nonadditive join property) – i. e. , spurious tuples may be generated 32
Decomposition 2 l Dependency preserving? No – loses the FD: {Student, Course} Instructor l Lossless join? Yes 33
Decomposition 3 l Dependency preserving? No – loses the FD: {Student, Course} Instructor l Lossless join? No 34
Summary l 1 st normal form – no composite/multivalued attributes in relations l 2 nd, 3 rd, and Boyce-Code normal forms – Eliminate redundancies based on FDs l More normal forms (see textbook) – 4 th : deal with multivalued dependencies – 5 th : deal with join dependencies 35
- Slides: 35