3 Chapter 3 6 7 Normalization of Database
3 Chapter 3. 6 -7 Normalization of Database Tables Spring 2007 1
3 Normalization • Normalization is process for assigning attributes to entities u. Reduces data redundancies u. Helps eliminate data anomalies u. Produces controlled redundancies to link tables • Normalization stages u 1 NF - First normal form u 2 NF - Second normal form u 3 NF - Third normal form u 4 NF - Fourth normal form Spring 2007 2
3 Example: title year Name length Stars address Starts-in Movies Owns Name Studios address Spring 2007 3
Movies Problem Example 3 • Update anomalies: If Harrison Ford’s phone # changes, must change it in each of his tuples. If Length value of Star Wars needs to be changed, must change all occurrences • Deletion anomalies: If we delete Wayne’s World entries from database, we also loose all info about Dana Carvey & Mike Meyers Spring 2007 4
3 Conversion to 1 NF • Repeating groups must be eliminated u. Proper primary key developed • Uniquely identifies each tuple u. Dependencies can be identified • undesirable dependencies allowed – Partial » based on part of composite primary key – Transitive » one nonprime attribute depends on another nonprime attribute Spring 2007 5
Conversion to 1 NF Cont. 3 • An attribute that is at least part of a key is known as a prime attribute or key attribute or primary key. Spring 2007 6
Example • • 3 Projects assigned to employees Each project has a number and a name Each employee has a number, a name, a job class Each employee working on a project, need to keep number of hours spent on project, and hourly rate. • Project Assignments Table : ( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) • What’s the Key for this relation? Spring 2007 7
3 Data Organization: 1 NF Spring 2007 8
3 Dependency Diagram (1 NF) PROJ_ NUM, EMP_NUM --> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS PROJ_NUM --> PROJ_NAME EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS JOB_CLASS --> CHG_HOUR Spring 2007 9
3 1 NF Summarized • • Spring 2007 All key attributes defined Primary Key identified No repeating groups in table All attributes dependent on primary key 10
3 2 NF Summarized • In 1 NF, but • Includes no partial dependencies • Partial dependency: u. An attribute is functionally dependent on a portion of the primary key. • Example: u PROJ_NUM PROJ_NAME u. EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS Spring 2007 11
Conversion to 2 NF 1. 2. 3. 4. 5. 6. 3 Start with 1 NF format: Write each key component on a separate line Write dependent attributes after each key component Write original key on last line Write any remaining attributes after original key Each component is new table PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) Spring 2007 12
3 2 NF Conversion Results Spring 2007 13
3 2 NF Summarized • In 1 NF • Includes no partial dependencies • Still possible to exhibit transitive dependency u. Attributes may be functionally dependent on non-key attributes Spring 2007 14
3 Conversion to 3 NF • decompose table(s) to eliminate transitive functional dependencies PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) Spring 2007 15
3 3 NF Summarized • In 2 NF • Contains no transitive dependencies Spring 2007 16
Boyce-Codd Normal Form (BCNF) 3 • Formally, R is in BCNF if for every nontrivial FD for R, say X A, X is a superkey. u“Nontrivial” = right-side attribute not in left side. u. Trivial FDs examples • A A • AB A • Informally: the only arrows in the FD diagram are arrows out of superkeys u. Note: BCNF violation when relation has more than one superkey that overlap Spring 2007 18
3 What normal form? 3 NF Table Not in BCNF Spring 2007 19
3 Decomposition of Table Structure to Meet BCNF Spring 2007 20
Decomposition to Reach BCNF 3 Setting: relation R with FDs F. Suppose relation R has BCNF violation X B and X not a superkey. Spring 2007 21
X +. 1. Compute u. Cannot be all attributes – why? 2. Decompose R into X+ and (R–X+) X. R 3 X X+ 3. Find the FD’s for the decomposed relations. u. Project the FD’s from F = calculate all consequents of F that involve only attributes from X+ or only from (R X+) X. Spring 2007 22
Decomposition to Reach BCNF 3 • Identify the violating FD • E. g. : X 1 X 2…Xn B 1 B 2…Bm • Add to the right-hand side of FD as many attributes as are functionally determined by (X 1 X 2…Xn) • Decompose relation R into two relations: u. One relation has all attributes Xs & Bs u. Second relation has the Xs plus any other remaining attributes from R other than Bs Spring 2007 23
BCNF--Example 3 Assume R(S, J, T) • S: Student • J: subject • T: Teacher • Student S is taught subject J by teacher T. • Constraints: u. For each subject, each student of that subject is taught by only one teacher u. Each teacher teaches only one subject (but each subject is taught by several teachers) Spring 2007 24
BCNF--Example S Smith J Math Physics T Prof. White Prof. Green Jane Math Prof. White Jane Physics Prof. Brown 3 Functional Dependencies: • S , J T • T J Spring 2007 25
BCNF--Example 3 • Candidate keys: {S, J} and {S, T} S T J • 3 NF but not in BCNF • Update anomaly: if we delete the info that Jane is studying Physics we also loose the info that Prof. Brown teaches Physics • Solution: two relations R 1{S, T} R 2{T, J} Spring 2007 26
3 Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. Relations R 1[A, B] FD: B C R 2[B, C] Tuples in R: (a, b, c) Tuples in R 1: (a, b) Tuples in R 2: (b, c) Natural join of R 1 and R 2 = (a, b, c) original relation R can be reconstructed by forming the natural join of R 1 and R 2. Spring 2007 27
3 Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. Relations R 1[A, B] FD: B C R 2[B, C] Tuples in R: (a, b, c) , (d, b, e) Tuples in R 1: (a, b), (d, b) Tuples in R 2: (b, c), (b, e) Tuples in the natural join of R 1 and R 2: (a, b, c), (a, b, e) (d, b, c), (d, b, e) Can (a, b, e), (d, b, c) be a bogus tuples? Spring 2007 28
3 Decomposition Based on BCNF is Necessarily Correct • Answer: No • Because: B C i. e. if 2 tuples have same B attribute then they must have the same C attribute. (b, c) = (b, e) (a, b, e) = (a, b, c) and (d, b, c) = (d, b, e) Spring 2007 29
Theorem 3 • Any two-attribute relation is in BCNF. Spring 2007 30
3 Decomposition Theorem • Suppose we decompose a relation R(X, Y, Z) into R 1(X, Y) and R 2(X, Z) and project the R onto R 1 and R 2. • Then join(R 1, R 2) is guaranteed to reconstruct R if and only if X Y or X Z • Notice that whenever we decompose because of a BNCF violation, one of the above FDs holds. Spring 2007 31
3 NF One FD structure causes problems in BCNF: 3 • If you decompose, you can’t recover all of the original FD’s. • If you don’t decompose, you violate BCNF. Abstractly: AB C and C B. • Example : street city zip, and zip city. BCNF violation: C B has a left side that is not a superkey. • Based on previous algorithm, decompose into BC and AC. u But the FD AB C does not hold in new tables. Spring 2007 32
3 Example A = street, B = city, C = zip city BCNF violation Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city zip Spring 2007 33
3 Example zip city Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city zip Spring 2007 34
“Elegant” Workaround 3 Define the problem away. • A relation R is in 3 NF iff (if and only if) for every nontrivial FD X A, either: 1. X is a superkey, or 2. A is prime = member of at least one key. • Thus, if we just normalize to the 3 NF, the problem goes away. Spring 2007 35
What 3 NF and BCNF Give You • 3 There are two important properties of a decomposition: 1. Recovery : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. 2. Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied. Spring 2007 36
3 NF and BCNF, Continued 3 • We can get (1) with a BCNF decomposition. • We can get both (1) and (2) with a 3 NF decomposition. • But we can’t always get (1) and (2) with a BCNF decomposition. ustreet-city-zip is an example. Spring 2007 37
3 Mutli-valued Dependencies Fourth Normal Form Spring 2007 38
Definition of MVD 3 • A multivalued dependency is an assertion that two attributes (sets of attributes) are independent of one another. • Formally: A multivalued dependency (MVD) on R, X ->->Y , says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation. Spring 2007 39
Example 3 Actors(name, addr, phones, cars) with MVD Name phones. name addr phones cars sue a p 1 b 1 sue a p 2 b 2 it must also have the same tuples with phones components swapped: name addr phones cars sue a p 2 b 1 sue a p 1 b 2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair. Spring 2007 40
Example 2 3 • An actor may have more than one address • Key? What normal form? • Note the redundancies • MVD: name street city • read: name determines 1 or more street & city independent of all other attributes Spring 2007 41
MVD Rules 3 1. Every FD is an MVD. u Because if X Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples. u Example, in Actors: name addr. • Note: the opposite is not true i. e. not every MVD is a FD 2. Complementation: if X Y, then X Z, where Z is all attributes not in X or Y. u Example: since name phones holds in Actors, the name addr cars. Spring 2007 42
Splitting Doesn’t Hold • name street city 3 holds, but • name street does not hold u. Name does not determine 1 or more street independent of city. • name city Spring 2007 does not hold 43
Example 2 3 • An actor may have more than one address • MVD: name street city • read: name determines 1 or more street & city independent of all other attributes • Also (complement MVD): Spring 2007 name title year 44
3 Fourth Normal Form • The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. • There is a stronger normal form, called 4 NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation. Spring 2007 45
3 4 NF Eliminate redundancy due to multiplicative effect of MVD’s. • Roughly: treat MVD’s as FD's for decomposition, but not for finding keys. • Formally: R is in Fourth Normal Form if whenever MVD X Y is nontrivial (Y is not a subset of X, and X Y is not all attributes), then X is a superkey. u Remember, X Y implies X Y, so 4 NF is more stringent than BCNF. • Decompose R, using 4 NF violation X Y, into XY and X (R—Y). Spring 2007 R X Y 46
Example 3 Drinkers(name, addr, phones, cars) • FD: name addr name phones name cars. • Only key: {name, phones, cars} • All three dependencies above violate 4 NF. Why? • Successive decomposition yields 4 NF relations: D 1(name, addr) D 2(name, phones) D 3(name, cars) • Nontrivial MVD’s: Spring 2007 47
Example 2 3 Ø name street city ØDecompose into: R 1(name, street, city) R 2(name, title, year) Spring 2007 48
- Slides: 47