Normal Forms Zachary G Ives University of Pennsylvania

  • Slides: 25
Download presentation
Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information

Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Announcements § Homework 3 will be due Monday 10/22 § Fall break will be

Announcements § Homework 3 will be due Monday 10/22 § Fall break will be 10/19, midterm on 10/26 2

Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s

Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s axioms: § Reflexivity: If Y X then X Y (trivial dependencies ) name, sid name § Augmentation: If X Y then XW YW serno subj so serno, exp-grade subj, exp-grade § Transitivity: If X Y and Y Z then X Z serno cid and cid subj so serno subj 3

Armstrong’s Axioms Lead to… If X Y and X Z then X YZ §

Armstrong’s Axioms Lead to… If X Y and X Z then X YZ § Pseudotransitivity: If X Y and WY Z then XW Z § Decomposition: If X Y and Z Y then X Z § Union: Let’s prove a few of these from Armstrong’s Axioms 4

Closure of a Set of FD’s Defn. Let F be a set of FD’s.

Closure of a Set of FD’s Defn. Let F be a set of FD’s. Its closure , F +, is the set of all FD’s: {X Y | X Y is derivable from F by Armstrong’s Axioms} Which of the following are in the closure of our Student. Course FD’s? Student. Data(sid, name, serno, cid, subj, grade) name cid subj serno subj cid, sid subj cid sid 5

Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set

Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set X, X+, is: X+ = {Y | X Y F +} § This answers the question “is Y determined (transitively) by X? ”; compute X+ by: closure : = X; repeat until no change { if there is an FD U V in F such that U is in closure then add V to closure} § Does sid, serno subj, exp-grade? 6

Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent

Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent if their closures are equivalent, F + = G + e. g. , these two sets are equivalent: {XY Z, X Y} and {X Z, X Y} § F + contains a huge number of FD’s (exponential in the size of the schema) § Would like to have smallest “representative” FD set 7

Minimal Cover we express Defn. A FD set F is minimal if: each FD

Minimal Cover we express Defn. A FD set F is minimal if: each FD in 1. Every FD in F is of the form X A, simplest form where A is a single attribute 2. For no X A in F is: in a sense, F – {X A } equivalent to F each FD is “essential” 3. For no X A in F and Z X is: to the cover F – {X A } {Z A } equivalent to F Defn. F is a minimum cover for G if F is minimal and is equivalent to G. e. g. , {X Z, X Y} is a minimal cover for {XY Z, X Y} 8

More on Closures If F is a set of FD’s and X Y F

More on Closures If F is a set of FD’s and X Y F + then for some attribute A Y, X A F + Proof by counterexample. Assume otherwise and let Y = {A 1, . . . , An} Since we assume X A 1, . . . , X An are in F + then X A 1. . . An is in F + by union rule, hence, X Y is in F + which is a contradiction 9

Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for

Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are: § Consistent : any relation satisfying FD’s in F will satisfy those in F + § Complete : if an FD X Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X Y Ø In other words, Armstrong’s axioms derive all the FD’s that should hold Ø What is the goal of using these axioms? 10

Decomposition Consider our original “bad” attribute set Stuff(sid, name, serno, subj, cid, exp-grade) We

Decomposition Consider our original “bad” attribute set Stuff(sid, name, serno, subj, cid, exp-grade) We could decompose it into: Student(sid, name) Course(serno, cid) Subject(cid, subj) But this decomposition loses information about the relationship between students and courses. Why? 11

Lossless Join Decomposition R 1, … Rk is a lossless join decomposition of R

Lossless Join Decomposition R 1, … Rk is a lossless join decomposition of R w. r. t. an FD set F if for every instance r of R that satisfies F, ÕR 1(r) ⋈. . . ⋈ ÕRk(r) = r Consider: sid name serno subj cid exp-grade 1 23 Sam Nitin 570103 550103 AI DB 570 550 B A What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)? 12

Testing for Lossless Join R 1, R 2 is a lossless join decomposition of

Testing for Lossless Join R 1, R 2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+ (R 1 R 2) R 1 – R 2 (R 1 R 2) R 2 – R 1 So for the FD set: sid name serno cid, exp-grade cid subj Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition? 13

Dependency Preservation Ensures we can “easily” check whether a FD X Y is violated

Dependency Preservation Ensures we can “easily” check whether a FD X Y is violated during an update to a database: § The projection of an FD set F onto a set of attributes Z, F Z is {X Y | X Y F +, X Y Z} i. e. , it is those FDs local to Z’s attributes § A decomposition R 1, …, Rk is dependency preserving F + = (F R 1 . . . F R k)+ if The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join 14

Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip,

Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip, item, price) And FD set name street, city st street, city zip name, item price Consider the decomposition R 1(name, street, city, st, zip) and R 2(name, item, price) Ø Is it lossless? Ø Is it dependency preserving? What if we replaced the first FD by name, street city? 15

Another Example Given scheme: R(sid, fid, subj) and FD set: fid subj sid, subj

Another Example Given scheme: R(sid, fid, subj) and FD set: fid subj sid, subj fid Consider the decomposition R 1(sid, fid) and R 2(fid, subj) Ø Is it lossless? Ø Is it dependency preserving? 16

FD’s and Keys § Ideally, we want a design s. t. for each nontrivial

FD’s and Keys § Ideally, we want a design s. t. for each nontrivial dependency X Y, X is a superkey for some relation schema in R § We just saw that this isn’t always possible § Hence we have two kinds of normal forms 17

Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and

Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X A that holds over R, either A X (it is trivial) , or or X is a superkey for R Third Normal Form (3 NF). For every relation scheme R and for every X A that holds over R, either A X (it is trivial), or X is a superkey for R, or A is a member of some key for R 18

Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of

Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of dependency preservation § It’s strictly stronger than 3 NF Let’s see algorithms to obtain: § A BCNF lossless join decomposition (nondeterministic) § A 3 NF lossless join, dependency preserving decomposition 19

BCNF Decomposition Algorithm (from Korth et al. ; our book gives a recursive version)

BCNF Decomposition Algorithm (from Korth et al. ; our book gives a recursive version) result : = {R} compute F+ while there is a relation schema R i in result that isn’t in BCNF { i. e. , A doesn’t form a key let A B be a nontrivial FD on R i s. t. A R i is not in F+ and A and B are disjoint } result: = (result – Ri) {(Ri - B), (A, B)} 20

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs:

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid name fid prof serno classroom, cid, fid § Find the Boyce-Codd Normal Form for this schema § What if instead: sid name fid prof classroom, cid serno cid 21

3 NF Decomposition Algorithm Let F be a minimal cover i: =0 for each

3 NF Decomposition Algorithm Let F be a minimal cover i: =0 for each FD A B in F { if none of the schemas R j, 1 j i, contains AB { increment i R i : = (A, B) } } if no schema R j, 1 j i contains a candidate key for R { increment i R i : = any candidate key for R } return (R 1, …, Ri) Build dep. preserving decomp. Ensure lossless decomp. 22

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs:

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid name fid prof serno classroom, cid, fid § Find the Third Normal Form for this schema § What if instead: sid name fid prof classroom, cid serno cid 23

Summary of Normalization § We can always decompose into 3 NF and get: §

Summary of Normalization § We can always decompose into 3 NF and get: § Lossless join § Dependency preservation § But with BCNF: § We are only guaranteed lossless joins § The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R § BCNF is stronger than 3 NF: every BCNF schema is also in 3 NF 24

Normalization Is Good… Or Is It? § In some cases, we might not mind

Normalization Is Good… Or Is It? § In some cases, we might not mind redundancy, if the data isn’t directly updated: § Reports (people like to see breakdowns by semester, department, course, etc. ) § Warehouses (archived copies of data for doing complex analysis) § Data sharing (sometimes we may export data into objectoriented or hierarchical formats) 25