Normal Forms Zachary G Ives University of Pennsylvania

Announcements § Homework 3 will be due Monday 10/22 § Fall break will be

Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s

Armstrong’s Axioms Lead to… If X Y and X Z then X YZ §

Closure of a Set of FD’s Defn. Let F be a set of FD’s.

Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set

Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent

Minimal Cover we express Defn. A FD set F is minimal if: each FD

More on Closures If F is a set of FD’s and X Y F

Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for

Decomposition Consider our original “bad” attribute set Stuff(sid, name, serno, subj, cid, exp-grade) We

Lossless Join Decomposition R 1, … Rk is a lossless join decomposition of R

Testing for Lossless Join R 1, R 2 is a lossless join decomposition of

Dependency Preservation Ensures we can “easily” check whether a FD X Y is violated

Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip,

Another Example Given scheme: R(sid, fid, subj) and FD set: fid subj sid, subj

FD’s and Keys § Ideally, we want a design s. t. for each nontrivial

Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and

Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of

BCNF Decomposition Algorithm (from Korth et al. ; our book gives a recursive version)

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs:

3 NF Decomposition Algorithm Let F be a minimal cover i: =0 for each

Summary of Normalization § We can always decompose into 3 NF and get: §

Normalization Is Good… Or Is It? § In some cases, we might not mind

Slides: 25

Download presentation

Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Announcements § Homework 3 will be due Monday 10/22 § Fall break will be 10/19, midterm on 10/26 2

Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s axioms: § Reflexivity: If Y X then X Y (trivial dependencies ) name, sid name § Augmentation: If X Y then XW YW serno subj so serno, exp-grade subj, exp-grade § Transitivity: If X Y and Y Z then X Z serno cid and cid subj so serno subj 3

Armstrong’s Axioms Lead to… If X Y and X Z then X YZ § Pseudotransitivity: If X Y and WY Z then XW Z § Decomposition: If X Y and Z Y then X Z § Union: Let’s prove a few of these from Armstrong’s Axioms 4

Closure of a Set of FD’s Defn. Let F be a set of FD’s. Its closure , F +, is the set of all FD’s: {X Y | X Y is derivable from F by Armstrong’s Axioms} Which of the following are in the closure of our Student. Course FD’s? Student. Data(sid, name, serno, cid, subj, grade) name cid subj serno subj cid, sid subj cid sid 5

Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set X, X+, is: X+ = {Y | X Y F +} § This answers the question “is Y determined (transitively) by X? ”; compute X+ by: closure : = X; repeat until no change { if there is an FD U V in F such that U is in closure then add V to closure} § Does sid, serno subj, exp-grade? 6

Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent if their closures are equivalent, F + = G + e. g. , these two sets are equivalent: {XY Z, X Y} and {X Z, X Y} § F + contains a huge number of FD’s (exponential in the size of the schema) § Would like to have smallest “representative” FD set 7

Minimal Cover we express Defn. A FD set F is minimal if: each FD in 1. Every FD in F is of the form X A, simplest form where A is a single attribute 2. For no X A in F is: in a sense, F – {X A } equivalent to F each FD is “essential” 3. For no X A in F and Z X is: to the cover F – {X A } {Z A } equivalent to F Defn. F is a minimum cover for G if F is minimal and is equivalent to G. e. g. , {X Z, X Y} is a minimal cover for {XY Z, X Y} 8

More on Closures If F is a set of FD’s and X Y F + then for some attribute A Y, X A F + Proof by counterexample. Assume otherwise and let Y = {A 1, . . . , An} Since we assume X A 1, . . . , X An are in F + then X A 1. . . An is in F + by union rule, hence, X Y is in F + which is a contradiction 9

Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are: § Consistent : any relation satisfying FD’s in F will satisfy those in F + § Complete : if an FD X Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X Y Ø In other words, Armstrong’s axioms derive all the FD’s that should hold Ø What is the goal of using these axioms? 10

Decomposition Consider our original “bad” attribute set Stuff(sid, name, serno, subj, cid, exp-grade) We could decompose it into: Student(sid, name) Course(serno, cid) Subject(cid, subj) But this decomposition loses information about the relationship between students and courses. Why? 11

Lossless Join Decomposition R 1, … Rk is a lossless join decomposition of R w. r. t. an FD set F if for every instance r of R that satisfies F, ÕR 1(r) ⋈. . . ⋈ ÕRk(r) = r Consider: sid name serno subj cid exp-grade 1 23 Sam Nitin 570103 550103 AI DB 570 550 B A What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)? 12

Testing for Lossless Join R 1, R 2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+ (R 1 R 2) R 1 – R 2 (R 1 R 2) R 2 – R 1 So for the FD set: sid name serno cid, exp-grade cid subj Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition? 13

Dependency Preservation Ensures we can “easily” check whether a FD X Y is violated during an update to a database: § The projection of an FD set F onto a set of attributes Z, F Z is {X Y | X Y F +, X Y Z} i. e. , it is those FDs local to Z’s attributes § A decomposition R 1, …, Rk is dependency preserving F + = (F R 1 . . . F R k)+ if The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join 14

Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip, item, price) And FD set name street, city st street, city zip name, item price Consider the decomposition R 1(name, street, city, st, zip) and R 2(name, item, price) Ø Is it lossless? Ø Is it dependency preserving? What if we replaced the first FD by name, street city? 15

Another Example Given scheme: R(sid, fid, subj) and FD set: fid subj sid, subj fid Consider the decomposition R 1(sid, fid) and R 2(fid, subj) Ø Is it lossless? Ø Is it dependency preserving? 16

FD’s and Keys § Ideally, we want a design s. t. for each nontrivial dependency X Y, X is a superkey for some relation schema in R § We just saw that this isn’t always possible § Hence we have two kinds of normal forms 17

Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X A that holds over R, either A X (it is trivial) , or or X is a superkey for R Third Normal Form (3 NF). For every relation scheme R and for every X A that holds over R, either A X (it is trivial), or X is a superkey for R, or A is a member of some key for R 18

Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of dependency preservation § It’s strictly stronger than 3 NF Let’s see algorithms to obtain: § A BCNF lossless join decomposition (nondeterministic) § A 3 NF lossless join, dependency preserving decomposition 19

BCNF Decomposition Algorithm (from Korth et al. ; our book gives a recursive version) result : = {R} compute F+ while there is a relation schema R i in result that isn’t in BCNF { i. e. , A doesn’t form a key let A B be a nontrivial FD on R i s. t. A R i is not in F+ and A and B are disjoint } result: = (result – Ri) {(Ri - B), (A, B)} 20

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid name fid prof serno classroom, cid, fid § Find the Boyce-Codd Normal Form for this schema § What if instead: sid name fid prof classroom, cid serno cid 21

3 NF Decomposition Algorithm Let F be a minimal cover i: =0 for each FD A B in F { if none of the schemas R j, 1 j i, contains AB { increment i R i : = (A, B) } } if no schema R j, 1 j i contains a candidate key for R { increment i R i : = any candidate key for R } return (R 1, …, Ri) Build dep. preserving decomp. Ensure lossless decomp. 22

An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid name fid prof serno classroom, cid, fid § Find the Third Normal Form for this schema § What if instead: sid name fid prof classroom, cid serno cid 23

Summary of Normalization § We can always decompose into 3 NF and get: § Lossless join § Dependency preservation § But with BCNF: § We are only guaranteed lossless joins § The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R § BCNF is stronger than 3 NF: every BCNF schema is also in 3 NF 24

Normalization Is Good… Or Is It? § In some cases, we might not mind redundancy, if the data isn’t directly updated: § Reports (people like to see breakdowns by semester, department, course, etc. ) § Warehouses (archived copies of data for doing complex analysis) § Data sharing (sometimes we may export data into objectoriented or hierarchical formats) 25