
Schema Refinement: Canonical/minimal Covers

Canonical Cover
• The number of iterations of the algorithm for computing the closure of a set of attributes depends on the number of FD’s in F
• The same will be observed for other algorithms that we will study (such as the decomposition algorithms)
• Can we “minimize” F?

Covers
• FD’s can be represented in several different ways without changing the set of legal/valid instances of the relation
• Let F and G be sets of FD’s. We say “G follows from F” if every relation instance that satisfies F also satisfies G. In symbols: F ⊨ G. We may also say “G is implied by F” or “G is covered by F.”
• If both F ⊨ G and G ⊨ F hold, then we say that G and F are equivalent and denote this by F ≡ G. Equivalently, F+ = G+
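The definitions above can be tried out with a minimal Python sketch. It relies on the standard fact that F ⊨ X → Y exactly when Y ⊆ X+ computed under F. The function names `closure`, `implies` and `equivalent`, the encoding of FD’s as pairs of strings with single-letter attributes, and the sample sets `f`, `g`, `h` are all assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(f, g):
    """F ⊨ G: every FD X → Y in G satisfies Y ⊆ X+ computed under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F ≡ G: each set implies the other."""
    return implies(f, g) and implies(g, f)


f = [("A", "B"), ("B", "C")]
g = [("A", "B"), ("B", "C"), ("A", "C")]  # adds an FD that already follows from F
h = [("A", "B")]                          # drops B → C

print(equivalent(f, g))
print(implies(f, h), implies(h, f))
```

Here `g` is a different representation of the same constraints as `f`, while `h` is covered by `f` but does not cover it.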

Canonical Cover
• Let F be a set of FD’s. A canonical/minimal cover of F is a set G of FD’s that satisfies the following:
  1. G is equivalent to F; that is, G ≡ F
  2. G is minimal; that is, if we obtain a set H of FD’s from G by deleting one or more of its FD’s, or by deleting one or more attributes from some FD in G, then F ≢ H

Canonical Cover
A canonical cover G is minimal in two respects:
  1. Every FD in G is “required” in order for G to be equivalent to F
  2. Every FD in G is as “small” as possible, that is,
     • each attribute on the left hand side is necessary
     • Recall: the RHS of every FD in G is a single attribute

Computing Canonical Cover
Given a set F of FD’s, how do we compute a canonical cover G of F?
• Step 1: Put the FD’s in the simple form
  • Initialize G := F
  • Replace each FD X → A1 A2 … Ak in G with X → A1, X → A2, …, X → Ak
• Step 2: Minimize the left hand side of each FD
  • E.g., for each FD AB → C in G, check if A or B on the LHS is redundant; to test B, ask whether ((G − {AB → C}) ∪ {A → C})+ = F+
• Step 3: Delete redundant FD’s
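The three steps can be sketched in Python as follows. This is one possible rendering under stated assumptions, not a definitive implementation: attributes are single letters, FD’s are pairs of strings, and the names `closure`, `canonical_cover` and the sample input `f` are made up for the illustration. Step 2 tests redundancy of an attribute B in X → A by checking whether A ∈ (X − B)+, which is equivalent to the closure comparison in the text.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: simple form, one attribute on each right-hand side.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in g:
                g.append(fd)

    # Step 2: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs.discard(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)
    g = reduced

    # Step 3: drop FD's that follow from the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


f = [("A", "BC"), ("B", "C"), ("AB", "C")]
cover = [("".join(sorted(lhs)), rhs) for lhs, rhs in canonical_cover(f)]
print(cover)
```

The result depends on the order in which attributes and FD’s are examined, so a different order may return a different, equally valid cover.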

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• Step one: put FD’s in the simple form
  • All present FD’s are simple
  • G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• Step two: check every FD to see if it is left reduced
  • For every FD X → A in G, check whether A is in the closure of a proper subset of X. If so, remove the redundant attribute(s) from X

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• A → B: obviously OK (no left redundancy)
• DE → A
  • D+ = D
  • E+ = E
  • OK (no left redundancy)

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• BC → E
  • B+ = B
  • C+ = C
  • OK (no left redundancy)

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• AC → E
  • A+ = AB
  • C+ = C
  • OK (no left redundancy)

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• BCD → A
  • B+ = B
  • C+ = C
  • D+ = D
  • BC+ = BCE
  • CD+ = CD
  • BD+ = BD
  • OK (no left redundancy)

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• AED → B
  • A+ = AB
  • E & D are redundant; we can remove them from AED → B
• G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }
• After dropping the duplicate A → B: G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• Step 3: find and remove redundant FD’s
  • For every FD X → A in G
    • Remove X → A from G; call the result G’
    • Compute X+ under G’
    • If A ∈ X+, then X → A is redundant and hence we remove the FD X → A from G (that is, we rename G’ to G)

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { DE → A, BC → E, AC → E, BCD → A, A → B }
• Remove DE → A from G
  • G’ = { BC → E, AC → E, BCD → A, A → B }
• Compute DE+ under G’
  • DE+ = DE (computed under G’)
  • Since A ∉ DE+, the FD DE → A is not redundant
• G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { DE → A, BC → E, AC → E, BCD → A, A → B }
• Remove BC → E from G
  • G’ = { DE → A, AC → E, BCD → A, A → B }
• Compute BC+ under G’
  • BC+ = BC
  • BC → E is not redundant
• G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { DE → A, BC → E, AC → E, BCD → A, A → B }
• Remove AC → E from G
  • G’ = { DE → A, BC → E, BCD → A, A → B }
• Compute AC+ under G’
  • AC+ = ACBE
  • Since E ∈ ACBE, AC → E is redundant; remove it from G
• G = { DE → A, BC → E, BCD → A, A → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { DE → A, BC → E, BCD → A, A → B }
• Remove BCD → A from G
  • G’ = { DE → A, BC → E, A → B }
• Compute BCD+ under G’
  • BCD+ = BCDEA
  • This FD is redundant; remove it from G
• G = { DE → A, BC → E, A → B }

Computing Canonical Cover
• R = { A, B, C, D, E, H }
• F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
• G = { DE → A, BC → E, A → B }
• Remove A → B from G
  • G’ = { DE → A, BC → E }
• Compute A+ under G’
  • A+ = A
  • This FD is not redundant (Another reason why this is true?)
• G = { DE → A, BC → E, A → B }
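The Step 3 walkthrough can be replayed with a short Python trace. It starts from the set G obtained after Step 2 and visits the FD’s in the same order as the slides. The helper `closure`, the variable names and the string encoding of FD’s are assumptions made for this sketch.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# G after Step 2, in the order used in the walkthrough.
g = [("DE", "A"), ("BC", "E"), ("AC", "E"), ("BCD", "A"), ("A", "B")]

trace = []
for fd in list(g):
    lhs, rhs = fd
    g_prime = [other for other in g if other != fd]  # G' = G without X → A
    x_plus = closure(lhs, g_prime)                   # X+ computed under G'
    redundant = rhs in x_plus
    trace.append((lhs, "".join(sorted(x_plus)), redundant))
    if redundant:
        g = g_prime                                  # G' becomes the new G

for lhs, x_plus, redundant in trace:
    print(f"{lhs}+ = {x_plus}  redundant: {redundant}")
print(g)
```

Closures are printed with their attributes in alphabetical order, so AC+ appears as ABCE rather than ACBE.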

Several Canonical Covers Possible?
Relation R = { A, B, C } with F = { A → B, A → C, B → A, B → C, C → B, C → A }
• Several canonical covers exist
  • G = { A → B, B → A, B → C, C → B }
  • G = { A → B, B → C, C → A }
• Can you find more?
[Figure: diagrams over the attributes A, B, C]
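One way to explore the question is a brute-force search over subsets of F, sketched below in Python. It keeps every subset that still implies all of F and from which no single FD can be dropped. The search only considers subsets of F itself, which is an assumption of this sketch; the names `closure`, `implies` and `covers` are likewise made up for the illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a sequence of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, targets):
    """True if every FD in targets follows from fds."""
    return all(set(rhs) <= closure(lhs, fds) for lhs, rhs in targets)


f = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "B"), ("C", "A")]

covers = []
for size in range(1, len(f) + 1):
    for subset in combinations(f, size):
        if not implies(subset, f):
            continue  # this subset does not cover F
        # Keep the subset only if removing any one FD breaks equivalence.
        if all(not implies([fd for fd in subset if fd != dropped], f)
               for dropped in subset):
            covers.append(subset)

for cover in covers:
    print(", ".join(f"{lhs} → {rhs}" for lhs, rhs in cover))
```

Every FD in F already has a single attribute on each side, so no left reduction is needed before the search.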

How to Deal with Redundancy?
Relation Schema: Star (name, address, representingFirm, spokesPerson)
F = { name → {address, representingFirm, spokesPerson}, representingFirm → spokesPerson }
Relation Instance:
  Name            Address        RepresentingFirm   SpokesPerson
  Carrie Fisher   123 Maple      Star One           Joe Smith
  Harrison Ford   789 Palm dr.   Star One           Joe Smith
  Mark Hamill     456 Oak rd.    Movies & Co        Mary Johns
• We can decompose this relation into two smaller relations

How to Deal with Redundancy?
Relation Schema: Star (name, address, representingFirm, spokesPerson)
F = { name → {address, representingFirm, spokesPerson}, representingFirm → spokesPerson }
Decompose this relation into the following relations:
• Star (name, address, representingFirm) with F1 = { name → {address, representingFirm} }
• Firm (representingFirm, spokesPerson) with F2 = { representingFirm → spokesPerson }

How to Deal with Redundancy?
Relation Instance before decomposition:
  Name            Address        RepresentingFirm   SpokesPerson
  Carrie Fisher   123 Maple      Star One           Joe Smith
  Harrison Ford   789 Palm dr.   Star One           Joe Smith
  Mark Hamill     456 Oak rd.    Movies & Co        Mary Johns
Relation Instances after decomposition:
  Name            Address        RepresentingFirm
  Carrie Fisher   123 Maple      Star One
  Harrison Ford   789 Palm dr.   Star One
  Mark Hamill     456 Oak rd.    Movies & Co

  RepresentingFirm   SpokesPerson
  Star One           Joe Smith
  Movies & Co        Mary Johns

Decomposition
• A decomposition of a relation schema R consists of replacing R by two or more non-empty relation schemas such that each one is a subset of R and together they include all attributes of R. Formally, { R1, …, Rm } is a decomposition of R if all conditions below hold:
  (0) Ri ≠ Ø, for all i in {1, …, m}
  (1) R1 ∪ … ∪ Rm = R
  (2) Ri ≠ Rj, for different i and j in {1, …, m}
• When m = 2, the decomposition { R1, R2 } is called binary
• Not every decomposition of R is “desirable”
• Properties of a decomposition?
  (1) Lossless-join: this is a must

Example
Relation Instance:
  A  B  C
  1  2  3
  4  2  5
Decomposed into:
  A  B        B  C
  1  2        2  3
  4  2        2  5
To “recover” information, we join the relations:
  A  B  C
  1  2  3
  4  2  5
  4  2  3
  1  2  5
Why do we have new tuples?
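The example can be reproduced with a small Python sketch that projects the instance onto AB and BC and then joins the two projections. The helpers `project` and `natural_join` and the list-of-dicts encoding of a relation instance are assumptions made for this illustration.

```python
def project(rows, attrs):
    """Projection onto attrs; using a set removes duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples with the given attribute lists."""
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = list(attrs1) + [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        d1 = dict(zip(attrs1, t1))
        for t2 in r2:
            d2 = dict(zip(attrs2, t2))
            # Combine tuples that agree on all common attributes.
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined


r = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 2, "C": 5}]
ab = project(r, ["A", "B"])
bc = project(r, ["B", "C"])

attrs, joined = natural_join(ab, ["A", "B"], bc, ["B", "C"])
original = project(r, ["A", "B", "C"])
spurious = joined - original

print(sorted(joined))
print(sorted(spurious))
```

Both original tuples share B = 2, so the join pairs every A value with every C value, and the tuples in `spurious` were never in the original instance.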

Lossless-Join Decomposition
• R is a relation schema and F is a set of FD’s over R. A binary decomposition of R into relation schemas R1 and R2 with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies F, we have πX(r) ⋈ πY(r) = r
• Thm: Let R be a relation schema and F a set of FD’s on R. A binary decomposition of R into R1 and R2 with attribute sets X and Y is lossless iff F implies X ∩ Y → X or X ∩ Y → Y, i.e., the common attributes X ∩ Y form a superkey of R1 or of R2

Example: Lossless-join
Relation Instance:
  A  B  C
  1  2  3
  4  2  3
F = { B → C }
Decomposed into:
  A  B        B  C
  1  2        2  3
  4  2
To recover the original relation r, we join the two relations:
  A  B  C
  1  2  3
  4  2  3
No new tuples!
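The lossless-join test for a binary decomposition can be sketched in Python as a closure computation: take the common attributes X ∩ Y and check whether their closure contains all of X or all of Y. The function names `closure` and `is_lossless` and the string encoding of attribute sets are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(x, y, fds):
    """Binary test: (X ∩ Y)+ must contain all of X or all of Y."""
    common_plus = closure(set(x) & set(y), fds)
    return set(x) <= common_plus or set(y) <= common_plus


# Schema ABC split into AB and BC, as in the example above.
print(is_lossless("AB", "BC", [("B", "C")]))  # with F = { B → C }
print(is_lossless("AB", "BC", []))            # with no FD's at all
```

With B → C the common attribute B determines all of BC, so the split is lossless. Without any FD’s the test fails, which matches the earlier instance where the join produced new tuples.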

Example: Dependency Preservation
Relation Instance:
  A  B  C  D
  1  2  5  7
  4  3  6  8
F = { B → C, B → D, A → D }
Decomposed into (column values): A = 1, 4; B = 2, 3; C = 5, 6; D = 7, 8
Can we enforce A → D? How?

Dependency-Preserving Decomposition
• A dependency-preserving decomposition allows us to enforce every FD, on each insertion or modification of a tuple, by examining just one single relation instance
• Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FD’s over R. The projection of F on X (denoted by FX) is the set of FD’s in F+ that involve only attributes in X
  • Recall that a FD U → V in F+ is in FX if all the attributes in U and V are in X; in this case we say this FD is “relevant” to X
• The decomposition of <R, F> into two schemas with attribute sets X and Y is dependency-preserving if (FX ∪ FY)+ = F+
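A brute-force check of dependency preservation can be sketched in Python, assuming the usual criterion that the projections FX and FY together must imply every FD in F. The sketch uses F = { B → C, B → D, A → D } and a hypothetical split into X = AB and Y = BCD, which is an assumption made for illustration. The helpers `closure`, `project_fds` and `lost_fds` are likewise made-up names. `project_fds` returns a generating set for the projection (one FD per left-hand side) rather than every FD in it.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, x):
    """Nontrivial FD's U → V with U, V ⊆ X that follow from fds."""
    x = sorted(x)
    projected = []
    for size in range(1, len(x) + 1):
        for lhs in combinations(x, size):
            rhs = (closure(lhs, fds) & set(x)) - set(lhs)
            if rhs:
                projected.append(("".join(lhs), "".join(sorted(rhs))))
    return projected


def lost_fds(fds, x, y):
    """FD's of F that do not follow from the two projections together."""
    union = project_fds(fds, x) + project_fds(fds, y)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= closure(lhs, union)]


f = [("B", "C"), ("B", "D"), ("A", "D")]
print(project_fds(f, "AB"))
print(project_fds(f, "BCD"))
print(lost_fds(f, "AB", "BCD"))
```

Under this assumed split, A and D never appear in the same relation, so A → D cannot be checked by looking at a single relation instance. The subset enumeration is exponential in the number of attributes, which is acceptable only for small schemas like this one.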

Normal Forms
• Given a relation schema R, we must be able to determine whether it is “good” or we need to decompose it into smaller relations, and if so, how?
• To address these issues, we need to study normal forms
• If a relation schema is in one of these normal forms, we know that it is in some “good” shape in the sense that certain kinds of problems (related to redundancy) cannot arise

Normal Forms
• The normal forms based on FD’s are
  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce-Codd normal form (BCNF)
• These normal forms have increasingly restrictive requirements: BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF

Third Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R.
• We say R w.r.t. F is in 3NF (third normal form) if, for every FD X → A in F, at least one of the following conditions holds:
  • A ∈ X, that is, X → A is a trivial FD, or
  • X is a superkey, or
  • If X is not a key, then A is part of some key of R
• To determine if a relation <R, F> is in 3NF, we
  • Check whether the LHS of each nontrivial FD in F is a superkey; if it is not, check whether its RHS is part of some key of R
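The 3NF conditions can be checked mechanically, as in the Python sketch below. It finds the candidate keys by brute force and then tests each FD against the three conditions. The schema JKL with F = { JK → L, L → K } and the schema ABC with F = { A → B, B → C } are hypothetical examples. The helper names and the single-letter attribute encoding are also assumptions of this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """FD's X → A that are nontrivial, with X not a superkey and A in no key."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        for a in rhs:
            if a in lhs or lhs_is_superkey or a in prime:
                continue
            bad.append((lhs, a))
    return bad


print(candidate_keys("JKL", [("JK", "L"), ("L", "K")]))
print(violations_3nf("JKL", [("JK", "L"), ("L", "K")]))
print(violations_3nf("ABC", [("A", "B"), ("B", "C")]))
```

In the first schema L is not a superkey, but K belongs to the key JK, so L → K is allowed. In the second schema B → C fails all three conditions.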

Boyce-Codd Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R.
• We say R w.r.t. F is in Boyce-Codd normal form if, for every FD X → A in F, at least one of the following conditions is true:
  • A ∈ X, that is, X → A is a trivial FD, or
  • X is a superkey
• To determine whether R with a given set of FD’s F is in BCNF, we check whether the LHS of each nontrivial FD in F is a superkey
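The BCNF check reduces to one closure computation per FD, as the Python sketch below illustrates on the Star schema from the redundancy example. The function names `closure` and `bcnf_violations` and the encoding of FD’s as pairs of attribute-name sets are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FD's in fds whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


star = {"name", "address", "representingFirm", "spokesPerson"}
f = [
    ({"name"}, {"address", "representingFirm", "spokesPerson"}),
    ({"representingFirm"}, {"spokesPerson"}),
]
print(bcnf_violations(star, f))

# The two relations produced by the decomposition.
star2 = {"name", "address", "representingFirm"}
f1 = [({"name"}, {"address", "representingFirm"})]
firm = {"representingFirm", "spokesPerson"}
f2 = [({"representingFirm"}, {"spokesPerson"})]
print(bcnf_violations(star2, f1), bcnf_violations(firm, f2))
```

In the original schema representingFirm determines spokesPerson without being a superkey, which is the source of the repeated spokesperson values. Neither decomposed relation reports a violation.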