Closure The closure of B 1Bk under the
Closure • The closure of {B 1…Bk} under the set of FDs S, denoted by {B 1…Bk}+, is defined as follows: {B 1…Bk}+ = {B | any relation satisfies S will also satisfies B 1…Bk B}
Computing the closure • • Given: the set S and {A 1, …, An} Compute: {A 1, …, An}+- denote this set by X Step 1: X = {A 1, …, An} Step 2: find a FD B 1…Bk B in S such that {B 1, …, Bk} X and B X, then X=X {B} • Step 3: repeat step 2 until nothing more can be added to X, then go to step 4 • Step 4: return X
Example • • • S = {AB C, BC AD, D E, CF B} Compute {A, B}+ Step 1: X = {A, B} Step 2: X = X {C}={A, B, C} because AB C Step 3 back to step 2: X = X {D} because BC AD • Step 3 back to step 2: X = X {E} because D E • Step 3 back to step 2: nothing more • Step 3 go to step 4: return {A, B, C, D, E}
Correctness of closure algorithm • It computes true functional dependencies – proof: show that if B belongs to {A 1, …, An}+ then A 1…An B holds. By induction over the number of steps (n) used in adding an attribute B into the set X • n=0 then B belongs to {A 1, …, An} and so A 1…An B is a trivial functional dependency • n n+1: if B is added to X in the step n+1, then A 1…An Bj for all j by inductive hypothesis; this, together with B 1…Bk B, implies that A 1…An B • It computes all functional dependencies – proof: show that if B does not belong to {A 1, …, An}+ then A 1…An B does not hold. By constructing an instance I of the relation R such that the FD does not hold. A in the closure Others 111 … 111 000 … 000 111 … 111
Simple questions • What is {A 1, …, An}+ if {A 1, …, An} is a key of the relation? • Can {A 1, …, An}+ ={A 1, …, An}? • Does {B 1, …, Bm} {A 1, …, An} imply {B 1, …, Bm}+ {A 1, …, An}+?
Transitive Rules • Given • then A 1…An B 1…Bm C 1…Ck A 1…An C 1…Ck
Closing sets of FDs • Given a set of FDs we can derive some other FDs using the rules about FDs (e. g. combining, splitting, and transitive) • For a relation R, a set of FD is called a basis for R if all other FDs of R can be derived form it. • A basis is minimal if none of its proper subsets is a basis.
Projecting FDs B’s • Given: R S – R with a set of FDs F – S (a new relation) is obtained by removing the attributes {B 1, …, Bm} from R • Questions: What are the FDs of S? • Answer: if A 1…An C 1…Ck is a FD of R and none of the Bs appears on the left or right side ({B 1, …, Bm} {A 1, …, An, C 1, …, Ck}= ) is a FD of S
Projecting - Example • Given R(A, B, C, D) with the FDs A B, B C, and C D. • Remove the attribute B from R, we obtain a new relation S(A, C, D). • What are the FDs of S? – A C? – A D? – C D? We can compute this by: • Compute all the closure of every subset of {A, C, D} by using the FDs of R that do not contain B.
Homework 3. 5. 1 Consider a relation with schema R(A, B, C, D) and FD’s AB C, C D, and D A. – What are all the nontrivial FD’s that follow from the given FD’s? List only the FDs with one attribute on the right? (5 pt) – What are the keys of R? (5 pt) – What are the superkeys but not keys? (5 pt) 3. 5. 3 Show that the following rule holds: if A 1…An B 1…Bm and C 1…Ck D 1…Dt hold then A 1…An. C 1…Ck B 1…Bm. D 1…Dt also holds. (5 pt)
For those whole like fun: 3. 5. 4 Does the following hold: – if A B then B A – if AB C and A C then B C 3. 5. 8 A set of attributes is closed if X+=X. What are the FDs of a relation R(A, B, C, D) if – all sets of four attributes are closed – the only closed sets are {} and {A, B, C, D} – the closed sets are {}, {A, B, C, D} (note: the cases are considered separate) Stars: try the exercises with stars.
Design of Relational Database Schema title year length studio. Name star. Name film. Type Star Wars 1977 124 Fox Mark Hamill color Star Wars 1977 124 Fox Harrison Ford color Star Wars 1977 124 Fox Carrie Fisher color Mighty Ducks 1991 104 Disney Emilio Estevez color Wayne’s World 1992 95 Paramount Dana Carvey color Wayne’s World 1992 95 Paramount Mike Meyers color Some observations: • value of studio. Name is the same in several tuples S? E I L A ORM N • value of film. Type is also repeated A E HES T D I AVO E What wrong with it? W CAN • redundancy store the same value unnecessary several time • update anormalies an update might require several changes • deletion anormalies losing information if delete a value
Possible ways to avoid anormalies (Intuition) • The bad way: start again (Oh, no!) • The natural way: try to decompose the given relation into two or more relations that – contain the same information – avoid the anormalies
Example title year length studio. Name star. Name film. Type Star Wars 1977 124 Fox Mark Hamill color Star Wars 1977 124 Fox Harrison Ford color Star Wars 1977 124 Fox Carrie Fisher color Mighty Ducks 1991 104 Disney Emilio Estevez color Wayne’s World 1992 95 Paramount Dana Carvey color Wayne’s World 1992 95 Paramount Mike Meyers color title year length studio. Name film. Type Star Wars 1977 124 Fox color Mighty Ducks 1991 104 Disney color Wayne’s World 1992 95 Paramount color title year star. Name Star Wars 1977 Mark Hamill Star Wars 1977 Harrison Ford Star Wars 1977 Carrie Fisher Mighty Ducks 1991 Emilio Estevez Wayne’s World 1992 Dana Carvey Wayne’s World 1992 Mike Meyers Movie. Studio. Star(title, year, length, studio. Name, star. Name, film. Type) is decomposed into 2 relations Movie. Studio(title, year, length, studio. Name, film. Type) and Stars. In(title, year, star. Name)
Decomposition Given a relation R with schema {A 1, …, An}. A decomposition of R into two relations S and T with schemas {B 1, …, Bm} and {C 1, …, Ck}, respectively, such that 1. {A 1, …, An} = {B 1, …, Bm} {C 1, …, Ck} 2. The tuples in S are the projections onto {B 1, …, Bm} of all the tuples in R. 3. The tuples in T are the projections onto {C 1, …, Ck} of all the tuples in R.
Example – Projections title year length studio. Name star. Name film. Type Star Wars 1977 124 Fox Mark Hamill color Star Wars 1977 124 Fox Harrison Ford color Star Wars 1977 124 Fox Carrie Fisher color Mighty Ducks 1991 104 Disney Emilio Estevez color Wayne’s World 1992 95 Paramount Dana Carvey color Wayne’s World 1992 95 Paramount Mike Meyers color How do we come up with this decomposition? title year length studio. Name film. Type Star Wars 1977 124 Fox color Mighty Ducks 1991 104 Disney color Wayne’s World 1992 95 Paramount color title year star. Name Star Wars 1977 Mark Hamill Star Wars 1977 Harrison Ford Star Wars 1977 Carrie Fisher Mighty Ducks 1991 Emilio Estevez Wayne’s World 1992 Dana Carvey Wayne’s World 1992 Mike Meyers Movie. Studio. Star(title, year, length, studio. Name, star. Name, film. Type) is decomposed into 2 relations Movie. Studio(title, year, length, studio. Name, film. Type) and Stars. In(title, year, star. Name)
Boyce-Codd Normal Form (BCNF) • BCNF: a relation R is in BCNF iff: whenever there is a nontrivial FD A 1…An B for R, it is the case that {A 1, …, An} is a superkey for R. • Why this definition? Answer: if a relation is in BCNF then there is no anormaly. • Example: Movie. Studio. Star(title, year, length, studio. Name, star. Name, film. Type): not in BCNF Movie. Studio(title, year, length, studio. Name, film. Type): in BCNF Stars. In(title, year, star. Name): in BCNF
Decomposition into BCNF • Suppose that we decompose a relation R into two relations S and T which are in BCNF. The requirements for S and T: – S and T is a decomposition of R – it is possible to reconstruct R from S and T • Will every decomposition of R satisfy these two conditions? • What are the FDs of the new relations?
Algorithm • Given a relation R with the attributes {A 1, …, An}. • Step 1: For every nontrivial FD B 1…Bm B if {B 1, …, Bm} is a superkey then returns R (no decomposition is needed) • Step 2: Takes a nontrivial FD B 1…Bm B such that {B 1, …, Bm} is not a superkey, then decomposes R into two relations S and T with the following schema: – S’s schema: {B 1, …, Bm}+ – T’s schema: {B 1, …, Bm} ({A 1, …, An}{B 1, …, Bm}+) • Repeat Step 1&2 for S and T until no decomposition is needed for every new relation; return the set of new relations as the result
Example The ‘new’ movie relation with the following attributes: {title, year, studio. Name, president, pres. Address} (we call this set ALL) with the FDs: {title year studio. Name, studio. Name president, president pres. Address} Only one key: {title, year} studio. Name president violated BCNF Step 2: takes studio. Name president, decomposes into – S with the schema {studio. Name}+={studio. Name, president, pres. Address} – T with the schema {studio. Name, title, year}={studio. Name} (ALL {studio. Name}+) Check: {studio. Name, title, year} is in BCNF (the first two FDs) {studio. Name, president, pres. Address} is not in BCNF Continue with the decomposition of S using president pres. Address and we get the following two relation schemas: {president, pres. Address} and {president, studio. Name} both are in BCNF. The final result: {studio. Name, title, year}, {president, pres. Address}, {president, studio. Name}
Recovering information from a decomposition • Suppose that R with the schema {A 1, …, An} is decomposed into two relations S and T according to the algorithm whose attributes are {B 1, …, Bm}+ and {B 1, …, Bm} ({A 1, …, An}{B 1, …, Bm}+) • The tuples of R can be obtained by joining all possible pairs of S and T where {B 1, …, Bm} have the same values.
Recovering … the rest of the closure the B’s others t’ (S) Join Projection t (R) t’’ (T) {B 1, …, Bm}+ {B 1, …, Bm} {A 1, …, An}{B 1, …, Bm}+
Example – Decomposition and Recovering title year length studio. Name star. Name film. Type Star Wars 1977 124 Fox Mark Hamill color Star Wars 1977 124 Fox Harrison Ford color Star Wars 1977 124 Fox Carrie Fisher color Mighty Ducks 1991 104 Disney Emilio Estevez color Wayne’s World 1992 95 Paramount Dana Carvey color Wayne’s World 1992 95 Paramount Mike Meyers color title year length studio. Name film. Type Star Wars 1977 124 Fox color Mighty Ducks 1991 104 Disney color Wayne’s World 1992 95 Paramount color title year star. Name Star Wars 1977 Mark Hamill Star Wars 1977 Harrison Ford Star Wars 1977 Carrie Fisher Mighty Ducks 1991 Emilio Estevez Wayne’s World 1992 Dana Carvey Wayne’s World 1992 Mike Meyers Movie. Studio. Star(title, year, length, studio. Name, star. Name, film. Type) is not in BCNF is decomposed into 2 relations that are in BCNF: Movie. Studio(title, year, length, studio. Name, film. Type) and Stars. In(title, year, star. Name)
Some remarks • • The algorithm will stop and output a set of BCNF relations. Not every decomposition according to the algorithm is good The FD’s for the new relations are determined by ‘projecting’. If a decomposition is based on FDs (according to the algorithm) then the recovering process will give us exactly the original relation. • If a decomposition is not based on FDs then we might not be able to recover the original relation from the new ones: – Example: R(A, B, C) with A B and we decompose it into S(A, B) and T(B, C): A 1 4 B 2 2 C 3 5 A 1 B 2 C 3 4 2 2 5 A 1 1 4 B 2 2 2 C 3 5 3 4 2 5
Third Normal Form (3 NF) • So far: if a relation is not in BCNF then anormalies arise. • Given a relation Bookings with the attributes: – title: name of the movie – theater: name of theater where the movie is being shown – city: the city where theater is located (a tuple (m, t, c): represents the fact that movie m is shown at theater t in city c)
Bookings(title, theater, city) • The FDs of the relations: – theater city – title city theater • theater city violates the BCNF condition, why? • decomposition yields: {theater, city} and {theater, title} • Consider the relations: Possible relations according to the FDs of each schema theater Guild city Menlo Park Menlo the titl ate e r Gu Ne recovering Violate the FD title city theater Guild title Net city Menlo Park Net Menlo
3 NF • A relaxation of the BCNF condition: a relation R is in 3 NF if: whenever there is a nontrivial FD A 1…An B, either {A 1, …, An} is a superkey or B is a member of some key. • Bookings(title, theater, city) is in 3 NF
Checking BCNF and 3 NF • Given R(A, B, C, D) with FDs AB C, C D, D A. • Question: Indicate the BCNF violations and 3 NF violations. • Steps in answering the question: – Step 1: compute all nontrivial FDs (right side: one att) – Step 2: find all keys – Step 3: find all the violations • Step 1: AB C, C D, D A, AB D, C A, DB C, AC D • Step 2: Keys – {A, B}, {C, B}, and {D, B} • Step 3: – BCNF violation: C D, D A, C A, AC D and their trivial extensions (e. g. CD D, DA A, …) – 3 NF violation: none
- Slides: 28