CS 405 G Introduction to Database Systems Instructor
CS 405 G: Introduction to Database Systems Instructor: Jinze Liu Fall 2017
Today’s Topic l l Functional Dependency. Normalization Decomposition BCNF 10/24/2020 Jinze Liu @ University of Kentucky 2
Motivation l How do we tell if a design is bad, e. g. , Work. On(EID, Ename, PID, Pname, Hours)? l This design has redundancy, because the name of an employee is recorded multiple times, once for each project the employee is taking 10/24/2020 EID PID Ename Pname Hours 1234 10 John Smith B 2 B platform 10 1123 9 Ben Liu CRM 40 1234 9 John Smith CRM 30 1023 10 Susan Sidhuk B 2 B platform 40 Jinze Liu @ University of Kentucky 3
Why redundancy is bad? l l Waste disk space. What if we want to perform update operations to the relation l l l INSERT an new project that no employee has been assigned to it yet. UPDATE the name of “John Smith” to “John L. Smith” DELETE the last employee who works for a certain project EID PID Ename Pname Hours 10/24/2020 1234 10 John Smith B 2 B platform 10 1123 9 Ben Liu CRM 40 1234 9 John Smith CRM 30 1023 10 Susan Sidhuk B 2 B platform 40 Jinze Liu @ University of Kentucky 4
Functional dependencies l l A functional dependency (FD) has the form X -> Y, where X and Y are sets of attributes in a relation R X -> Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y l t 1[X] = t 2[X] t 1[Y] = t 2[Y] X Y Z a b c a b? d? Could be anything, e. g. d Must be “b” 10/24/2020 Jinze Liu @ University of Kentucky 5
FD examples Address (street_address, city, state, zip) l street_address, city, state -> zip l zip -> city, state l zip, state -> zip? l l l This is a trivial FD Trivial FD: LHS RHS zip -> state, zip? l l This is non-trivial, but not completely non-trivial Completely non-trivial FD: LHS ∩ RHS = ? 10/24/2020 Jinze Liu @ University of Kentucky 6
Keys redefined using FD’s Let attr(R) be the set of all attributes of R, a set of attributes K is a (candidate) key for a relation R if l K -> attr(R) - K, and l l No proper subset of K satisfies the above condition l l That is, K is a “super key” That is, K is minimal (full functional dependent) Address (street_address, city, state, zip) l l {street_address, city, state, zip} {street_address, city, zip} {street_address, zip} {zip} 10/24/2020 Jinze Liu @ University of Kentucky Super key Key Non-key 7
Reasoning with FD’s Given a relation R and a set of FD’s F l Does another FD follow from F? l l Are some of the FD’s in F redundant (i. e. , they follow from the others)? Is K a key of R? l What are all the keys of R? 10/24/2020 Jinze Liu @ University of Kentucky 8
Attribute closure l l Given R, a set of FD’s F that hold in R, and a set of attributes Z in R: The closure of Z (denoted Z+) with respect to F is the set of all attributes {A 1, A 2, …} functionally determined by Z (that is, Z -> A 1 A 2 …) Algorithm for computing the closure l l l Start with closure = Z If X -> Y is in F and X is already in the closure, then also add Y to the closure Repeat until no more attributes can be added 10/24/2020 Jinze Liu @ University of Kentucky 9
A more complex example Work. On(EID, Ename, email, PID, Pname, Hours) l l EID -> Ename, email -> EID PID -> Pname EID, PID -> Hours (Not a good design, and we will see why later) 10/24/2020 Jinze Liu @ University of Kentucky 10
Example of computing closure l F includes: l l l l { PID, email }+ = ? closure = { PID, email } email -> EID l l Add Ename, email; closure is now { PID, email, EID, Ename } PID -> Pname l l Add EID; closure is now { PID, email, EID } EID -> Ename, email l l EID -> Ename, email -> EID PID -> Pname EID, PID -> Hours Add Pname; close is now { PID, Pname, email, EID, Ename } EID, PID -> hours l Add hours; closure is now all the attributes in Works. On 10/24/2020 Jinze Liu @ University of Kentucky 11
Using attribute closure Given a relation R and set of FD’s F l Does another FD X -> Y follow from F? l l l Is K a super key of R? l l l Compute X+ with respect to F If Y X+, then X -> Y follow from F Compute K+ with respect to F If K+ contains all the attributes of R, K is a super key Is a super key K a key of R? l Test where K’ = K – { a | a K} is a superkey of R for all possible a 10/24/2020 Jinze Liu @ University of Kentucky 12
Rules of FD’s l Armstrong’s axioms l l Reflexivity: If Y X, then X -> Y Augmentation: If X -> Y, then XZ -> YZ for any Z Transitivity: If X -> Y and Y -> Z, then X -> Z Rules derived from axioms l l Splitting: If X -> YZ, then X -> Y and X -> Z Combining: If X -> Y and X -> Z, then X -> YZ 10/24/2020 Jinze Liu @ University of Kentucky 13
Using rules of FD’s Given a relation R and set of FD’s F l Does another FD X -> Y follow from F? l l Use the rules to come up with a proof Example: l l F includes: EID -> Ename, email; email -> EID; EID, PID -> Hours, Pid -> Pname PID, email ->hours? email -> EID (given in F) PID, email -> PID, EID (augmentation) PID, EID -> hours (given in F) PID, email -> hours (transitivity) 10/24/2020 Jinze Liu @ University of Kentucky 14
Normalization l A normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations l A normal form is a certification that tells whether a relation schema is in a particular state 10/24/2020 Jinze Liu @ University of Kentucky 15
1 st Normal Form 10/24/2020 Jinze Liu @ University of Kentucky 16
Example of redundancy l l Work. On (EID, Ename, email, PID, hour) We say X -> Y is a partial dependency if there exist a X’ X such that X’ -> Y l l e. g. EID, email-> Ename, email Otherwise, X -> Y is a full dependency l e. g. EID, PID -> hours EID PID Ename email Pname Hours 1234 10 John Smith jsmith@ac. com B 2 B platform 10 1123 9 Ben Liu bliu@ac. com CRM 40 1234 9 John Smith jsmith@ac. com CRM 30 1023 10 Susan Sidhuk 10/24/2020 ssidhuk@ac. com B 2 B platform Jinze Liu @ University of Kentucky 40 17
2 nd Normal Form l l An attribute A of a relation R is a nonprimary attribute if it is not part of any key in R, otherwise, A is a primary attribute. R is in (general) 2 nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R X Y Z W a b c e b b c f c b c g X , Y -> Z, W Y -> Z 10/24/2020 (X , Y , W) (Y, Z) Jinze Liu @ University of Kentucky 18
2 nd Normal Form l Note about 2 Nd Normal Form l l by definition, every nonprimary attribute is functionally dependent on every key of R In other words, R is in its 2 nd normal form if we could not find a partial dependency of a nonprimary key to a key in R. 10/24/2020 Jinze Liu @ University of Kentucky 19
Decomposition EID PID Ename email Pname Hours 1234 10 John Smith jsmith@ac. com B 2 B platform 10 1123 9 Ben Liu bliu@ac. com CRM 40 1234 9 John Smith jsmith@ac. com CRM 30 1023 10 Susan Sidhuk Decomposition ssidhuk@ac. com B 2 B platform 40 Foreign key EID Ename email EID Pname Hours 1234 John Smith jsmith@ac. com 1234 10 B 2 B platform 10 1123 Ben Liu bliu@ac. com 1123 9 CRM 40 1023 Susan Sidhuk ssidhuk@ac. com 1234 9 CRM 30 1023 10 B 2 B platform 40 l l Decomposition eliminates redundancy To get back to the original relation: 10/24/2020 Jinze Liu @ University of Kentucky 20
Decomposition l Decomposition may be applied recursively EID Pname Hours 1234 10 B 2 B platform 10 1123 9 CRM 40 1234 9 CRM 30 1023 10 B 2 B platform 40 PID Pname EID PID Hours 10 B 2 B platform 1234 10 10 9 CRM 1123 9 40 1234 9 30 1023 10 40 10/24/2020 Jinze Liu @ University of Kentucky 21
Unnecessary decomposition l l EID Ename email 1234 John Smith jsmith@ac. com 1123 Ben Liu bliu@ac. com 1023 Susan Sidhuk ssidhuk@ac. com EID Ename EID email 1234 John Smith 1234 jsmith@ac. com 1123 Ben Liu 1123 bliu@ac. com 1023 Susan Sidhuk 1023 ssidhuk@ac. com Fine: join returns the original relation Unnecessary: no redundancy is removed, and now EID is stored twice-> 10/24/2020 Jinze Liu @ University of Kentucky 22
Bad decomposition l l EID PID Hours 1234 10 10 1123 9 40 1234 9 30 1023 10 40 EID PID EID Hours 1234 10 1123 9 1123 40 1234 9 1234 30 1023 10 1023 40 Association between PID and hours is lost Join returns more rows than the original relation 10/24/2020 Jinze Liu @ University of Kentucky 23
Lossless join decomposition l Decompose relation R into relations S and T l l l attrs(R) = attrs(S) S = πattrs(S) ( R ) T = πattrs(T) ( R ) attrs(T) l The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = S T l Any decomposition gives R S T (why? ) l A lossy decomposition is one with R S T 10/24/2020 Jinze Liu @ University of Kentucky 24
Loss? But I got more rows-> l “Loss” refers not to the loss of tuples, but to the loss of information l Or, the ability to distinguish different original tuples 10/24/2020 EID PID Hours 1234 10 10 1123 9 40 1234 9 30 1023 10 40 EID PID EID Hours 1234 10 1123 9 1123 40 1234 9 1234 30 1023 10 1023 40 Jinze Liu @ University of Kentucky 25
Questions about decomposition l When to decompose l How to come up with a correct decomposition (i. e. , lossless join decomposition) 10/24/2020 Jinze Liu @ University of Kentucky 26
Non-key FD’s l Consider a non-trivial FD X -> Y where X is not a super key l Since X is not a super key, there are some attributes (say Z) that are not functionally determined by X X Y Z a b c a b d That b is always associated with a is recorded by multiple rows: redundancy, update anomaly, deletion anomaly 10/24/2020 Jinze Liu @ University of Kentucky 27
Dealing with Nonkey Dependency: BCNF l A relation R is in Boyce-Codd Normal Form if l l l When to decompose l l For every non-trivial FD X -> Y in R, X is a super key That is, all FDs follow from “key -> other attributes” As long as some relation is not in BCNF How to come up with a correct decomposition l F Always decompose on a BCNF violation (details next) Then it is guaranteed to be a lossless join decomposition-> 10/24/2020 Jinze Liu @ University of Kentucky 28
BCNF decomposition algorithm l Find a BCNF violation l l Decompose R into R 1 and R 2, where l l l That is, a non-trivial FD X -> Y in R where X is not a super key of R R 1 has attributes X Y R 2 has attributes X Z, where Z contains all attributes of R that are in neither X nor Y (i. e. Z = attr(R) – X – Y) Repeat until all relations are in BCNF 10/24/2020 Jinze Liu @ University of Kentucky 29
BCNF decomposition example Work. On (EID, Ename, email, PID, hours) BCNF violation: EID -> Ename, email Student (EID, Ename, email) BCNF 10/24/2020 Grade (EID, PID, hours) BCNF Jinze Liu @ University of Kentucky 30
Another example Work. On (EID, Ename, email, PID, hours) BCNF violation: email -> EID Student. ID (email, EID) BCNF Student. Grade’ (email, Ename, PID, hours) BCNF violation: email -> Ename Student. Name (email, Ename) Grade (email, PID, hours) BCNF 10/24/2020 Jinze Liu @ University of Kentucky 31
Exercise l Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate) l l Property_id#-> County_name, Lot#, Area, Price, Tax_rate County_name, Lot# -> Property_id#, Area, Price, Tax_rate County_name -> Tax_rate area -> Price 10/24/2020 Jinze Liu @ University of Kentucky 32
Exercise Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate) BCNF violation: County_name -> Tax_rate LOTS 1 (County_name, Tax_rate ) BCNF LOTS 2 (Property_id#, County_name, Lot#, Area, Price) BCNF violation: Area -> Price LOTS 2 A (Area, Price) BCNF LOTS 2 B (Property_id#, County_name, Lot#, Area) BCNF 10/24/2020 Jinze Liu @ University of Kentucky 33
Why is BCNF decomposition lossless Given non-trivial X -> Y in R where X is not a super key of R, need to prove: l Anything we project always comes back in the join: R πXY ( R ) πXZ ( R ) l l Sure; and it doesn’t depend on the FD Anything that comes back in the join must be in the original relation: R πXY ( R ) πXZ ( R ) l Proof makes use of the fact that X -> Y 10/24/2020 Jinze Liu @ University of Kentucky 34
10/24/2020 Jinze Liu @ University of Kentucky 35
10/24/2020 Jinze Liu @ University of Kentucky 36
Recap l l Functional dependencies: a generalization of the key concept Partial dependencies: a source of redundancy l l l Use 2 nd Normal form to remove partial dependency Non-key functional dependencies: a source of redundancy BCNF decomposition: a method for removing ALL functional dependency related redundancies l Plus, BNCF decomposition is a lossless join decomposition 10/24/2020 Jinze Liu @ University of Kentucky 37
- Slides: 37