Chapter 7 Relational Database Design Chapter 7 Relational

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design n First Normal Form n Pitfalls in Relational Database Design n Functional Dependencies n Decomposition n Boyce-Codd Normal Form n Third Normal Form n Multivalued Dependencies and Fourth Normal Form n Overall Database Design Process Database System Concepts 7. 2 ©Silberschatz, Korth and Sudarshan

First Normal Form n Domain is atomic if its elements are considered to be indivisible units H Examples of non-atomic domains: 4 Set of names, composite attributes 4 Identification numbers like CS 101 that can be broken up into parts n A relational schema R is in first normal form if the domains of all attributes of R are atomic n Non-atomic values complicate storage and encourage redundant (repeated) storage of data H E. g. Set of accounts stored with each customer, and set of owners stored with each account H We assume all relations are in first normal form Database System Concepts 7. 3 ©Silberschatz, Korth and Sudarshan

First Normal Form (Contd. ) n Atomicity is actually a property of how the elements of the domain are used. H E. g. Strings would normally be considered indivisible H Suppose that students are given roll numbers which are strings of the form CS 0012 or EE 1127 H If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. H Doing so is a bad idea: leads to encoding of information in application program rather than in the database. Database System Concepts 7. 4 ©Silberschatz, Korth and Sudarshan

Pitfalls in Relational Database Design n Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to H Repetition of Information. H Inability to represent certain information. n Design Goals: H Avoid redundant data H Ensure that relationships among attributes are represented H Facilitate the checking of updates for violation of database integrity constraints. Database System Concepts 7. 5 ©Silberschatz, Korth and Sudarshan

Example n Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) n Redundancy: H Data for branch-name, branch-city, assets are repeated for each loan that a branch makes H Wastes space H Complicates updating, introducing possibility of inconsistency of assets value n Null values H Cannot store information about a branch if no loans exist H Can use null values, but they are difficult to handle. Database System Concepts 7. 6 ©Silberschatz, Korth and Sudarshan

Decomposition n Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city, assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) n All attributes of an original schema (R) must appear in the decomposition (R 1, R 2): R = R 1 R 2 n Lossless-join decomposition. For all possible relations r on schema R r = R 1 (r) Database System Concepts R 2 (r) 7. 7 ©Silberschatz, Korth and Sudarshan

Sample lending Relation Database System Concepts 7. 8 ©Silberschatz, Korth and Sudarshan

The customer Relation Database System Concepts 7. 9 ©Silberschatz, Korth and Sudarshan

The loan Relation Database System Concepts 7. 10 ©Silberschatz, Korth and Sudarshan

The branch Relation Database System Concepts 7. 11 ©Silberschatz, Korth and Sudarshan

The Relation branch-customer Database System Concepts 7. 12 ©Silberschatz, Korth and Sudarshan

The Relation customer-loan Database System Concepts 7. 13 ©Silberschatz, Korth and Sudarshan

The Relation branch-customer Database System Concepts 7. 14 customer-loan ©Silberschatz, Korth and Sudarshan

Example of Non Lossless-Join Decomposition of R = (A, B) R 2 = (A)R 2 = (B) A B 1 2 1 1 2 A(r) B(r) r A (r) Database System Concepts B (r) A B 1 2 7. 15 ©Silberschatz, Korth and Sudarshan

Goal — Devise a Theory for the Following n Decide whether a particular relation R is in “good” form. n In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2, . . . , Rn} such that H each relation is in good form H the decomposition is a lossless-join decomposition n Our theory is based on: H functional dependencies H multivalued dependencies Database System Concepts 7. 16 ©Silberschatz, Korth and Sudarshan

Functional Dependencies n Constraints on the set of legal relations. n Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. n A functional dependency is a generalization of the notion of a key. Database System Concepts 7. 17 ©Silberschatz, Korth and Sudarshan

Functional Dependencies (Cont. ) n Let R be a relation schema R and R n The functional dependency holds on R if and only if for any legal relations r(R), whenever any two tuples t 1 and t 2 of r agree on the attributes , they also agree on the attributes . That is, t 1[ ] = t 2 [ ] Database System Concepts 7. 18 ©Silberschatz, Korth and Sudarshan

Sample Relation r Database System Concepts 7. 19 ©Silberschatz, Korth and Sudarshan

Functional Dependencies (Cont. ) n K is a superkey for relation schema R if and only if K R n K is a candidate key for R if and only if H K R, and H for no K, R n Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect this set of functional dependencies to hold: loan-number amount loan-number branch-name but would not expect the following to hold: loan-number customer-name Database System Concepts 7. 20 ©Silberschatz, Korth and Sudarshan

Use of Functional Dependencies n We use functional dependencies to: H test relations to see if they are legal under a given set of functional dependencies. 4 If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. H specify constraints on the set of legal relations 4 We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. n Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. H For example, a specific instance of Loan-schema may, by chance, satisfy Database System Concepts loan-number customer-name. 7. 21 ©Silberschatz, Korth and Sudarshan

Functional Dependencies (Cont. ) n A functional dependency is trivial if it is satisfied by all instances of a relation H E. g. 4 customer-name, loan-number customer-name 4 customer-name H In general, is trivial if Database System Concepts 7. 22 ©Silberschatz, Korth and Sudarshan

Closure of a Set of Functional Dependencies n Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. H E. g. If A B and B C, then we can infer that A C n The set of all functional dependencies logically implied by F is the closure of F. n We denote the closure of F by F+. n We can find all of F+ by applying Armstrong’s Axioms: H if , then (reflexivity) H if , then (augmentation) H if , and , then (transitivity) n These rules are H sound (generate only functional dependencies that actually hold) and H complete (generate all functional dependencies that hold). Database System Concepts 7. 23 ©Silberschatz, Korth and Sudarshan

Example n R = (A, B, C, G, H, I) F={ A B A C CG H CG I B H} n some members of F+ H A H H AG I H CG HI Database System Concepts 7. 24 ©Silberschatz, Korth and Sudarshan

Procedure for Computing F+ n To compute the closure of a set of functional dependencies F: F+ = F repeat for each functional dependency f in F+ apply reflexivity and augmentation rules on f add the resulting functional dependencies to F+ for each pair of functional dependencies f 1 and f 2 in F+ if f 1 and f 2 can be combined using transitivity then add the resulting functional dependency to F+ until F+ does not change any further NOTE: We will see an alternative procedure for this task later Database System Concepts 7. 25 ©Silberschatz, Korth and Sudarshan

Closure of Functional Dependencies (Cont. ) n We can further simplify manual computation of F+ by using the following additional rules. H If holds and holds, then holds (union) H If holds, then holds and holds (decomposition) H If holds and holds, then holds (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms. Database System Concepts 7. 26 ©Silberschatz, Korth and Sudarshan

Closure of Attribute Sets n Given a set of attributes , define the closure of under F (denoted by +) as the set of attributes that are functionally determined by under F: is in F+ if + n Algorithm to compute +, the closure of under F result : = ; while (changes to result) do for each in F do begin if result then result : = result end Database System Concepts 7. 27 ©Silberschatz, Korth and Sudarshan

Example of Attribute Set Closure n R = (A, B, C, G, H, I) n F = {A B A C CG H CG I B H} n (AG)+ 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGHI (CG H, CG I and CG ABCG) n Is AG a candidate key? 1. Is AG a super key? 1. Does AG R? == Is (AG)+ R 2. Is any subset of AG a superkey? 1. Does A R? == Is (A)+ R 2. Does G R? == Is (G)+ R Database System Concepts 7. 28 ©Silberschatz, Korth and Sudarshan

Uses of Attribute Closure There are several uses of the attribute closure algorithm: n Testing for superkey: H To test if is a superkey, we compute +, and check if + contains all attributes of R. n Testing functional dependencies H To check if a functional dependency holds (or, in other words, is in F+), just check if +. H That is, we compute + by using attribute closure, and then check if it contains . H Is a simple and cheap test, and very useful n Computing closure of F H For each R, we find the closure +, and for each S +, we output a functional dependency S. Database System Concepts 7. 29 ©Silberschatz, Korth and Sudarshan

Canonical Cover n Sets of functional dependencies may have redundant dependencies that can be inferred from the others H Eg: A C is redundant in: {A B, B C, A C} H Parts of a functional dependency may be redundant 4 E. g. on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} 4 E. g. on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} n Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies Database System Concepts 7. 30 ©Silberschatz, Korth and Sudarshan

Extraneous Attributes n Consider a set F of functional dependencies and the functional dependency in F. H Attribute A is extraneous in if A and F logically implies (F – { }) {( – A) }. § Attribute A is extraneous in if A and the set of functional dependencies (F – { }) { ( – A)} logically implies F. n Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one n Example: Given F = {A C, AB C } H B is extraneous in AB C because {A C, AB C} logically implies A C (I. e. the result of dropping B from AB C). n Example: Given F = {A C, AB CD} H C is extraneous in AB CD since AB C can be inferred even after deleting C Database System Concepts 7. 31 ©Silberschatz, Korth and Sudarshan

Testing if an Attribute is Extraneous n Consider a set F of functional dependencies and the functional dependency in F. § To test if attribute A is extraneous in 1. compute ({ } – A)+ using the dependencies in F 2. check that ({ } – A)+ contains A; if it does, A is extraneous § To test if attribute A is extraneous in 1. compute + using only the dependencies in F’ = (F – { }) { ( – A)}, 2. check that + contains A; if it does, A is extraneous Database System Concepts 7. 32 ©Silberschatz, Korth and Sudarshan

Canonical Cover A canonical cover for F is a set of dependencies Fc such that H F logically implies all dependencies in Fc, and H Fc logically implies all dependencies in F, and H No functional dependency in Fc contains an extraneous attribute, and H Each left side of functional dependency in Fc is unique. To compute a canonical cover for F: repeat Use the union rule to replace any dependencies in F 1 1 and 1 1 with 1 1 2 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found, delete it from until F does not change n Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied § § Database System Concepts 7. 33 ©Silberschatz, Korth and Sudarshan

Example of Computing a Canonical Cover n R = (A, B, C) F = {A BC, B C, A B, AB C} n Combine A BC and A B into A BC H Set is now {A BC, B C, AB C} n A is extraneous in AB C H Check if the result of deleting A from AB C is implied by the other dependencies 4 Yes: in fact, B C is already present! H Set is now {A BC, B C} n C is extraneous in A BC H Check if A C is logically implied by A B and the other dependencies 4 Yes: using transitivity on A B and B C. – Can use attribute closure of A in more complex cases n The canonical cover is: {A B, B C} Database System Concepts 7. 34 ©Silberschatz, Korth and Sudarshan

Goals of Normalization n Decide whether a particular relation R is in “good” form. n In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2, . . . , Rn} such that H each relation is in good form H the decomposition is a lossless-join decomposition n Our theory is based on: H functional dependencies H multivalued dependencies Database System Concepts 7. 35 ©Silberschatz, Korth and Sudarshan

Decomposition n Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city, assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) n All attributes of an original schema (R) must appear in the decomposition (R 1, R 2): R = R 1 R 2 n Lossless-join decomposition. For all possible relations r on schema R r = R 1 (r) R 2 (r) n A decomposition of R into R 1 and R 2 is lossless join if at least one of the following dependencies is in F+: H R 1 R 2 R 1 H R 1 R 2 Database System Concepts 7. 36 ©Silberschatz, Korth and Sudarshan

Example of Lossy-Join Decomposition n Lossy-join decompositions result in information loss. n Example: Decomposition of R = (A, B) R 2 = (A)R 2 = (B) A B 1 2 1 1 2 A(r) B(r) r A (r) Database System Concepts B (r) A B 1 2 7. 37 ©Silberschatz, Korth and Sudarshan

Normalization Using Functional Dependencies When we decompose a relation schema R with a set of functional dependencies F into R 1, R 2, . . , Rn we want H Lossless-join decomposition: Otherwise decomposition would result in information loss. H Dependency preservation: Let Fi be the set of dependencies F+ that include only attributes in Ri. 4 Preferably the decomposition should be dependency preserving, that is, (F 1 F 2 … Fn)+ = F + 4 Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. H No redundancy: For obvious reasons. 4 Formulated in terms of normal forms Database System Concepts 7. 38 ©Silberschatz, Korth and Sudarshan

Example n R = (A, B, C) F = {A B, B C) H Can be decomposed in two different ways n R 1 = (A, B), R 2 = (B, C) H Lossless-join decomposition: R 1 R 2 = {B} and B BC H Dependency preserving n R 1 = (A, B), R 2 = (A, C) H Lossless-join decomposition: R 1 R 2 = {A} and A AB H Not dependency preserving (cannot check B C without computing R 1 Database System Concepts 7. 39 R 2 ) ©Silberschatz, Korth and Sudarshan

Testing for Dependency Preservation n To check if a dependency is preserved in a decomposition of R into R 1, R 2, …, Rn we apply the following simplified test (with attribute closure done w. r. t. F) H result = while (changes to result) do for each Ri in the decomposition t = (result Ri)+ Ri result = result t H If result contains all attributes in , then the functional dependency is preserved. n We apply the test on all dependencies in F to check if a decomposition is dependency preserving n This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F 1 F 2 … Fn)+ Database System Concepts 7. 40 ©Silberschatz, Korth and Sudarshan

To be continued