Chapter 7 Relational Database Design Database System Concepts
Chapter 7: Relational Database Design Database System Concepts, 5 th Ed.
Chapter 7: Relational Database Design n Features of Good Relational Design n Atomic Domains and First Normal Form n Decomposition Using Functional Dependencies n Functional Dependency Theory n Algorithms for Functional Dependencies n Database-Design Process Database System Concepts - 5 th Edition 7. 2 Pany, ccsu
The Banking Schema n Branch = (branch_name, branch_city, assets) n 支行(支行名称,部门所在城市,资产) n customer = (customer_id, customer_name, customer_street, customer_city) n 顾客(顾客id, 顾客名,顾客所在街道,顾客所在城市) n loan = (loan_number, amount) 贷款(贷款号,金额) n account = (account_number, balance) 账户(账户号,余额) n employee = (employee_id. employee_name, telephone_number, start_date) n 雇员(雇员id,雇员姓名,电话号码,开始 作日期) n dependent_name = (employee_id, dname) n 家属姓名(雇员id, 家属姓名) n account_branch = (account_number, branch_name) n loan_branch = (loan_number, branch_name) Database System Concepts - 5 th Edition 7. 3 Pany, ccsu
The Banking Schema n borrower = (customer_id, loan_number) n depositor = (customer_id, account_number) n cust_banker = (customer_id, employee_id, type) n works_for = (worker_employee_id, manager_employee_id) n payment = (loan_number, payment_date, payment_amount) n savings_account = (account_number, interest_rate) n checking_account = (account_number, overdraft_amount) Database System Concepts - 5 th Edition 7. 4 Pany, ccsu
Combine Schemas? n Suppose we combine borrower and loan to get bor_loan = (customer_id, loan_number, amount ) n Result is possible repetition of information (L-100 in example below) Database System Concepts - 5 th Edition 7. 5 Pany, ccsu
A Combined Schema Without Repetition n Consider combining loan_branch and loan_amt_br = (loan_number, amount, branch_name) n No repetition (as suggested by example below) Database System Concepts - 5 th Edition 7. 6 Pany, ccsu
What About Smaller Schemas? n Suppose we had started with bor_loan. How would we know to split up (decompose) it into borrower and loan? n Write a rule “if there were a schema (loan_number, amount), then loan_number would be a candidate key” n Denote as a functional dependency: loan_number amount n In bor_loan, because loan_number is not a candidate key, the amount of a loan may have to be repeated. This indicates the need to decompose bor_loan. n Not all decompositions are good. Suppose we decompose employee into employee 1 = (employee_id, employee_name) employee 2 = (employee_name, telephone_number, start_date) n The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition. Database System Concepts - 5 th Edition 7. 7 Pany, ccsu
A Lossy Decomposition Database System Concepts - 5 th Edition 7. 8 Pany, ccsu
Functional Dependencies 定义:设R(U)是属性集U上的关系模式。X,Y是U的子集。对于R(U)的 任意一个可能的关系r,如果r中不存在两个元组,它们在X上的属性值相 等,而在Y上的属性值不等,则称“X函数确定Y”或“Y函数依赖于X”,记 作X Y。 X称为这个函数依赖的决定属性集。 n Constraints on the set of legal relations. n Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. n A functional dependency is a generalization of the notion of a key. 学号 姓名 学号 年龄 学号 性别 学号 籍贯 Database System Concepts - 5 th Edition 7. 11 Pany, ccsu
Functional Dependencies (Cont. ) n Let R be a relation schema R and R n The functional dependency holds on(成立) R if and only if for any legal relations r(R), whenever any two tuples t 1 and t 2 of r agree on the attributes , they also agree on the attributes . That is, t 1[ ] = t 2 [ ] n Example: Consider r(A, B ) with the following instance of r. 1 1 3 4 5 7 n On this instance, A B does NOT hold, but B A does hold. Database System Concepts - 5 th Edition 7. 12 Pany, ccsu
Functional Dependencies (Cont. ) n K is a superkey for relation schema R if and only if K R n K is a candidate key for R if and only if l K R, and l for no K, R n Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: bor_loan = (customer_id, loan_number, amount ). We expect this functional dependency to hold: loan_number amount but would not expect the following to hold: amount customer_name Database System Concepts - 5 th Edition 7. 13 Pany, ccsu
Use of Functional Dependencies n We use functional dependencies to: l test relations to see if they are legal under a given set of functional dependencies. 4 l If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. specify constraints on the set of legal relations 4 We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. n Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. l For example, a specific instance of loan may, by chance, satisfy amount customer_name. Database System Concepts - 5 th Edition 7. 14 Pany, ccsu
Functional Dependencies (Cont. ) n A functional dependency is trivial (平凡)if it is satisfied by all instances of a relation l l Example: 4 customer_name, loan_number customer_name 4 customer_name In general, is trivial if Database System Concepts - 5 th Edition 7. 15 Pany, ccsu
Closure of a Set of Functional Dependencies n Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. l For example: If A B and B C, then we can infer that A C n The set of all functional dependencies logically implied by F is the closure of F(函数依赖F的闭包). n We denote the closure of F by F+. n F+ is a superset of F. Database System Concepts - 5 th Edition 7. 16 Pany, ccsu
Goals of Normalization n Let R be a relation scheme with a set F of functional dependencies. n Decide whether a relation scheme R is in “good” form. n In the case that a relation scheme R is not in “good” form, decompose it into a set of relation scheme {R 1, R 2, . . . , Rn} such that l each relation scheme is in good form l the decomposition is a lossless-join decomposition l Preferably, the decomposition should be dependency preserving. Database System Concepts - 5 th Edition 7. 20 Pany, ccsu
7. 2 First Normal Form n Domain is atomic if its elements are considered to be indivisible units l Examples of non-atomic domains: 4 Set of names, composite attributes 4 Identification numbers like CS 101 that can be broken up into parts n A relational schema R is in first normal form if the domains of all attributes of R are atomic n Non-atomic values complicate storage and encourage redundant (repeated) storage of data l Example: Set of accounts stored with each customer, and set of owners stored with each account l We assume all relations are in first normal form (and revisit this in Chapter 9) Database System Concepts - 5 th Edition 7. 21 Pany, ccsu
First Normal Form (Cont’d) n Atomicity is actually a property of how the elements of the domain are used. l Example: Strings would normally be considered indivisible l Suppose that students are given roll numbers which are strings of the form CS 0012 or EE 1127 l If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. l Doing so is a bad idea: leads to encoding of information in application program rather than in the database. Database System Concepts - 5 th Edition 7. 22 Pany, ccsu
2 NF n 若R 1 NF,且每个非主属性完全函数依赖于关键字,则R 2 NF。 Example: R(S_NO,C_NO,GRADE,TNAME,TAGE,OFFICE); F = { (S_NO, C_NO) GRADE, C_NO TNAME, TNAME TAGE, TNAME OFFICE }; 请问R 2 NF? 分解: SC(S_NO,C_NO,GRADE) CTO(C_NO,TNAME,TAGE,OFFICE) FSC = { (S_NO, C_NO) GRADE } FCTO ={ C_NO TNAME, TNAME TAGE, TNAME OFFICE }; Database System Concepts - 5 th Edition 7. 23 Pany, ccsu
3 NF n 若R 2 NF,且R的每个非主属性都不传递依赖于关键字,则R 3 NF。 Example: SC(S_NO,C_NO,GRADE) CTO(C_NO,TNAME,TAGE,OFFICE) FSC = { (S_NO, C_NO) GRADE } FCTO ={ C_NO TNAME, TNAME TAGE, TNAME OFFICE } 请问SC 3 NF?CTO 3 NF? 分解: SC(S_NO,C_NO,GRADE)FSC = { (S_NO, C_NO) GRADE } CT(C_NO,TNAME)FCT ={ C_NO TNAME } TO(TNAME,TAGE,OFFICE)FTO={TNAME TAGE, TNAME OFFICE} Database System Concepts - 5 th Edition 7. 24 Pany, ccsu
Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form where R and R, at least one of the following holds: n is trivial (i. e. , ) n is a superkey for R Example: schema not in BCNF: bor_loan = ( customer_id, loan_number, amount ) because loan_number amount holds on bor_loan but loan_number is not a superkey Database System Concepts - 5 th Edition 7. 26 Pany, ccsu
BCNF and Dependency Preservation n Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation n If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving. n Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form. Database System Concepts - 5 th Edition 7. 27 Pany, ccsu
Functional-Dependency Theory n We now consider the formal theory that tells us which functional dependencies are implied logically by a given set of functional dependencies. n We then develop algorithms to generate lossless decompositions into 3 NF(3 NF分解算法) n We then develop algorithms to test if a decomposition is dependency-preserving(保持函数依赖的分解算法) Database System Concepts - 5 th Edition 7. 28 Pany, ccsu
Closure of a Set of Functional Dependencies n Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. l For example: If A B and B C, then we can infer that A C n The set of all functional dependencies logically implied by F is the closure of F. n We denote the closure of F by F+. Database System Concepts - 5 th Edition 7. 29 Pany, ccsu
FD公理及推理规则 l 公理 F 1(自反性, reflexivity ):若X Y,则X Y或X X。 F 2(增广性, augmentation ):若X Y,则XZ YZ或XZ Y。 F 3(传递性, transitivity) ):若X Y,Y Z,则X Z。 l 推理规则 F 5(伪传性, pseudotransitivity):若X Y,YW Z,则XW Z。 F 6(合成性, union):若X Y,X Z,则X YZ。 F 7(分解性, decomposition):若X YZ,则X Y,X Z。 Database System Concepts - 5 th Edition 7. 30 Pany, ccsu
Example n R = (A, B, C, G, H, I) F={ A B A C CG H CG I B H} n some members of F+ l A H 4 by l AG I 4 by l transitivity from A B and B H augmenting A C with G, to get AG CG and then transitivity with CG I CG HI 4 by augmenting CG I to infer CG CGI, and augmenting of CG H to infer CGI HI, and then transitivity Database System Concepts - 5 th Edition 7. 31 Pany, ccsu
Procedure for Computing F+(选讲) n To compute the closure of a set of functional dependencies F: F+=F repeat for each functional dependency f in F+ apply reflexivity and augmentation rules on f add the resulting functional dependencies to F + for each pair of functional dependencies f 1 and f 2 in F + if f 1 and f 2 can be combined using transitivity then add the resulting functional dependency to F + until F + does not change any further NOTE: We shall see an alternative procedure for this task later Database System Concepts - 5 th Edition 7. 32 Pany, ccsu
Closure of Attribute Sets(重点) n Given a set of attributes , define the closure of under F (denoted by +) as the set of attributes that are functionally determined by under F n Algorithm to compute +, the closure of under F result : = ; while (changes to result) do for each in F do begin if result then result : = result end Database System Concepts - 5 th Edition 7. 33 Pany, ccsu
Example of Attribute Set Closure n R = (A, B, C, G, H, I) n F = {A B A C CG H CG I B H} n (AG)+ 1. 2. 3. 4. result = AG result = ABCG (A C and A B) result = ABCGH (CG H and CG AGBC) result = ABCGHI (CG I and CG AGBCH) Database System Concepts - 5 th Edition 7. 34 Pany, ccsu
Example of Attribute Set Closure n Is AG a candidate key? Is AG a super key? 1. Does AG R? == Is (AG)+ R 2. Is any subset of AG a superkey? 1. Does A R? == Is (A)+ R 2. Does G R? == Is (G)+ R 1. Database System Concepts - 5 th Edition 7. 35 Pany, ccsu
Uses of Attribute Closure There are several uses of the attribute closure algorithm: n Testing for superkey: l To test if is a superkey, we compute +, and check if + contains all attributes of R. n Testing functional dependencies l To check if a functional dependency holds (or, in other words, is in F+), just check if +. l That is, we compute + by using attribute closure, and then check if it contains . l Is a simple and cheap test, and very useful n Computing closure of F l For each R, we find the closure +, and for each S +, we output a functional dependency S. Database System Concepts - 5 th Edition 7. 36 Pany, ccsu
Canonical Cover(最小覆盖、正则覆盖) n Sets of functional dependencies may have redundant dependencies that can be inferred from the others l For example: A C is redundant in: {A B, B C} l Parts of a functional dependency may be redundant on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} 4 E. g. : on LHS: simplified to {A B, B C, AC D} can be {A B, B C, A D} n Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies Database System Concepts - 5 th Edition 7. 37 Pany, ccsu
Canonical Cover n A canonical cover for F is a set of dependencies Fc such that F logically implies all dependencies in Fc, and l Fc logically implies all dependencies in F, and l l No functional dependency in Fc contains an extraneous attribute, and l Each left side of functional dependency in Fc is unique. n To compute a canonical cover for F: repeat Use the union rule to replace any dependencies in F 1 1 and 1 2 with 1 1 2 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found, delete it from until F does not change n Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied Database System Concepts - 5 th Edition 7. 38 Pany, ccsu
Lossless-join Decomposition (分解成简单模式的判别算法) n For the case of R = (R 1, R 2), we require that for all possible relations r on schema R r = R 1 (r ) R 2 (r ) n A decomposition of R into R 1 and R 2 is lossless join if and only if at least one of the following dependencies is in F+: l R 1 R 2 R 1 l R 1 R 2 Database System Concepts - 5 th Edition 7. 39 Pany, ccsu
Example n R = (A, B, C) F = {A B, B C) l Can be decomposed in two different ways n R 1 = (A, B), R 2 = (B, C) l Lossless-join decomposition: R 1 R 2 = {B} and B BC l Dependency preserving n R 1 = (A, B), R 2 = (A, C) ? l Lossless-join decomposition: R 1 R 2 = {A} and A AB l Not dependency preserving (cannot check B C without computing R 1 Database System Concepts - 5 th Edition 7. 40 R 2 ) Pany, ccsu
Dependency Preservation n Let Fi be the set of dependencies F + that include only attributes in Ri. 4 A decomposition is dependency preserving, if (F 1 F 2 … Fn )+ = F + 4 If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts - 5 th Edition 7. 41 Pany, ccsu
Overall Database Design Process n We have assumed schema R is given l R could have been generated when converting E-R diagram to a set of tables. l R could have been a single relation containing all attributes that are of interest (called universal relation). l Normalization breaks R into smaller relations. l R could have been the result of some ad hoc design of relations, which we then test/convert to normal form. Database System Concepts - 5 th Edition 7. 42 Pany, ccsu
ER Model and Normalization n When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. n However, in a real (imperfect) design, there can be functional dependencies from non-key attributes of an entity to other attributes of the entity l Example: an employee entity with attributes department_number and department_address, and a functional dependency department_number department_address l Good design would have made department an entity n Functional dependencies from non-key attributes of a relationship set possible, but rare --- most relationships are binary Database System Concepts - 5 th Edition 7. 43 Pany, ccsu
Denormalization for Performance n May want to use non-normalized schema for performance n For example, displaying customer_name along with account_number and balance requires join of account with depositor n Alternative 1: Use denormalized relation containing attributes of account as well as depositor with all above attributes l faster lookup l extra space and extra execution time for updates l extra coding work for programmer and possibility of error in extra code n Alternative 2: use a materialized view defined as account l depositor Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors Database System Concepts - 5 th Edition 7. 44 Pany, ccsu
End of Chapter Database System Concepts, 5 th Ed.
Proof of Correctness of 3 NF Decomposition Algorithm Database System Concepts, 5 th Ed.
Correctness of 3 NF Decomposition Algorithm n 3 NF decomposition algorithm is dependency preserving (since there is a relation for every FD in Fc) n Decomposition is lossless l A candidate key (C ) is in one of the relations Ri in decomposition l Closure of candidate key under Fc must contain all attributes in R. l Follow the steps of attribute closure algorithm to show there is only one tuple in the join result for each tuple in Ri Database System Concepts - 5 th Edition 7. 47 Pany, ccsu
Correctness of 3 NF Decomposition Algorithm (Cont’d. ) Claim: if a relation Ri is in the decomposition generated by the above algorithm, then Ri satisfies 3 NF. n Let Ri be generated from the dependency n Let B be any non-trivial functional dependency on Ri. (We need only consider FDs whose right-hand side is a single attribute. ) n Now, B can be in either or but not in both. Consider each case separately. Database System Concepts - 5 th Edition 7. 48 Pany, ccsu
Correctness of 3 NF Decomposition (Cont’d. ) n Case 1: If B in : l If is a superkey, the 2 nd condition of 3 NF is satisfied l Otherwise must contain some attribute not in l Since B is in F+ it must be derivable from Fc, by using attribute closure on . l Attribute closure not have used . If it had been used, must be contained in the attribute closure of , which is not possible, since we assumed is not a superkey. l Now, using ( - {B}) and B, we can derive B (since , and B since B is non-trivial) l Then, B is extraneous in the right-hand side of ; which is not possible since is in Fc. l Thus, if B is in then must be a superkey, and the secondition of 3 NF must be satisfied. Database System Concepts - 5 th Edition 7. 49 Pany, ccsu
Correctness of 3 NF Decomposition (Cont’d. ) n Case 2: B is in . l Since is a candidate key, the third alternative in the definition of 3 NF is trivially satisfied. l In fact, we cannot show that is a superkey. l This shows exactly why the third alternative is present in the definition of 3 NF. Q. E. D. Database System Concepts - 5 th Edition 7. 50 Pany, ccsu
Figure 7. 5: Sample Relation r Database System Concepts - 5 th Edition 7. 51 Pany, ccsu
Figure 7. 6 Database System Concepts - 5 th Edition 7. 52 Pany, ccsu
Figure 7. 7 Database System Concepts - 5 th Edition 7. 53 Pany, ccsu
Figure 7. 15: An Example of Redundancy in a BCNF Relation Database System Concepts - 5 th Edition 7. 54 Pany, ccsu
Figure 7. 16: An Illegal R 2 Relation Database System Concepts - 5 th Edition 7. 55 Pany, ccsu
Figure 7. 18: Relation of Practice Exercise 7. 2 Database System Concepts - 5 th Edition 7. 56 Pany, ccsu
- Slides: 56