Unit 4 Relational Database Design Prof Firoz Sherasiya
Unit – 4 Relational Database Design Prof. Firoz Sherasiya � 9879879861 � firoz. sherasiya@darshan. ac. in Database Management System (2130703) Darshan Institute of Engineering & Technology
Topics to be covered • Functional Dependency • Definition and types of FD • • Closure of FD set Closure of attribute set Irreducible set of FD Normalization and normal forms • • • 1 NF 2 NF 3 NF BCNF 4 NF 5 NF Unit – 4: Relational Database Design 2 Darshan Institute of Engineering & Technology
What is Functional Dependency? § Let R be a relation schema having n attributes A 1, A 2, A 3, …, An. Student Roll. No Name SPI BL 101 Raj 8 0 102 Meet 7 1 § Let attributes X and Y are two subsets of attributes of relation R. § If the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component, then there is a functional dependency from X to Y. § This is denoted by X → Y (i. e Roll. No → Name, SPI, BL). § It is referred as: Y is functionally dependent on the X or X functionally determines Y. Unit – 4: Relational Database Design 3 Darshan Institute of Engineering & Technology
Diagrammatic representation {X 1, X 2} → Y X→Y Y X X 1 X 2 X → {Y 1, Y 2} Y X Y 1 Y 2 § Example § Consider the relation Account(account_no, balance, branch). § account_no can determine balance and branch. § So, there is a functional dependency from account_no to balance and branch. § This can be denoted by account_no → {balance, branch}. account_no Unit – 4: Relational Database Design balance 4 branch Darshan Institute of Engineering & Technology
Types of Functional Dependencies 1. Full Functional Dependency • In a relation, the attribute B is fully functional dependent on A if B is functionally dependent on A, but not on any proper subset of A. • Eg. {Roll_No, Semester, Department_Name} → SPI • We need all three {Roll_No, Semester, Department_Name} to find SPI. 2. Partial Functional Dependency • In a relation, the attribute B is partial functional dependent on A if B is functionally dependent on A as well as on any proper subset of A. • If there is some attribute that can be removed from A and the still dependency holds then it is partial functional dependancy. • Eg. {Enrollment_No, Department_Name} → SPI • Enrollment_No is sufficient to find SPI, Department_Name is not required to find SPI. Unit – 4: Relational Database Design 5 Darshan Institute of Engineering & Technology
Types of Functional Dependencies 3. Transitive Functional Dependency • In a relation, if attribute(s) A → B and B → C, then C is transitively depends on A via B. • Eg. Staff_No → Branch_No ; Branch_No → Branch_Address then Staff_No → Branch_Address 4. Trivial Functional Dependency • X → Y is trivial FD if Y is a subset of X • Eg. {Roll_No, Department_Name} → Roll_No 5. Nontrivial Functional Dependency • X → Y is nontrivial FD if Y is not a subset of X • Eg. {Roll_No, Department_Name} → Student_Name Unit – 4: Relational Database Design 6 Darshan Institute of Engineering & Technology
Armstrong's axioms (inference rules) § Armstrong's axioms are a set of rules used to infer (derive) all the functional dependencies on a relational database. Unit – 4: Relational Database Design 7 Darshan Institute of Engineering & Technology
Armstrong's axioms (inference rules) 1. Reflexivity 5. Self-determination – If B is a subset of A then A → B – A→A 6. Decomposition 2. Augmentation – If A → BC then A → B and A → C – If A → B then AC → BC 7. Union 3. Transitivity – If A → B and A → C then A → BC – If A → B and B → C then A → C 8. Composition 4. Pseudo Transitivity – If A → B and C → D then AC → BD – If A → B and BD → C then AD → C Unit – 4: Relational Database Design 8 Darshan Institute of Engineering & Technology
What is closure of a set of FDs? § Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. § E. g. : F = {A → B and B → C}, then we can infer that A → C § The set of functional dependencies (FDs) that is logically implied by F is called the closure of F. § It is denoted by F+. Unit – 4: Relational Database Design 9 Darshan Institute of Engineering & Technology
Example of closure of a set of FDs § Suppose we are given a relation schema R(A, B, C, G, H, I) and the set of functional dependencies are: • F = (A → B, A → C, CG → H, CG → I, B → H) § The functional dependency A → H is logical implied. We have A→B B→H Unit – 4: Relational Database Design Transitivity rule 10 A→H Darshan Institute of Engineering & Technology
Example of closure of a set of FDs § Suppose we are given a relation schema R(A, B, C, G, H, I) and the set of functional dependencies are: • F = (A → B, A → C, CG → H, CG → I, B → H) § The functional dependency CG → HI is logical implied. We have CG → H CG → I Unit – 4: Relational Database Design Union rule 11 CG → HI Darshan Institute of Engineering & Technology
Example of closure of a set of FDs § Suppose we are given a relation schema R(A, B, C, G, H, I) and the set of functional dependencies are: • F = (A → B, A → C, CG → H, CG → I, B → H) § The functional dependency AG → I is logical implied. We have A→C CG → I Pseudo-transitivity rule Unit – 4: Relational Database Design 12 AG → I Darshan Institute of Engineering & Technology
Example of closure of a set of FDs § Suppose we are given a relation schema R(A, B, C, G, H, I) and the set of functional dependencies are: • F = (A → B, A → C, CG → H, CG → I, B → H) § The functional dependency AG → I is logical implied. We have A→C Augmentation rule AG → CG Transitivity rule AG → I AG → CG CG → I Unit – 4: Relational Database Design 13 Darshan Institute of Engineering & Technology
Example of closure of a set of FDs § Suppose we are given a relation schema R(A, B, C, G, H, I) and the set of functional dependencies are: • F = (A → B, A → C, CG → H, CG → I, B → H) Several members of F+ are F+ = (A → H, CG → HI, AG → I) Unit – 4: Relational Database Design 14 Darshan Institute of Engineering & Technology
Exercise • Compute the closure of the following set F of functional dependencies for relational schema R = (A, B, C, D, E, F): F = (A → B, A → C, CD → E, CD → F, B → E) A → BC A B & A C Union Rule CD → EF CD E & CD F Union Rule A→E A B & B E Transitivity Rule AD → E A C & CD E Pseudo-transitivity Rule AD → F A C & CD F Pseudo-transitivity Rule F+ = {A → BC, CD → EF, A → E, AD → F} Unit – 4: Relational Database Design 15 Darshan Institute of Engineering & Technology
Exercise • Compute the closure of the following set F of functional dependencies for relational schema R = (A, B, C, D, E): F = (AB → C, D → AC, D → E) D → A, D →C D AC Decomposition Rule D → ACE D AC & D E Union Rule F+ = {D → A, D → C, D → ACE} Unit – 4: Relational Database Design 16 Darshan Institute of Engineering & Technology
What is a closure of attribute sets? § Given a set of attributes α, the closure of α under F is the set of attributes that are functionally determined by α under F. § It is denoted by α+. Unit – 4: Relational Database Design 17 Darshan Institute of Engineering & Technology
Algorithm to find closure of attribute sets § Algorithm to compute α+, the closure of α under F § Steps 1. result = α 2. while (changes to result) do – for each β → γ in F do » begin • if β ⊆ result then result = result U γ • else result = result » end Unit – 4: Relational Database Design 18 Darshan Institute of Engineering & Technology
Example of closure of attribute sets § Consider the following relation schema § R = (A, B, C, G, H, I). § For this relation, a set of functional dependencies F can be given as F = {A B, A C, CG H, CG I, B H} § Find out the closure of (AG)+. Unit – 4: Relational Database Design 19 Darshan Institute of Engineering & Technology
Example of closure of attribute sets (Solution) § F = {A B, A C, CG H, CG I, B H}. Find out (AG)+ § Step-1 : • result = {AG} § Step-2 : for each β → γ in F do • if β ⊆ result then result = result U γ • else result = ABG A B A ⊆ AG result = ABCG A C A ⊆ ABG result = ABCGH CG ⊆ ABCG result = ABCGHI CG ⊆ ABCGH result = ABCGHI B H B ⊆ ABCGHI Unit – 4: Relational Database Design 20 Darshan Institute of Engineering & Technology
Exercise • Given functional dependencies (FDs) for relational schema R = (A, B, C, D, E): F = {A BC, CD E, B D, E A} 1. 2. 3. 4. 5. Find Closure for A Find Closure for CD Find Closure for BC Find Closure for E A+ = ABCDE CD+ = ABCDE B+ = BD BC+ = ABCDE E+ = ABCDE Unit – 4: Relational Database Design 21 Darshan Institute of Engineering & Technology
What is extraneous attributes? § Let us consider a relation R with schema R = (A, B, C) and set of functional dependencies F = { AB → C, A → C }. § In AB → C, B is extraneous attribute. The reason is, there is another FD A → C, which means when A alone can determine C, the use of B is unnecessary (extra). § An attribute of a functional dependency is said to be extraneous if we can remove it without changing the closure of the set of functional dependencies. Unit – 4: Relational Database Design 22 Darshan Institute of Engineering & Technology
What is canonical cover? § A canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies. § It is denoted by Fc § A canonical cover for F is a set of dependencies Fc such that 1. F logically implies all dependencies in Fc and 2. Fc logically implies all dependencies in F and 3. No functional dependency in Fc contains an extraneous attribute and 4. Each left side of functional dependency in Fc is unique. Decomposition Rule Unit – 4: Relational Database Design F = {A → B, A → C} FC = {A → BC} 23 Union Rule Darshan Institute of Engineering & Technology
Algorithm to find canonical cover § repeat Use the union rule to replace any dependencies in F α 1 β 1 and α 1 β 2 with α 1 β 1 β 2 Find a functional dependency α β with an extraneous attribute either in α or in β /* Note: test for extraneous attributes done using Fc, not F*/ If an extraneous attribute is found, delete it from α β until F does not change § Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied Unit – 4: Relational Database Design 24 Darshan Institute of Engineering & Technology
Example to find canonical cover § Consider the relation schema R = (A, B, C) with FDs F = {A → BC, B → C, A → B, AB → C} find canonical cover. § Combine A → BC and A → B into A → BC (Union Rule) • Set is now {A → BC, B → C, AB → C} § A is extraneous in AB → C • Check if the result of deleting A from AB → C is implied by the other dependencies • Yes: in fact, B → C is already present • Set is now {A → BC, B → C} § C is extraneous in A → BC • Check if A → C is logically implied by A → B and the other dependencies • Yes: using transitivity on A → B and B → C. § The canonical cover is: A → B, B → C Unit – 4: Relational Database Design 25 Darshan Institute of Engineering & Technology
What is decomposition? § Decomposition is the process of breaking down given relation into two or more relations. § Relation R is replaced by two or more relations in such a way that: 1. Each new relation contains a subset of the attributes of R 2. Together, they all include all tuples and attributes of R § Types of decomposition 1. Lossy decomposition 2. Lossless decomposition (non-loss decomposition) Unit – 4: Relational Database Design 26 Darshan Institute of Engineering & Technology
Lossy decomposition Ano Balance A 01 5000 A 02 5000 Ano Balance Bname A 01 5000 Rajkot A 02 5000 Surat Not Same Ano Balance Bname A 01 5000 Rajkot A 01 5000 Surat A 02 5000 Rajkot A 02 5000 Surat Unit – 4: Relational Database Design 27 Balance Bname 5000 Rajkot 5000 Surat Extra records Darshan Institute of Engineering & Technology
Lossy decomposition § The decomposition of relation R into R 1 and R 2 is lossy when the join of R 1 and R 2 does not yield the same relation as in R. § This is also referred as lossy-join decomposition. § The disadvantage of such kind of decomposition is that some information is lost during retrieval of original relation. § From practical point of view, decomposition should not be lossy decomposition. Unit – 4: Relational Database Design 28 Darshan Institute of Engineering & Technology
Lossless decomposition Ano Balance A 01 5000 A 02 5000 Ano Balance Bname A 01 5000 Rajkot A 02 5000 Surat Same Ano Balance Bname A 01 5000 Rajkot A 02 5000 Surat Unit – 4: Relational Database Design 29 Ano Bname A 01 Rajkot A 02 Surat Same as original table No extra records Darshan Institute of Engineering & Technology
Lossless decomposition § The decomposition of relation R into R 1 and R 2 is lossless when the join of R 1 and R 2 produces the same relation as in R. § This is also referred as a non-additive (non-loss) decomposition. § All decompositions must be lossless. Unit – 4: Relational Database Design 30 Darshan Institute of Engineering & Technology
What is an anomaly in database design? § Anomalies are problems that can occur in poorly planned, unnormalized database where all the data are stored in one table. § There are three types of anomalies that can arise in the database because of redundancy are 1. Insert anomaly 2. Delete anomaly 3. Update / Modification anomaly Unit – 4: Relational Database Design 31 Darshan Institute of Engineering & Technology
Insert anomaly § Consider a relation • emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) E# as a primary key Want to insert new department detail (IT) E# Ename Address D# Dname Dmgr# 1 Rajkot 1 CE 1 2 Meet Surat 1 CE 1 § Suppose a new department (IT) has been started by the organization but initially there is no employee appointed for that department. § We want to insert that department detail in emp_dept table. § But the tuple for this department cannot be inserted into this table as the E# will have NULL value, which is not allowed because E# is primary key. § This kind of problem in the relation where some tuple cannot be inserted is known as insert anomaly. Unit – 4: Relational Database Design 32 Darshan Institute of Engineering & Technology
What is Insert anomaly? § An insert anomaly occurs when certain attributes cannot be inserted into the database without the presence of another attribute. Unit – 4: Relational Database Design 33 Darshan Institute of Engineering & Technology
Delete anomaly § Consider a relation • emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) E# as a primary key Want to delete Meet employee's detail E# Ename Address D# Dname Dmgr# 1 Rajkot 1 CE 1 2 Meet Surat 1 IT 2 § Now consider there is only one employee in some department (IT) and that employee leaves the organization. § So we need to delete tuple of that employee (Meet). § But in addition to that information about the department also deleted. § This kind of problem in the relation where deletion of some tuples can lead to loss of some other data not intended to be removed is known as delete anomaly. Unit – 4: Relational Database Design 34 Darshan Institute of Engineering & Technology
What is Delete anomaly? § A delete anomaly exists when certain attributes are lost because of the deletion of another attribute. Unit – 4: Relational Database Design 35 Darshan Institute of Engineering & Technology
Update anomaly § Consider a relation • emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) E# as a primary key Want to update CE department’s manager E# Ename Address D# Dname Dmgr# 1 Rajkot 1 CE M 1 2 Meet Surat 2 IT M 2 3 Jay Rajkot 2 C. E M 2 § Suppose the manager of a (CE) department has changed, this requires that the Dmgr# in all the tuples corresponding to that department must be changed to reflect the new status. § If we fail to update all the tuples of given department, then two different records of employee working in the same department might show different Dmgr# lead to inconsistency in the database. Unit – 4: Relational Database Design 36 Darshan Institute of Engineering & Technology
What is Update anomaly? § An update anomaly exists when one or more records (instance) of duplicated data is updated, but not all. Unit – 4: Relational Database Design 37 Darshan Institute of Engineering & Technology
Anomaly (Summary) Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 C. E. Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 Computer Patel E 4 Deepak Surat D 1 C. E Patel E 5 Suresh Surat D 3 Electrical Joshi null D 4 Chemical null Delete Anomaly If we delete Employee having ID “E 2” then Civil department will also delete because there is only one record of Civil dept. Insert Anomaly Do not allow to insert new Department “Chemical” until an employee is assign to it. Update Anomaly An update anomaly exists when one or more records of duplicated data is updated, but not all. Unit – 4: Relational Database Design 38 Darshan Institute of Engineering & Technology
How to deal with insert anomaly Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer C. E. Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 Computer Patel E 4 Deepak Surat D 1 C. E Computer Patel E 5 Suresh Surat D 3 Electrical Joshi null D 4 Chemical null Do not allow to insert new department “Chemical” until an employee is assign to it. Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 D 3 Electrical Joshi E 4 Deepak Surat D 1 D 4 Chemical null E 5 Suresh Surat D 3 Unit – 4: Relational Database Design 39 Darshan Institute of Engineering & Technology
How to deal with delete anomaly Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer C. E. Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 Computer Patel E 4 Deepak Surat D 1 C. E Computer Patel E 5 Suresh Surat D 3 Electrical Joshi If we delete Employee having ID “E 2” then Civil department will also delete because there is only one record of Civil dept. Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 D 3 Electrical Joshi E 4 Deepak Surat D 1 E 5 Suresh Surat D 3 Unit – 4: Relational Database Design 40 Darshan Institute of Engineering & Technology
How to deal with update anomaly Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer C. E. Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 Computer Patel E 4 Deepak Surat D 1 C. E Computer Patel E 5 Suresh Surat D 3 Electrical Joshi Changing the name of department D 1 from “Computer” to “IT” may update one or more records, but not all. Emp. ID Emp. Name Address Dept. ID Dept. Name Dept. Mngr E 1 Rajkot D 1 Computer Patel E 2 Samir Rajkot D 2 Civil Shah E 3 Meet Baroda D 1 D 3 Electrical Joshi E 4 Deepak Surat D 1 E 5 Suresh Surat D 3 Unit – 4: Relational Database Design 41 Darshan Institute of Engineering & Technology
How anomalies in database design can be solved? § Such type of anomalies in database design can be solved by using normalization. Unit – 4: Relational Database Design 42 Darshan Institute of Engineering & Technology
What is normalization? § Normalization is the process of removing redundant data from tables to improve data integrity, scalability and storage efficiency. 1. data integrity (completeness, accuracy and consistency of data) 2. scalability (ability of a system to continue to function well in a growing amount of work) 3. storage efficiency (ability to store and manage data that consumes the least amount of space) Unit – 4: Relational Database Design 43 Darshan Institute of Engineering & Technology
What we do in normalization? § Normalization generally involves splitting an existing table into multiple (more than one) tables, which can be re-joined or linked each time a query is issued (executed). Unit – 4: Relational Database Design 44 Darshan Institute of Engineering & Technology
How many normal forms are there? § Normal forms: 1. 1 NF (First normal form) 2. 2 NF (Second normal form) 3. 3 NF (Third normal form) 4. BCNF (Boyce–Codd normal form) 5. 4 NF (Forth normal form) 6. 5 NF (Fifth normal form) As we move from 1 NF to 5 NF number of tables and complexity increases but redundancy decreases. Unit – 4: Relational Database Design 45 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) § Conditions for 1 NF Each cells of a table should contain a single value. § A relation R is in first normal form (1 NF) if and only if it does not contain any composite attribute or multi-valued attributes or their combinations. OR § A relation R is in first normal form (1 NF) if and only if all underlying domains contain atomic values only. Unit – 4: Relational Database Design 46 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Composite attribute] Customer. ID Name Address C 01 Raj Jamnagar Road, Rajkot C 02 Meet Nehru Road, Jamnagar § In above relation address is composite attribute which is further divided into sub-attributes as “Road” and “City”. § So above relation is not in 1 NF. Unit – 4: Relational Database Design 47 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Composite attribute] Customer. ID Name Address C 01 Raj Jamnagar Road, Rajkot C 02 Meet Nehru Road, Jamnagar § Problem: It is difficult to retrieve the list of customers living in ‘Jamnagar’ from above table. § The reason is that address attribute is composite attribute which contains road name as well as city name in single cell. § It is possible that city name word is also there in road name. § In our example, ‘Jamnagar’ word occurs in both records, in first record it is a part of road name and in second one it is the name of city. Unit – 4: Relational Database Design 48 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Composite attribute] Customer. ID Name Address C 01 Raj Jamnagar Road, Rajkot C 02 Meet Nehru Road, Jamnagar § Solution: Divide composite attributes into number of subattributes and insert value in proper sub-attribute. Customer. ID Name Road City C 01 Raj Jamnagar Road Rajkot C 02 Meet Nehru Road Jamnagar Unit – 4: Relational Database Design 49 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Multivalued attribute] Student Roll. No Name Failedin. Subjects 101 Raj DS, DBMS 102 Meet DBMS, DS 103 Jeet DS, DBMS, DE 104 Harsh DBMS, DE, DS 105 Nayan DE, DBMS, DS § In above relation Failedin. Subjects is a multi-valued attribute which can store more than one values. § So above relation is not in 1 NF. Unit – 4: Relational Database Design 50 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Multivalued attribute] Student Roll. No Name Failedin. Subjects 101 Raj DS, DBMS 102 Meet DBMS, DS 103 Jeet DS, DBMS, DE 104 Harsh DBMS, DE, DS 105 Nayan DE, DBMS, DS § Problem: It is difficult to retrieve the list of students failed in ‘DBMS’ as well as ‘DS’ but not in other subjects from above table. § The reason is that Failedin. Subjects attribute is multi-valued attribute so it contains more than one value. Unit – 4: Relational Database Design 51 Darshan Institute of Engineering & Technology
1 NF (First Normal Form) [Multivalued attribute] Student Result Roll. No Name Failedin. Subjects Roll. No Name RID Roll. No Subject 101 Raj DS, DBMS 101 Raj 1 101 DS 102 Meet DBMS, DS 102 Meet 2 101 DBMS 103 Jeet DS, DBMS, DE 103 Jeet 3 102 DBMS 104 Harsh DBMS, DE, DS 104 Harsh 4 102 DS 105 Nayan DE, DBMS, DS 105 Nayan 5 103 DS … … … § Solution: Split the table into two tables in such as way that • the first table contains all attributes except multi-valued attribute with same primary key and • other table contains multi-valued attribute and place a primary key in it. • insert the primary key of first table in the second table as a foreign key. Unit – 4: Relational Database Design 52 Darshan Institute of Engineering & Technology
2 NF (Second Normal Form) § Conditions for 2 NF It is in 1 NF and each table should contain a single primary key. § A relation R is in second normal form (2 NF) • if and only if it is in 1 NF and • every non-primary key attribute is fully dependent on the primary key OR § A relation R is in second normal form (2 NF) • if and only if it is in 1 NF and • no any non-primary key attribute is partially dependent on the primary key Unit – 4: Relational Database Design 53 Darshan Institute of Engineering & Technology
2 NF (Second Normal Form) Customer. ID Account. NO Accesss. Date Balance Branch. Name C 01 A 01 01 -01 -2017 50000 Rajkot C 02 A 01 01 -03 -2017 50000 Rajkot C 01 A 02 01 -05 -2017 25000 Surat C 03 A 02 01 -07 -2017 25000 Surat Customer. ID Account. NO Accesss. Date Balance Branch. Name § FD 1: {Customer. ID, Account. NO} → {Accesss. Date, Balance, Branch. Name} § FD 2: Account. NO → {Balance, Branch. Name} § Balance and Branch. Name are partial dependent on primary key. So above relation is not in 2 NF. Unit – 4: Relational Database Design 54 Darshan Institute of Engineering & Technology
2 NF (Second Normal Form) Customer. ID Account. NO Accesss. Date Balance Branch. Name C 01 A 01 01 -01 -2017 50000 Rajkot C 02 A 01 01 -03 -2017 50000 Rajkot C 01 A 02 01 -05 -2017 25000 Surat C 03 A 02 01 -07 -2017 25000 Surat § Problem: For example, in case of a joint account multiple customers have common (one) accounts. § If an account ‘A 01’ is operated jointly by two customers says ‘C 01’ and ‘C 02’ then data values for attributes Balance and Branch. Name will be duplicated in two different tuples of customers ‘C 01’ and ‘C 02’. Unit – 4: Relational Database Design 55 Darshan Institute of Engineering & Technology
2 NF (Second Normal Form) Solution: Decompose relation in such a way that resultant relations do not have any partial FD. Remove partial dependent attributes from the relation that violets 2 NF. Cst. ID Act. NO Access. Date Balance Branch C 01 A 01 01 -01 -2017 50000 Rajkot C 02 A 01 01 -03 -2017 50000 Rajkot C 01 A 02 01 -05 -2017 25000 Surat C 03 A 02 01 -07 -2017 25000 Surat Cst. ID Balance Branch Place them in separate relation along with the prime attribute on which they are fully dependent. Table 1 A 02 50000 Rajkot 25000 Surat Cst. ID Act. NO Access. Date The primary key of new relation will be the attribute on which it is fully dependent. C 01 Table 2 C 02 A 01 01 -01 -2017 A 01 01 -03 -2017 Keep other attributes same as in that table with the same primary key. C 01 A 02 01 -05 -2017 C 03 A 02 01 -07 -2017 Unit – 4: Relational Database Design 56 Darshan Institute of Engineering & Technology
3 NF (Third Normal Form) § Conditions for 3 NF It is in 2 NF and there is no transitive dependency. (Transitive dependency? ? ? ) A → B & B → C then A → C. § A relation R is in third normal form (3 NF) • if and only if it is in 2 NF and • every non-key attribute is non-transitively dependent on the primary key OR § A relation R is in third normal form (3 NF) • if and only if it is in 2 NF and • no any non-key attribute is transitively dependent on the primary key Unit – 4: Relational Database Design 57 Darshan Institute of Engineering & Technology
3 NF (Third Normal Form) Account. NO Balance Branch. Name Branch. Address A 01 50000 Rajkot Kalawad Road A 02 40000 Rajkot Kalawad Road A 03 35000 Rajkot Kalawad Road A 04 25000 Rajkot Kalawad Road Account. NO Balance Branch. Name Branch. Address § FD 1: Account. NO → {Balance, Branch. Name, Branch. Address} and § FD 2: Branch. Name → Branch. Address § So Account. NO → Branch. Address (Using Transitivity rule) § Problem: In this relation, branch address will be stored repeatedly for each account of the same branch which occupies more space. Unit – 4: Relational Database Design 58 Darshan Institute of Engineering & Technology
3 NF (Third Normal Form) Solution: Decompose relation in such a way that resultant relations do not have any transitive FD. Remove transitive dependent attributes from the relation that violets 3 NF. Place them in a new relation along with the non-prime attributes due to which transitive dependency occurred. The primary key of the new relation will be non-prime attributes due to which transitive dependency occurred. Keep other attributes same as in the table with same primary key and add prime attributes of other relation into it as a foreign key. Unit – 4: Relational Database Design 59 ANO Balance BName BAddress A 01 50000 Rajkot Kalawad Road A 02 40000 Rajkot Kalawad Road A 03 35000 Rajkot Kalawad Road A 04 25000 Rajkot Kalawad Road BName Table 1 Rajkot Table 2 BAddress Kalawad Road Forei gn Ke y ANO Balance BName A 01 50000 Rajkot A 02 40000 Rajkot A 03 35000 Rajkot A 04 25000 Rajkot Darshan Institute of Engineering & Technology
BCNF (Boyce-Codd Normal Form) § Conditions for BCNF is based on the concept of a determinant. It is in 3 NF and every determinant should be primary key. Primary Key Determinant Dependent Account. NO → {Balance, Branch} Unit – 4: Relational Database Design 60 Darshan Institute of Engineering & Technology
BCNF (Boyce-Codd Normal Form) § A relation R is in Boyce-Codd normal form (BCNF) • if and only if it is in 3 NF and • for every functional dependency X → Y, X should be the primary key of the table. OR § A relation R is in Boyce-Codd normal form (BCNF) • if and only if it is in 3 NF and • every prime key attribute is non-transitively dependent on the primary key OR § A relation R is in Boyce-Codd normal form (BCNF) • if and only if it is in 3 NF and • no any prime key attribute is transitively dependent on the primary key Unit – 4: Relational Database Design 61 Darshan Institute of Engineering & Technology
BCNF (Boyce-Codd Normal Form) § FD 1: Student, Language→ Faculty Student Language Faculty § FD 2: Faculty → Language Mita JAVA Patel Nita VB Shah Sita JAVA Jadeja Gita VB Dave Rita VB Shah Nita JAVA Patel Mita VB Dave Rita JAVA Jadeja § So {Student, Language} → Language (Using Transitivity rule) Here, one faculty teaches only one subject, but a subject may be taught by more than one faculty. In FD 2, determinant is Guide which is not a primary key. So table is not in BCNF. § Problem: In this relation one student has more than one project with different guide then records will be stored repeatedly for each student, language and guide combination which occupies more space. Unit – 4: Relational Database Design 62 Darshan Institute of Engineering & Technology
BCNF (Boyce-Codd Normal Form) Solution: Decompose relation in such a way that resultant relations do not have any transitive FD. Remove transitive dependent prime attribute from relation that violets BCNF. Place them in separate new relation along with the non-prime attribute due to which transitive dependency occurred. The primary key of new relation will be this non-prime attribute due to which transitive dependency occurred. Keep other attributes same as in that table with same primary key and add a prime attribute of other relation into it as a foreign key. Unit – 4: Relational Database Design 63 Student Language Faculty Table 2 Table 1 Faculty Language Student Faculty Patel JAVA Mita Patel Shah VB Nita Shah Jadeja JAVA Sita Jadeja Dave VB Gita Dave Rita Shah Nita Patel Mita Dave Rita Jadeja Darshan Institute of Engineering & Technology
Multivalued dependency (MVD) § For a dependency X → Y, if for a single value of X, multiple values of Y exists, then the table may have multi-valued dependency. X Y student. ID subject activity 101 DS Cricket 101 DBMS Cricket 101 DS Football 101 DBMS Football Unit – 4: Relational Database Design Multivalued dependency (MVD) is denoted by →→ Multivalued dependency (MVD) is represented as X →→ Y 64 Darshan Institute of Engineering & Technology
4 NF (Fourth Normal Form) § A relation R is in fourth normal form (4 NF) • if and only if it is in BCNF and • has no multivalued dependencies Unit – 4: Relational Database Design 65 Darshan Institute of Engineering & Technology
4 NF (Fourth Normal Form) student. ID subject activity 101 DS Cricket 101 DBMS Cricket 101 DS Football 101 DBMS Football Decompose student. ID subject student. ID activity 101 DS 101 Cricket 101 DBMS 101 Football Unit – 4: Relational Database Design 66 Darshan Institute of Engineering & Technology
Functional dependency & Multivalued dependency § A table can have both functional dependency as well as multivalued dependency to gather. • student. ID → address • student. ID →→ subject • student. ID →→ activity student. ID address subject activity 101 C. G. Road DS Cricket 101 C. G. Road DBMS Cricket 101 C. G. Road DS Football 101 C. G. Road DBMS Football Unit – 4: Relational Database Design 67 Darshan Institute of Engineering & Technology
Functional dependency & Multivalued dependency student. ID address subject activity 101 C. G. Road DS Cricket 101 C. G. Road DBMS Cricket 101 C. G. Road DS Football 101 C. G. Road DBMS Football Decompose student. ID subject student. ID address student. ID activity 101 DS 101 C. G. Road 101 Cricket 101 DBMS 101 Football Unit – 4: Relational Database Design 68 Darshan Institute of Engineering & Technology
5 NF (Fifth Normal Form) § A relation R is in fifth normal form (5 NF) • if and only if it is in 4 NF and • it cannot have a lossless decomposition in to any number of smaller tables (relations). Unit – 4: Relational Database Design 69 Darshan Institute of Engineering & Technology
5 NF (Fifth Normal Form) Result. ID Roll. No Student. Name Subject. Name Result 1 101 Raj DBMS Pass 2 101 Raj DS Pass 3 101 Raj DE Pass 4 102 Meet DBMS Pass 5 102 Meet DS Fail 6 102 Meet DE Pass 7 103 Suresh DBMS Fail 8 103 Suresh DS Pass 9 103 Suresh DE Fail Above relation is further decompose into sub-relations. So above relation is not in 5 NF. Unit – 4: Relational Database Design 70 Darshan Institute of Engineering & Technology
5 NF (Fifth Normal Form) Student Result Roll. No Student. Name Result. ID Roll. No Subject. ID Result 101 Raj 1 101 1 Pass 102 Meet 2 101 2 Pass 103 Suresh 3 101 3 Pass 4 102 1 Pass 5 102 2 Fail Subject. ID Subject. Name 6 102 3 Pass 1 DBMS 7 103 1 Fail 2 DS 8 103 2 Pass 3 DE 9 103 3 Fail None of the above relations can be further decomposed into sub-relations. So above database is in 5 NF. Unit – 4: Relational Database Design 71 Darshan Institute of Engineering & Technology
How to find key? § Conditions to find key 1. The attribute is a part of key, if it does not occur on any side of FD 2. The attribute is a part of key, if it occurs on the left-hand side of an FD, but never occurs on the right-hand side 3. The attribute is not a part of key, if it occurs on the right-hand side of an FD, but never occurs on the left-hand side 4. The attribute may be a part of key or not, if it occurs on the both side of an FD Unit – 4: Relational Database Design 72 Darshan Institute of Engineering & Technology
Example to find (candidate) key § Let a relation R with attributes ABCD with FDs C → A, B → C. Find keys for relation R. 1. attribute not occur on any side of FDs (D) 2. attribute occurs on only left-hand side of an FDs (B) 3. attribute occurs on only right-hand side of an FDs (A) 4. attribute occurs on both the sides of an FDs (C) § The core is BD. § B determines C and C determines A, So using transitivity rule B determine A also. § So BD is a key. Unit – 4: Relational Database Design 73 Darshan Institute of Engineering & Technology
Exercise to find (candidate) key § Let a relation R with attributes ABCD with FDs C → D, C → A and B → C. Find keys for relation R. • The core is B. B determines C which determines A and D, so B is a key. Therefore B is the only key. § Let a relation R with attributes ABCD with FDs B → C, D → A. Find keys for relation R. • The core is BD. B determines C and D determines A, so BD is a key. Therefore BD is the only key. § Let a relation R with attributes ABCD with FDs A → B, BC → D and A → C. Find keys for relation R. • The core is A. A determines B and C which determine D, so A is a key. Therefore A is the only key. Unit – 4: Relational Database Design 74 Darshan Institute of Engineering & Technology
Find (candidate) key & check for normal forms § Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, do the following: 1. Identify the candidate key(s) for R. 2. Identify the best normal form that R satisfies (1 NF, 2 NF, 3 NF or BCNF). B → C, D → A Candidate Key is BD Relation R is in 1 NF but not 2 NF. In above FDs there is partial dependency (C is depend on only B but Key is BD) (A is depend on only D but Key is BD) Unit – 4: Relational Database Design 75 Darshan Institute of Engineering & Technology
Find (candidate) key & check for normal forms § Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, do the following: 1. Identify the candidate key(s) for R. 2. Identify the best normal form that R satisfies (1 NF, 2 NF, 3 NF or BCNF). C → D, C → A, B → C Candidate Key is B Relation R is in 2 NF but not 3 NF. In above FDs there is transitive dependency (B → C & C → D so B → D using transitivity rule) or (B → C & C → A so B → A using transitivity rule) Unit – 4: Relational Database Design 76 Darshan Institute of Engineering & Technology
Find (candidate) key & check for normal forms § Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, do the following: 1. Identify the candidate key(s) for R. 2. Identify the best normal form that R satisfies (1 NF, 2 NF, 3 NF or BCNF). A → B, BC → D, A → C Candidate Key is A Relation R is in 2 NF but not 3 NF. In above FDs there is transitive dependency (A → B & A → C so A → BC using union rule) and (A → BC & BC → D so A → D using transitivity rule) Unit – 4: Relational Database Design 77 Darshan Institute of Engineering & Technology
Find (candidate) key & check for normal forms § Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, do the following: 1. Identify the candidate key(s) for R. 2. Identify the best normal form that R satisfies (1 NF, 2 NF, 3 NF or BCNF). ABC → D, D → A Candidate Key are ABC, BCD Relation R is in 3 NF but not BCNF. In above FDs both FDs have prime attribute in dependent (right) side D and A Unit – 4: Relational Database Design 78 Darshan Institute of Engineering & Technology
How to normalize database? § A software contract and consultancy firm maintains details of all the various projects in which its employees are currently involved. These details comprise: Employee Number, Employee Name, Date of Birth, Department Code, Department Name, Project Code, Project Description, Project Supervisor Assume the following: I. III. IV. V. VI. Each employee number is unique. Each department has a single department code. Each project has a single code and supervisor. Each employee may work on one or more projects. Employee names need not necessarily be unique. Project Code, Project Description and Project Supervisor are repeating fields. VII. Normalize this data to Third Normal Form. Unit – 4: Relational Database Design 79 Darshan Institute of Engineering & Technology
How to normalize database? [UNF] § A software contract and consultancy firm maintains details of all the various projects in which its employees are currently involved. These details comprise: Employee Number, Employee Name, Date of Birth, Department Code, Department Name, Project Code, Project Description, Project Supervisor UNF Employee Number Employee Name Date of Birth Department Code Department Name Project Code Project Description Project Supervisor 1 Raj 1 -1 -85 1 CE 1 IOT Patel 2 Meet 4 -4 -86 2 EC 2 PHP Shah 3 Suresh 2 -2 -85 1 CE 3 IOT Patel 1 Raj 1 -1 -85 1 CE 2 PHP Shah Unit – 4: Relational Database Design 80 Darshan Institute of Engineering & Technology
How to normalize database? [1 NF] UNF Employee Number Employee Name Date of Birth Department Code Department Name Project Code Project Description Project Supervisor 1 Raj 1 -1 -85 1 CE 1 IOT Patel 2 Meet 4 -4 -86 2 EC 2 PHP Shah 3 Suresh 2 -2 -85 1 CE 1 IOT Patel 1 Raj 1 -1 -85 1 CE 2 PHP Shah 1 NF Employee Number Employee Name Date of Birth Department Code Department Name Employee Number Project Code Project Description Project Supervisor 1 Raj 1 -1 -85 1 CE 1 1 IOT Patel 2 Meet 4 -4 -86 2 EC 2 2 PHP Shah 3 Suresh 2 -2 -85 1 CE 3 1 IOT Patel 1 2 PHP Shah Unit – 4: Relational Database Design 81 Darshan Institute of Engineering & Technology
How to normalize database? [1 NF] 2 NF Employee Number Employee Name Date of Birth Department Code Department Name Project Code Project Description Project Supervisor 1 Raj 1 -1 -85 1 CE 1 IOT Patel 2 Meet 4 -4 -86 2 EC 2 PHP Shah 3 Suresh 2 -2 -85 1 CE Unit – 4: Relational Database Design 82 Employee Number Project Code 1 1 2 2 3 1 1 2 Darshan Institute of Engineering & Technology
How to normalize database? [1 NF] 3 NF Employee Number Employee Name Date of Birth Department Code Project Description Project Supervisor 1 Raj 1 -1 -85 1 1 IOT Patel 2 Meet 4 -4 -86 2 2 PHP Shah 3 Suresh 2 -2 -85 1 Employee Number Project Code 1 1 Department Code Department Name 1 CE 2 2 2 EC 3 1 1 2 Unit – 4: Relational Database Design 83 Darshan Institute of Engineering & Technology
Questions asked in GTU 1. What is meant by normalization? Write its need. List and discuss various normalization forms. 2. Consider schema EMPLOYEE(E-ID, E-NAME, E-CITY, E-STATE) and FD = {E-ID → E-NAME, E-ID → E-CITY, E-ID → E-STATE, E-CITY → E-STATE} • Find attribute closure for: (E-ID)+ 3. Compute the closure of the following set F of functional dependencies for relation schema R ( A, B, C, D, E). A → BC, CD → E, B → D, E → A • List the candidate keys for R. 4. Consider schema R = (A, B, C, G, H, I) and the set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}. ( Use F+ ) • Prove that AG → I Holds. Unit – 4: Relational Database Design 84 Darshan Institute of Engineering & Technology
Questions asked in GTU 5. In the BCNF decomposition algorithm, suppose you use a functional dependency α → β to decompose a relation schema r ( α , β , γ ) into r 1 (α , β ) and r 2 ( α , γ ). • What primary and foreign-key constraint do you expect to hold on the decomposed relations? • Give an example of an inconsistency that can arise due to an erroneous update, if the foreign-key constraint were not enforced on the decomposed relations above. • When a relation is decomposed into 3 NF, what primary and foreign key dependencies would you expect will hold on the decomposed schema? Unit – 4: Relational Database Design 85 Darshan Institute of Engineering & Technology
Questions asked in GTU 6. A college maintains details of its lecturers' subject area skills. These details comprise: Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name, Subject Code, Subject Name, Subject Level. Assume that each lecturer may teach many subjects but may not belong to more than one department. Subject Code, Subject Name and Subject Level are repeating fields. Normalize this data to Third Normal Form. Unit – 4: Relational Database Design 86 Darshan Institute of Engineering & Technology
- Slides: 86