NORMALIZATION CHAPTER OBJECTIVES The purpose of normailization Data
NORMALIZATION
CHAPTER OBJECTIVES The purpose of normailization Data redundancy and Update Anomalies Functional Dependencies The Process of Normalization First Normal Form (1 NF) Second Normal Form (2 NF) Third Normal Form (3 NF)
CHAPTER OBJECTIVES General Definition of Second and Third Normal Form Boyce-Codd Normal Form (BCNF) Fourth Normal Form (4 NF) Fifth Normal Form (5 NF)
WHAT IS NORMALIZATION? In relational database theory Normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently and reduce repeating data and to reduce the potential for anomalies during data operations. Data normalization also may improve data consistency and simplify future extension of the logical data model. The formal classifications used for describing a relational database's level of normalization are called normal forms Normal Forms is abbreviated as NF
UPDATE ANOMALIES Relations that have redundant data may have problems called update anomalies, which are classified as , Insertion anomalies Deletion anomalies Modification anomalies
WHAT HAPPENS WITHOUT NORMALIZATION A non-normalized database can suffer from data anomalies: A non-normalized database may store data representing a particular referent in multiple locations. An update to such data in some but not all of those locations results in an update anomaly, yielding inconsistent data. A normalized database prevents such an anomaly by storing such data (i. e. data other than primary keys) in only one location. A non-normalized database may have inappropriate dependencies, i. e. relationships between data with no functional dependencies. Adding data to such a database may require first adding the unrelated dependency. A normalized database prevents such insertion anomalies by ensuring that database relations mirror functional dependencies. Similarly, such dependencies in non-normalized databases can hinder deletion. That is, deleting data from such databases may require deleting data from the inappropriate dependency. A normalized database prevents such deletion anomalies by ensuring that all records are uniquely identifiable and contain no extraneous information.
NORMALIZED DATABASES Normalized databases have a design that reflects the true dependencies between tracked quantities, allowing quick updates to data with little risk of introducing inconsistencies. Instead of attempting to lump all information into one table, data is spread out logically into many tables. Normalizing the data is decomposing a single relation into a set of smaller relations which satisfy the constraints of the original relation. Redundancy can be solved by decomposing the tables. However certain new problems are caused by decomposition. Normalization helps us to make a conscious decision to avoid redundancy keeping the pros and cons in mind.
A simplified COMPANY relational database schema
REDUNDANT INFORMATION IN TUPLES AND UPDATE ANOMALIES Information is stored redundantly Wastes storage Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY Consider the relation: EMP_PROJ(Emp#, No_hours) Proj#, Ename, Pname, Update Anomaly: Changing the name of project number P 1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P 1.
EXAMPLE OF AN INSERT ANOMALY Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely Cannot insert an employee unless a he/she is assigned to a project.
EXAMPLE OF AN DELETE ANOMALY Consider the relation: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
Two relation schemas suffering from update anomalies
Base Relations EMP_DEPT and EMP_PROJ formed after a Natural Join : with redundant information
THE PURPOSE OF NORMALIZATION Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. (generally by reducing the no. Of attributes. ) The process of normalization is a formal method that identifies relations based on their primary or candidate keys and the functional dependencies among their attributes.
DEFINITION OF NORMALIZATION Normalization is a process of decomposing a set of relations with anomalies to produce smaller and well -structured relations that contain minimum or no redundancy. Therefore, a process of normalization can be defined as a procedure of successive reduction of a given collection of relational schemas based on their FDs and primary keys to achieve some desirable form of minimized redundancy, minimized insertion and minimized update anomalies.
FUNCTIONAL DEPENDENCIES Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, and B is functionally dependent on A ( denoted A B), if each value of A is associated with exactly one value of B. ( A and B may each consist of one or more attributes. ) A Determinant B is functionally dependent on A B Refers to the attribute or group of attributes on the left -hand side of the arrow of a functional dependency
FUNCTIONAL DEPENDENCIES Identifying the primary key Functional dependency is a property of the meaning or semantics of the attributes in a relation. In other words, we can say that, given two rows R 1 and R 2 in table T, if R 1(A)=R 2(A) then R 1(B)=R 2(B). An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation using functional dependency.
FUNCTIONAL DEPENDENCIES Functional Dependencies We say an attribute, B, has a functional dependency on another attribute, A, if for any two records, which have the same value for A, then the values for B in these two records must be the same. We illustrate this as: A B Example: Suppose we keep track of employee email addresses, and we only track one email address for each employee. Suppose each employee is identified by their unique employee number. We say there is a functional dependency of email address on employee number: employee number email address 20
FUNCTIONAL DEPENDENCIES Emp. Num 123 456 555 633 787 Emp. Email jdoe@abc. com psmith@abc. com alee 1@abc. com pdoe@abc. com alee 2@abc. com Emp. Fname John Peter Alan Emp. Lname Doe Smith Lee Doe Lee If Emp. Num is the PK then the FDs: Emp. Num Emp. Email Emp. Num Emp. Fname Emp. Num Emp. Lname must exist. 21
FUNCTIONAL DEPENDENCIES Emp. Num Emp. Email Emp. Num Emp. Fname Emp. Num Emp. Lname Emp. Num 3 different ways you might see FDs depicted Emp. Email Emp. Fname Emp. Lname Emp. Num Emp. Email Emp. Fname Emp. Lname 22
FUNCTIONAL DEPENDENCIES Inference Rules (Armstrong’s Axioms) A set of all functional dependencies that are implied by a given set of functional dependencies X is called closure of X, written X+. A set of inference rule is needed to compute X+ from X. Armstrong’s axioms 1. 2. 3. 4. 5. 6. 7. Relfexivity: If B is a subset of A, then A B Augmentation: If A B, then A, C B, C Transitivity: If A B and B C, then A C Self-determination: A A Decomposition: If A B, C then A B and A C Union: If A B and A C, then A B, C Composition: If A B and C D, then A, C B, D
THE PROCESS OF NORMALIZATION • Normalization is often executed as a series of steps. Each step corresponds to a specific normal form that has known properties. • As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. • For the relational data model, it is important to recognize that it is only first normal form (1 NF) that is critical in creating relations. All the subsequent normal forms are optional.
NORMAL FORM Edgar F. Codd originally established three normal forms: 1 NF, 2 NF and 3 NF. There are now others that are generally accepted, but 3 NF is widely considered to be sufficient for most applications. Most tables when reaching 3 NF are also in BCNF (Boyce -Codd Normal Form).
Various Types of Normal Forms 1 NF 2 NF 3 NF BCNF 4 NF/MVD 5 NF/JD
NORMALIZATION There is a sequence to normal forms: 1 NF is considered the weakest, 2 NF is stronger than 1 NF, 3 NF is stronger than 2 NF, and BCNF is considered the strongest Also, any relation that is in BCNF, is in 3 NF; any relation in 3 NF is in 2 NF; and any relation in 2 NF is in 1 NF. 27
UN-NORMALIZED TABLE or
FIRST NORMAL FORM (INF) q. A relation is said to be in 1 NF if the values in the domain of each attribute of the relation are atomic i. e. all attribute domain are simple and in simple domain all elements are atomic. q 1 NF disallows having a set of values , a tuple of values or a combination of both as an attribute value for a single tuple. It disallows multi-valued attributes that are themselves composite.
FIRST NORMAL FORM Remove horizontal redundancies No two columns hold the same information No single column holds more than a single item Each row must be unique Use a primary key Benefits Easier to query/sort the data More scalable Each row can be identified for updating
FIRST NORMAL FORM (1 NF) Unnormalized form (UNF) A table that contains one or more repeating groups. Repeating group = (property. No, p. Address, rent. Start, rent. Finish, rent, owner. No, o. Name) Client. No CR 76 CR 56 c. Name John kay Aline Stewart property. No p. Address PG 4 6 lawrence St, Glasgow rent. Start 1 -Jul-00 rent. Finish 31 -Aug-01 rent 350 owner. No o. Name CO 40 Tina Murphy 5 Novar Dr, Glasgow 1 -Sep-02 450 CO 93 Tony Shaw PG 4 6 lawrence St, Glasgow 1 -Sep-99 10 -Jun-00 350 CO 40 Tina Murphy PG 36 2 Manor Rd, Glasgow CO 93 Tony Shaw PG 16 5 Novar Dr, Glasgow CO 93 Tony Shaw PG 16 Figure 3 Client. Rental unnormalized table 10 -Oct-00 1 -Nov-02 1 -Dec-01 1 -Aug-03 370 450
TABLE IN 1 NF Client. No c. Name property. No p. Address rent. Start rent. Finish rent owner. No o. Name CR 76 John kay PG 4 6 lawrence St, Glasgow 1 -Jul-00 31 -Aug-01 350 CO 40 Tina Murphy CR 76 John kay PG 16 5 Novar Dr, Glasgow 1 -Sep-02 CR 56 Aline Stewart PG 4 6 lawrence St, Glasgow Aline Stewart PG 36 2 Manor Rd, Glasgow Aline Stewart PG 16 5 Novar Dr, Glasgow CR 56 1 -Sep-99 450 CO 93 Tony Shaw 10 -Jun-00 350 CO 40 Tina Murphy 10 -Oct-00 1 -Dec-01 370 CO 93 Tony Shaw 1 -Nov-02 1 -Aug-03 450 CO 93 Tony Shaw
TABLE 1 Title Author 1 Author 2 ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz Henry F. Korth 0072958863 My. SQL, Computers 1168 Mc. Graw-Hill Operating System Concepts Abraham Silberschatz Henry F. Korth 0471694665 Computers 944 Mc. Graw-Hill
TABLE 1 PROBLEMS This table is not very efficient with storage. This design does not protect data integrity. Third, this table does not scale well.
FIRST NORMAL FORM In our Table 1, we have two violations of First Normal Form: First, we have more than one author field, Second, our subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject.
FIRST NORMAL TABLE Title Author ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz 0072958863 My. SQL 1168 Mc. Graw-Hill Database System Concepts Henry F. Korth 0072958863 Computers 1168 Mc. Graw-Hill Operating System Concepts Henry F. Korth 0471694665 Computers 944 Mc. Graw-Hill Operating System Concepts Abraham Silberschatz 0471694665 Computers 944 Mc. Graw-Hill • Table 2
EXAMPLE Relation: Lived_in Person Residence City Abhishek Thomas Relation: Lived_in Pune Date. Moved 12/12/2 002 Mumbai 11/10/2 003 Delhi 02/02/2 004 City Date. Moved Delhi 11/11/2 000 Kolkata 04/04/2 009 Un-normalized Table Person City Date_moved Abhishek Pune 12/12/2002 Abhishek Mumbai 11/10/2003 Abhishek Delhi 02/02/2004 Thomas Delhi 11/11/2000 Thomas Kolkata 04/04/2009 Normalized Table
Normalization into 1 NF
TABLES VIOLATING FIRST NORMAL FORM PART (Primary Key) WAREHOUSE P 0010 Warehouse A, Warehouse B, Warehouse C P 0020 Warehouse B, Warehouse D Really Bad Set-up! Better, but still flawed! PART (Primary Key) WAREHOUSE A WAREHOUSE B WAREHOUSE C P 0010 Yes No Yes P 0020 No Yes
TABLE CONFORMING TO FIRST NORMAL FORM PART WAREHOUSE (Primary Key) QUANTITY P 0010 Warehouse A 400 P 0010 Warehouse B 543 P 0010 Warehouse C 329 P 0020 Warehouse B 200 P 0020 Warehouse D 278
PROBLEMS IN 1 NF ü 1 NF contains redundant information. ü Update anomaly during deletion.
FULL FUNCTIONAL DEPENDENCY (FFD) Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. A functional dependency A B is partially dependent if there is some attributes that can be removed from A and the dependency still holds.
PARTIAL DEPENDENCY A partial dependency exists when an attribute B is functionally dependent on an attribute A, and A is a component of a multipart candidate key. Inv. Num Line. Num Qty Inv. Date Candidate keys: {Inv. Num, Line. Num} Inv. Date is partially dependent on {Inv. Num, Line. Num} as Inv. Num is a determinant of Inv. Date and Inv. Num is part of a candidate key 43
The set of attributes X will be fully functionally dependent on the set of attributes Y if the following conditions are satisfied: - X is functionally dependent on Y -X is not functionally dependent on any subset of Y. Example: Relation: Assign Emp-no Project Proj_Budget Yrs_spent_by_emp_on_project 100 P 1 INR 100 CR 3 101 P 2 INR 150 CR 2 102 P 3 INR 102 CR 3 100 P 2 INR 150 CR 5 101 P 1 INR 100 CR 6 FD: {emp_no, project, proj_budget} {yrs_spent_by_emp_on_project}
TABLE IN 2 NF Relation: Assign Emp-no Relation: Budget Project Yrs_spent_by _emp_on_pro ject 100 P 1 3 101 P 2 2 102 P 3 3 100 P 2 5 101 P 1 6 Project Proj_budget P 1 INR 100 CR P 2 INR 150 CR P 3 INR 102 CR So correct FD: {emp_no, project} {yrs_spent_by_emp_on_project} {project} {proj_budget}
FIRST NORMAL TABLE Title Author ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz 0072958863 My. SQL 1168 Mc. Graw-Hill Database System Concepts Henry F. Korth 0072958863 Computers 1168 Mc. Graw-Hill Operating System Concepts Henry F. Korth 0471694665 Computers 944 Mc. Graw-Hill Operating System Concepts Abraham Silberschatz 0471694665 Computers 944 Mc. Graw-Hill • Table 2
We now have two rows for a single book. Additionally, we would be violating the Second Normal Form… A better solution to our problem would be to separate the data into separate tablesan Author table and a Subject table to store our information, removing that information from the Book table:
Subject Table Subject_ID Subject 1 My. SQL 2 Computers Author Table Author_ID Last Name First Name 1 Silberschatz Abraham 2 Korth Henry Book Table ISBN Title Pages Publisher 0072958863 Database System Concepts 1168 Mc. Graw-Hill 0471694665 Operating System Concepts 944 Mc. Graw-Hill
SECOND NORMAL FORM (2 NF) Second normal form (2 NF) is a relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on the primary key. The normalization of 1 NF relations to 2 NF involves the removal of partial dependencies. If a partial dependency exists, we remove the function dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
SECOND NORMAL FORM As the First Normal Form deals with redundancy of data across a horizontal row, Second Normal Form (or 2 NF) deals with redundancy of data in vertical columns. As stated earlier, the normal forms are progressive, so to achieve Second Normal Form, the tables must already be in First Normal Form.
EXAMPLE : REVISITED Client. No CR 76 CR 56 c. Name John kay Aline Stewart property. No p. Address PG 4 6 lawrence St, Glasgow rent. Start 1 -Jul-00 rent. Finish 31 -Aug-01 rent 350 owner. No o. Name CO 40 Tina Murphy 5 Novar Dr, Glasgow 1 -Sep-02 450 CO 93 Tony Shaw PG 4 6 lawrence St, Glasgow 1 -Sep-99 10 -Jun-00 350 CO 40 Tina Murphy PG 36 2 Manor Rd, Glasgow CO 93 Tony Shaw PG 16 5 Novar Dr, Glasgow CO 93 Tony Shaw PG 16 10 -Oct-00 1 -Nov-02 1 -Dec-01 1 -Aug-03 370 450
TABLE IN 1 NF Client. No c. Name property. No p. Address rent. Start rent. Finish rent owner. No o. Name CR 76 John kay PG 4 6 lawrence St, Glasgow 1 -Jul-00 31 -Aug-01 350 CO 40 Tina Murphy CR 76 John kay PG 16 5 Novar Dr, Glasgow 1 -Sep-02 CR 56 Aline Stewart PG 4 6 lawrence St, Glasgow Aline Stewart PG 36 2 Manor Rd, Glasgow Aline Stewart PG 16 5 Novar Dr, Glasgow CR 56 1 -Sep-99 450 CO 93 Tony Shaw 10 -Jun-00 350 CO 40 Tina Murphy 10 -Oct-00 1 -Dec-01 370 CO 93 Tony Shaw 1 -Nov-02 1 -Aug-03 450 CO 93 Tony Shaw
2 NF CLIENTRENTAL RELATION The Client. Rental relation has the following functional dependencies: fd 1 fd 2 fd 3 fd 4 fd 5 fd 6 client. No, property. No rent. Start, rent. Finish (Primary Key) client. No c. Name (Partial dependency) property. No p. Address, rent, owner. No, o. Name (Partial dependency) owner. No o. Name (Transitive Dependency) client. No, rent. Start property. No, p. Address, rent. Finish, rent, owner. No, o. Name (Candidate key) property. No, rent. Start client. No, c. Name, rent. Finish (Candidate key)
2 NF CLIENT-RENTAL RELATION Client (client. No, c. Name) Rental (client. No, property. No, rent. Start, rent. Finish) Property. Owner (property. No, p. Address, rent, owner. No, o. Name) After removing the partial dependencies, the creation of the three new relations called Client, Rental, and Property. Owner Client Rental Client. No c. Name Client. No property. No rent. Start rent. Finish CR 76 CR 56 John Kay Aline Stewart CR 76 PG 4 1 -Jul-00 31 -Aug-01 CR 76 CR 56 PG 16 PG 4 PG 36 PG 16 1 -Sep-02 1 -Sep-99 10 -Oct-00 1 -Nov-02 1 -Sep-02 10 -Jun-00 1 -Dec-01 1 -Aug-03 Property. Owner property. No p. Address rent owner. No o. Name PG 4 6 lawrence St, Glasgow 350 CO 40 Tina Murphy PG 16 5 Novar Dr, Glasgow 450 CO 93 Tony Shaw PG 36 2 Manor Rd, Glasgow 370 CO 93 Tony Shaw Figure 6 2 NF Client. Rental relation
TABLE VIOLATING SECOND NORMAL FORM PART WAREHOUSE (Primary Key) QUANTITY WAREHOUSE ADDRESS P 0010 Warehouse A 400 1608 New Field Road P 0010 Warehouse B 543 4141 Greenway Drive P 0010 Warehouse C 329 171 Pine Lane P 0020 Warehouse B 200 4141 Greenway Drive P 0020 Warehouse D 278 800 Massey Street
TABLES CONFORMING TO SECOND NORMAL FORM PART_STOCK TABLE PART (Primary Key) WAREHOUSE (Primary Key) QUANTITY P 0010 Warehouse A 400 P 0010 Warehouse B 543 P 0010 Warehouse C 329 P 0020 Warehouse B 200 P 0020 Warehouse D 278 WAREHOUSE TABLE 1 ∞ WAREHOUSE (Primary Key) WAREHOUSE_ADDRESS Warehouse A 1608 New Field Road Warehouse B 4141 Greenway Drive Warehouse C 171 Pine Lane Warehouse D 800 Massey Street
PROBLEMS IN 2 NF ü Update anomaly during deletion. ü Still some redundancy. ü FD between non-prime attributes.
THIRD NORMAL FORM (3 NF) Transitive dependency A condition where A, B, and C are attributes of a relation such that if A B and B C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). Third normal form (3 NF) A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key. The normalization of 2 NF relations to 3 NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
THIRD NORMAL FORM (3 NF): IN OTHER WAY A relation R is said to be in 3 NF if the relation R is in 2 NF and the non-prime attributes are: üMutually Independent üFunctionally dependent on the primary key. i. e. in 3 NF no non-prime attribute is functionally dependent on another non-prime attribute.
TRANSITIVE DEPENDENCY Emp. Num Dept. Num Emp. Email Dept. Num Dept. Nname Dept. Num Dept. Name Emp. Num Emp. Email Dept. Num Dept. Nname Dept. Name is transitively dependent on Emp. Num via Dept. Num Emp. Num Dept. Name 60
3 NF CLIENTRENTAL RELATION The functional dependencies for the Client, Rental and Property. Owner relations are as follows: Client fd 2 client. No c. Name (Primary Key) client. No, property. No rent. Start, rent. Finish client. No, rent. Start property. No, rent. Finish property. No, rent. Start client. No, rent. Finish (Primary Key) (Candidate key) Rental fd 1 fd 5 fd 6 Property. Owner fd 3 fd 4 property. No p. Address, rent, owner. No, o. Name (Primary Key) owner. No o. Name (Transitive Dependency)
3 NF CLIENTRENTAL RELATION The resulting 3 NF relations have the forms: Client Rental Property. Owner (client. No, c. Name) (client. No, property. No, rent. Start, rent. Finish) (property. No, p. Address, rent, owner. No) (owner. No, o. Name)
3 NF CLIENTRENTAL RELATION Rental Client. No c. Name CR 76 CR 56 John Kay Aline Stewart Client. No property. No rent. Start rent. Finish CR 76 PG 4 1 -Jul-00 31 -Aug-01 CR 76 CR 56 PG 16 PG 4 PG 36 PG 16 1 -Sep-02 1 -Sep-99 10 -Oct-00 1 -Nov-02 1 -Sep-02 10 -Jun-00 1 -Dec-01 1 -Aug-03 Property. Owner property. No p. Address rent owner. No o. Name PG 4 6 lawrence St, Glasgow 350 CO 40 Tina Murphy PG 16 5 Novar Dr, Glasgow 450 CO 93 Tony Shaw PG 36 2 Manor Rd, Glasgow 370 CO 93 Figure 7 2 NF Client. Rental relation
TABLE VIOLATING THIRD NORMAL FORM EMPLOYEE_DEPARTMENT TABLE EMPNO (Primary Key) FIRSTNAME LASTNAME WORKDEPTNAME 000290 John Parker E 11 Operations 000320 Ramlal Mehta E 21 Software Support 000310 Maude Setright E 11 Operations
TABLES CONFORMING TO THIRD NORMAL FORM EMPLOYEE TABLE EMPNO (Primary Key) FIRSTNAME LASTNAME WORKDEPT 000290 John Parker E 11 000320 Ramlal Mehta E 21 000310 Maude Setright E 11 DEPARTMENT TABLE ∞ 1 DEPTNO (Primary Key) DEPTNAME E 11 Operations E 21 Software Support
STILL VARIOUS PROBLEMS IN 3 NF
BOYCE-CODD NORMAL FORM (BCNF) Boyce-Codd normal form (BCNF) A relation is in BCNF, if and only if, every determinant is a candidate key. The difference between 3 NF and BCNF is that for a functional dependency A B, 3 NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
In 3 NF, but not in BCNF: Instructor teaches one course only. student_no course_no instr_no Student takes a course and has one instructor. {student_no, course_no} instr_no course_no since we have instr_no course-no, but instr_no is not a Candidate key. 68
student_no course_no instr_no BC NF student_no instr_no course_no instr_no {student_no, instr_no} student_no {student_no, instr_no} instr_no course_no 69
BOYCE-CODD NORMAL FORM (BCNF) A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row. ) If a table contains only one candidate key, the 3 NF and the BCNF are equivalent. BCNF is a special case of 3 NF. Next table is in 3 NF but not in BCNF. Next it is shown how the table can be decomposed to conform to the BCNF form.
A Table That Is In 3 NF But Not In BCNF
The Decomposition of a Table Structure to Meet BCNF Requirements
Sample Data for a BCNF Conversion
Decomposition into BCNF
BCNF DEFINITION BCNF Definition A table is in BCNF if every determinant in that table is a candidate key. If a table contains only one candidate key, 3 NF and BCNF are equivalent.
EXAMPLE OF BCNF fd 1 fd 2 fd 3 fd 4 client. No, interview. Date interview. Time staff. No, room. No (Primary Key) staff. No, interview. Date, interview. Time client. No (Candidate key) room. No, interview. Date, interview. Time client. No, staff. No (Candidate key) staff. No, interview. Date room. No (not a candidate key) As a consequence the Client. Interview relation may suffer from update anomalies. For example, two tuples have to be updated if the room. No need be changed for staff. No SG 5 on the 13 -May-02. Client. Interview Client. No interview. Date interview. Time staff. No room. No CR 76 13 -May-02 10. 30 SG 5 G 101 CR 76 CR 74 CR 56 13 -May-02 1 -Jul-02 12. 00 10. 30 SG 5 SG 37 SG 5 G 101 G 102 Figure 8 Client. Interview relation
EXAMPLE OF BCNF To transform the Client. Interview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and Satff. Room as shown below, Interview (client. No, interview. Date, interview. Time, staff. No) Staff. Room(staff. No, interview. Date, room. No) Interview Client. No interview. Date interview. Time staff. No CR 76 13 -May-02 10. 30 SG 5 CR 76 CR 74 CR 56 13 -May-02 1 -Jul-02 12. 00 10. 30 SG 5 SG 37 SG 5 staff. No interview. Date room. No SG 5 13 -May-02 G 101 SG 37 SG 5 13 -May-02 1 -Jul-02 G 102 Staff. Room Figure 9 BCNF Interview and Staff. Room relations
FOURTH NORMAL FORM (4 NF) Multi-valued dependency (MVD) represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of value for C. However, the set of values for B and C are independent of each other. A multi-valued dependency can be further defined as being trivial or nontrivial. A MVD A > B or A B in relation R is defined as being trivial if • B is a subset of A or • AUB=R A MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
FOURTH NORMAL FORM (4 NF) Fourth normal form (4 NF) A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies. Relation: EMPLOYEE Name Project Hobby Alexis Microsoft Reading Alexis Oracle Music Alexis Microsoft Music Mathew Oracle Movies Mathew Microsoft Riding Mathew Intel Reading Mathew Intel Riding 2 MVD: {Name, Project} and {Name, Hobby} And project and Hobby are independent of each other
TABLE IN 4 NF Relation: Project Relation: Hobby Name Project Alexis Microsoft Alexis Oracle Mathew Microsoft Mathew Intel Name Hobby Alexis Reading Alexis Music Mathew Movies Mathew Riding Mathew Reading
FIFTH NORMAL FORM (5 NF) Fifth normal form (5 NF) A relation that has no join dependency. Lossless-join dependency A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
JOIN DEPENDENCY (JD): FURTHER GENERALIZATION OF MVD q. A JD can be said to exist of the join of R 1 and R 2 over C is equal to relation R. where R 1 and R 2 are the decomposition R 1(A, B, C) and R 2(C, D) of a given relations R(A, B, C, D). q. Alternatively, R 1 and R 2 is a lossless decomposition of R. q. Thus whenever we decompose a relation R into R 1 and R 2 based on MVD in relation R, the decomposition has lossless join property. q. Lossless join dependency can be defined as a property of decomposition , which ensures that no spurious tuples are generated when relations are returned through a natural join.
MORE ON LOSSLESS JOIN The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains: X Y X, or X Y Y
DOMAIN KEY NORMAL FORM (DK/NF) OR 6 NF: DEFINITION AND THEORY Domain Key Normal Form (DK/NF) (Fagin, 1981) is a normal form that uses only the key dependencies and the domain dependencies to implement all constraints. Fagin showed that in a DK/NF relationship all constraints are logical consequences of the keys and domains. Because keys and domains are fundamental to all databases and are readily supported by all Database Management Systems, this work is a significant work in the field of Relational databases.
DOMAIN KEY NORMAL FORM(DK/NF) A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains. i. e. a relation is said to be in DF/NF if all possible types of dependencies that should hold on the relation can be enforced simply by enforcing the domain constraints and key constraints in the relation.
Definition: A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains. üA constraint is any rule governing static values of attributes that is precise enough to ascertain whether or not it is true (any rule on the static values of attributes whose truth value can be determined, such as edit rules, interrelation constraints, functional dependency, MVD, but NOT time-dependent constraints). üThe key of a tuple in a relation is the unique identifier of that tuple. üThe domain of an attribute in a relation is the description of an attribute's allowed values.
NORMAL FORMS The first normal form requires that tables be made up of a primary key and a number of atomic fields, and the second and third deal with the relationship of non-key fields to the primary key. These have been summarized as requiring that all non-key fields be dependent on "the key, the whole key and nothing but the key". In practice, most applications in 3 NF are fully normalized. However, research has identified potential update anomalies in 3 NF databases. BCNF (Boyce-Codd Normal Form)is a further refinement of 3 NF that attempts to eliminate such anomalies. The fourth and fifth normal forms (4 NF and 5 NF) deal specifically with the representation of many-many and onemany relationships. Sixth normal form (6 NF) only applies to temporal databases.
NORMALIZATION No transitive dependency between nonkey attributes All determinantsare candidate keys Single multivalued dependency Boyce. Codd and Higher Functional dependencyof nonkey attributes on the primary key - Atomic values only Full Functional dependency of nonkey attributes on the primary key
NORMALIZATION Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies. However, a completely normalized database may not be the most efficient or effective implementation. “Denormalization” is sometimes used to improve efficiency.
FIRST NORMAL FORM First normal form (1 NF) lays the groundwork for an organized database design: Ensure that each table has a primary key: minimal set of attributes which can uniquely identify a record. Eliminate repeating groups (categories of data which would seem to be required a different number of times on different records) by defining keyed and non-keyed attributes appropriately. Atomicity: Each attribute must contain a single value, not a set of values.
SECOND NORMAL FORM Second normal form (2 NF) If a table has a composite key, all attributes must be related to the whole key: The database must meet all the requirements of the first normal form. Data which is redundantly duplicated across multiple rows of a table is moved out to a separate table.
THIRD NORMAL FORM Third normal form (3 NF) requires that data stored in a table be dependent only on the primary key, and not on any other field in the table. The database must meet all the requirements of the second normal form. Any field which is dependent not only on the primary key but also on another field is moved out to a separate table.
BOYCE-CODD NORMAL FORM Boyce-Codd normal form (or BCNF) requires that there be no non-trivial functional dependencies of attributes on something other than a superset of a candidate key (called a superkey).
FIFTH NORMAL FORM Fifth normal form (5 NF and also PJ/NF) requires that there are no nontrivial join dependencies that do not follow from the key constraints. A table is said to be in the 5 NF if and only if it is in 4 NF and every join dependency in it is implied by the candidate keys.
DOMAIN/KEY NORMAL FORM Domain/key normal form (or DKNF) requires that the database contains no constraints other than domain constraints and key constraints.
- Slides: 95