Database Normalisation n What is Database Normalisation n
Database Normalisation n What is Database Normalisation? n Why need Database Normalisation? n How to perform Database Normalisation? n Summary of Database Normalisation.
What is Normalisation n Normalisation uses a set of restrictions to exclude the undesirable properties from database design. The restrictions are called ‘normal forms’; Normalisation achieves a good database design by reducing the amount of data duplication.
Why need Normalisation Review on Logical Database design: n Data analysis Ø n Data analysis methods: Ø n To identify the natural structure called conceptual data model. Top-down and bottom-up; Data design Ø To determine the logical database structure, represented by a set of relations/tables, their attributes and their keys.
Why need Normalisation Example: Order_No Area S_Off C_No C_Name P_NO P_Name QTY 01 W York Leeds C 1 Nippers P 1 Pantaloons 100 01 W York Leeds C 1 Nippers P 2 Pantaloons 50 02 Middl Oxford C 2 Tots-Gear P 1 Pantaloons 100 02 Middl Oxford C 2 Tots-Gear P 5 Pinafore 200 03 Middl Oxford C 2 Tots-Gear P 3 Socks 50 04 Middl Oxford C 9 Kid-Naps P 3 Socks 50 What problems are there in this table?
Why need Normalisation Three undesirability caused by Data Duplication: n Unnatural representation of information. n Difficult to update the database. n More storage space is required.
How to do Normalisation is to make a more natural representation by: n Removing database update problems; n Making the database smaller. ü First Normal Form (1 NF) ü Second Normal Form (2 NF) and Third Normal Form (3 NF) ü Boyce-Codd Normal Form (BCNF); n Fourth and Fifth Normal Form (4 NF and 5 NF)
First Normal Form (1 NF) 1 NF: tuples only contain atomic attribute that can not be spit into any smaller components. 1 NF: ensures that each attribute represents a single fact. Attributes which contain sub-attributes and repeating groups of attributes are not permitted.
First Normal Form (1 NF) Example: 0 NF_CUSTOMER_ORDER CUSTOMER Order_No 01 02 Area W York Middl ORDER_LINES S_Off Leeds Oxford PRODUCT QTY C_No C_Name P_NO P_Name C 1 Nippers P 1 Pantaloons 100 P 2 Pantaloons 50 P 1 Pantaloons 100 P 5 Pinafore 200 C 2 Tots-Gear 03 Middl Oxford C 2 Tots-Gear P 3 Socks 50 04 Middl Oxford C 9 Kid-Naps P 3 Socks 50
First Normal Form (1 NF) 1 NF: Remove the attributes that are combinations of other attributes 1 NF_CUSTOMER_ORDER Order_No Area S_Off C_No C_Name P_NO P_Name QTY 01 W York Leeds C 1 Nippers P 1 Pantaloons 100 01 W York Leeds C 1 Nippers P 2 Pantaloons 50 02 Middl Oxford C 2 Tots-Gear P 1 Pantaloons 100 02 Middl Oxford C 2 Tots-Gear P 5 Pinafore 200 03 Middl Oxford C 2 Tots-Gear P 3 Socks 50 04 Middl Oxford C 9 Kid-Naps P 3 Socks 50 Again, What is the problem in this table?
2 NF and 3 NF (Key) Review on the concept of Key of relation: Key is one or a group of attributes, which uniquely identify tuples of a relation. A relation may have many keys, called candidate keys. One of candidate keys, which is used for the tuple identification purpose, is called the primary key. A primary key must never have a null value, and may be used as a foreign key to represent relationships between objects.
2 NF and 3 NF (FD) Functional Dependency (FD) FD defines the relationship between key (identifying attributes) and other attributes (representing single-values facts). Employee_No Name Home_Address 123 J Bloggs 25 High Street, Leeds 314 D Brown 5 Low Street, Leeds 127 R Bloggs 25 High Street, Leeds 500 J Smith 19 Grand Way, Leeds 512 J Smith 20 Parsons View, Leeds An employee is identified by the Employee_No Name and Home Address represent single-valued fact about employee.
2 NF and 3 NF (FD) Relationships between Employee_No, Name and Home_Address: Employee_No Name Home_Address 123 J Bloggs 25 High Street, Leeds 314 D Brown 5 Low Street, Leeds 127 R Bloggs 25 High Street, Leeds 500 J Smith 19 Grand Way, Leeds 512 J Smith 20 Parsons View, Leeds X Y: X determines Y. ? ? ? Name Employee_No, Name Home_Address Employee_No is the key Employee_No Name Employee_No Home_Add Employee_No Name, Home_Add They are not correct!
2 NF and 3 NF (FD) Functional Dependency (FD) X Y means: The value of Y represents a single-valued fact, and this fact is identified by the value of X. X is called the determinant Y is called the determined attributes.
2 NF and 3 NF (Partial FD) Partial FD on a key: the determinant is part of a key. Order_No Area S_Off C_No C_Name P_NO P_Name QTY 01 W York Leeds C 1 Nippers P 1 Pantaloons 100 01 W York Leeds C 1 Nippers P 2 Pantaloons 50 02 Middl Oxford C 2 Tots-Gear P 1 Pantaloons 100 02 Middl Oxford C 2 Tots-Gear P 5 Pinafore 200 03 Middl Oxford C 2 Tots-Gear P 3 Socks 50 04 Middl Oxford C 9 Kid-Naps P 3 Socks 50 (Order_NO, P_No) QTY; What are the FDs? ? Order_No C_No, C_Name, Area, S_Off; C_No C_Name, Area, S_Off; P_N 0 P_Name Area S_Off.
2 NF and 3 NF (Partial FD) Partial FD on a key: the determinant is part of a key. (Order_NO, P_No) QTY; Order_No C_No, C_Name, Area, S_Off; P_N 0 P_Name C_No C_Name, Area, S_Off; Area S_Off. There are two Partial FD on the key: Order_No C_No, C_Name, Area, S_Off; P_N 0 P_Name
2 NF and 3 NF (Partial FD) A Partial FD on the key occurs when the key includes the more than one objects and their identifiers. The existence of Partial FD on the key result in the data duplication, and inconvenience in updating database. The 2 NF and 3 NF prohibit this situation where each tuple include the information about more than one objects. To design a database in 2 NF, it is necessary to split the relation using project operation of relational algebra.
2 NF (Method) The Second Normalisation of relation (2 NF): Step 1: using the partial FD (X Y) on a key forms a new relation (X, Y), within which X is a key. Step 2: using the key of original relation, and the attributes that are not functionally determined by X.
2 NF (Example) Example: attributes (Order_No, Area, S_Off, C_No, C_Name, P_No, P_Name, QTY) Partial FD on a key: Order_No C_No, C_Name, Area, S_Off; P_N 0 P_Name Step 1: (X, Y) 2 NF_Product (P_N 0, P_Name); 2 NF_CUSTOMER_ORDER(Order_No, C_Name, Area, S_Off); Step 2: (original key, others) 2 NF_ORDER_LINE(Order_No, P_No, QTY)
2 NF (Example) Results: 1 NF_CUSTOMER_ORDER Order_No Area 2 NF_Product P_NO P_Name S_Off C_No C_Name P_NO P_Name QTY 2 NF_CUSTOMER_ORDER Order_No C_No Name Area 2 NF_ORDER_LINE Order_No P_No QTY S_Off
2 NF and 3 NF The Third Normalisation of relation (3 NF): 2 NF prohibits that a relation contains more than one objects using key as identifiers of objects. 3 NF, based on 2 NF, prohibits that a relation contain more than one objects using non-key attributes as identifier.
3 NF (Transitive FD) The Third Normalisation of relation (3 NF): Transitive FDs of non-key attribute on keys: a non-key attributes is determined by another non-key attribute, and the latter is determined by a key. If attribute X is a key, and attributes Y and Z are not included in keys, then there is a transitive FD of Z on x if: X Y and Y Z.
3 NF (Example) Example: 2 NF_CUSTOMER_ORDER(Order_No, C_Name, Area, S_Off) ORDER_NO C_No, C_Name, Area, S_Off C_No C_Name, Area, S_Off ORDER_NO C_No C_Name, Area, S_Off. This relation can be split into two relations. One is identified by the key (ORDER_No) and another is identified by the non-key (C_No).
3 NF (Method) The third Normalisation Method: Step 1: For X Y Z, a new relation is formed by (Y, Z) Step 2: Another new relation is formed by (X, Y , others) 2 NF_CUSTOMER_ORDER(Order_No, C_Name, Area, S_Off) ORDER_NO C_No C_Name, Area, S_Off. 2 NF_CUSTOMER 3 NF_ORDER Order_No C_No C_Name Area S_Off
Strong 3 NF: BCNF (Boyce-Codd Normal Form) require that: 1) Each tuple represent only one object; 2) Every determinant of a FD (X in X Y) is a candidate key. BCNF is stronger than 3 NF as 3 NF does no require X is a candidate key.
Strong 3 NF: BCNF P_No Ware_house Bin_No QTY P 1 WH 1 B 1 100 P 1 WH 1 B 3 200 Key: P 2 WH 3 B 2 3000 (Ware_house, Bin_No); P 5 WH 4 B 9 50 P 5 WH 4 B 10 50 P 5 WH 4 B 11 50 3 NF_STOCK (P_No, Bin_No) If each type of product is required to store only at one site, i. e. P_NO Ware_house, then we have: P_No Ware_house; (P_No, Bin_No) QTY (Ware_house, Bin_No) P_No, QTY.
Strong 3 NF: BCNF P_No Ware_house; (P_No, Bin_No) QTY (Ware_house, Bin_No) P_No, QTY. Keys: P_No Ware_house Bin_No QTY (Ware_house, Bin_No); P 1 WH 1 B 1 100 (P_No, Bin_No) P 1 WH 1 B 3 200 P 2 WH 3 B 2 3000 P 5 WH 4 B 9 50 P 5 WH 4 B 10 50 P 5 WH 4 B 11 50 Is P_No a key? ? ?
Strong 3 NF: BCNF As P_NO is not a key, this relation is not in BCNF. A BCNF is generated by repeatedly applying the following process: For a relation (X, Y, Z), in which X Y, but X is not a key, then a new relation (X, Y) is formed, and another new relation is formed by (X, Z) P_No Ware_house Bin_No Quantity P_No Ware_house, but P_No is not a key. P_No Ware_house P_No Bin_No Quantity
4 NF and 5 NF 1 -3 NFs and BCNF concern with single-valued objects. 4 NF and 5 Nf concern with multi-valued objects. 4 NF: Removes repeating groups of INDEPENDENT multi-valued facts. 5 NF: Removes redundancy when there are INTERDEPENDENT multivalued facts
Summary: Normalisation Unnecessary repetition of data cause the inability to represent information, update anomalies, and excessive database size. Normal forms are retractions to exclude these unnecessary data repetition. Normal forms (NFs) are defined in terms of functional dependence: X Y models the relationship between the attributes X and the attributes Y. Normalisation is the process of applying the constraints of NFs.
Summary: 1 -3 NFs and BCNF 1 NF, 2 NF, 3 NF and BCNF are defined to remove unnecessary duplication data that represents single-valued facts. 1 NF: the use of atomic attributes. 2 NF: non-key attribute dependent on the key. 3 NF: non-key attribute dependent on WHOLE key. BCNF: 3 NF plus all determinants in the relation are candidate keys. Reading: 4. 10 -4. 14 Exercises: 4. 8 -4. 17
- Slides: 30