The Normal Forms 3 NF and BCNF Preview
The Normal Forms 3 NF and BCNF
Preview n Anomaly n Normalization n Solution: Normal Forms n Introducing 3 NF and BCNF n 3 NF n Examples n BCNF
Modification Anomalies n Anomaly is occurrence or object that is strange, unusual, or unique. n Unexpected side effect n Insert, modify, and delete more data than desired n Caused by excessive redundancies n Strive for one fact in one place
Anomalies: Example Assume the position determines the salary: position → salary T 1 first_name last_name address department position salary Dewi Srijaya 12 a Jln Lempeng Toys clerk 2000 Izabel Leong 10 Outram Park Sports trainee 1200 John Smith 107 Clementi Rd Toys clerk 2000 Axel Bayer 55 Cuscaden Rd Sports trainee 1200 Winny Lee 10 West Coast Rd Sports manager 2500 Sylvia Tok 22 East Coast Lane Toys manager 2600 Eric Wei 100 Jurong drive Toys assistant manager 2200 ? ? security guard 1500 Redundant storage Update anomaly key Potential deletion anomal Insertion anomaly
Decomposition Example T 2 first_name last_name address department position Dewi Srijaya 12 a Jln lempeng Toys clerk Izabel Leong 10 Outram Park Sports trainee John Smith 107 Clementi Rd Toys clerk Axel Bayer 55 Cuscaden Rd Sports trainee Winny Lee 10 West Coast Rd Sports manager Sylvia Tok 22 East Coast Lane Toys manager Eric Wei 100 Jurong drive Toys assistant manager T 3 position p. No salary clerk 2000 trainee 1200 manager 2500 assistant manager 2200 security guard 1500 Redundant storage p. No Update anomaly p. No Deletion anomaly p. No Insertion anomaly
Normalization n Normalization is the process of efficiently organizing data in a database with two goals in mind n First goal: eliminate redundant data – for example, storing the same data in more than one table n Second Goal: ensure data dependencies make sense – for example, only storing related data in a table
Benefits of Normalization n Less storage space n Quicker updates n Less data inconsistency n Clearer data relationships n Easier to add data n Flexible Structure
The Solution: Normal Forms n Bad database designs results in: – redundancy: inefficient storage. – anomalies: data inconsistency, difficulties in maintenance n 1 NF, 2 NF, 3 NF, BCNF are some of the early forms in the list that address this problem
Third Normal Form (3 NF) 1) Meet all the requirements of the 1 NF Meet all the requirements of the 2 NF 3) Remove columns that are not dependent upon the primary key. 2)
1) First normal form -1 NF • 1 NF : if all attribute values are atomic: no repeating group, no composite attributes. n The following table is not in 1 NF DPT_NO MG_NO EMP_NM D 101 12345 20000 20001 20002 Carl Sagan Mag James Larry Bird D 102 13456 30000 30001 Jim Carter Paul Simon
Table in 1 NF DPT_NO MG_NO EMP_NM D 101 12345 20000 Carl Sagan D 101 12345 20001 Mag James D 101 12345 20002 Larry Bird D 102 13456 30000 Jim Carter D 102 13456 n 30001 Paul Simon all attribute values are atomic because there are no repeating group and no composite attributes.
2) Second Normal Form – Second normal form (2 NF) further addresses the concept of removing duplicative data: § A relation R is in 2 NF if – (a) R is 1 NF , and – (b) all non-prime attributes are fully dependent on the candidate keys. Which is creating relationships between these new tables and their predecessors through the use of foreign keys. A prime attribute appears in a candidate key. § There is no partial dependency in 2 NF. § Example is next…
No dependencies on non-key attributes Inventory Description Supplier Cost Supplier Address There are two non-key fields. So, here are the questions: • If I know just Description, can I find out Cost? No, because we have more than one supplier for the same product. • If I know just Supplier, and I find out Cost? No, because I need to know what the Item is as well. Therefore, Cost is fully, functionally dependent upon the ENTIRE PK (Description-Supplier) for its existence. Inventory Description Supplier Cost
CONTINUED… Inventory Description Supplier Cost Supplier Address • If I know just Description, can I find out Supplier Address? No, because we have more than one supplier for the same product. • If I know just Supplier, and I find out Supplier Address? Yes. The Address does not depend upon the description of the item. Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK (Description-Supplier) for its existence. Supplier Name Supplier Address
So putting things together Inventory Description Supplier Cost Supplier Address Inventory Description Supplier Cost Supplier Name Supplier Address The above relation is now in 2 NF since the relation has no nonkey attributes.
3) Remove columns that are not dependent upon the primary key. So for every nontrivial functional dependency X --> A, (1) X is a superkey, or (2) A is a prime (key) attribute.
Example of 3 NF Books Name Author's Non-de Plume # of Pages • If I know # of Pages, can I find out Author's Name? No. Can I find out Author's Non-de Plume? No. • If I know Author's Name, can I find out # of Pages? No. Can I find out Author's Non-de Plume? YES. Therefore, Author's Non-de Plume is functionally dependent upon Author's Name, not the PK for its existence. It has to go. Books Name Author's Name # of Pages Author Name Non-de Plume
Another example: Suppose we have relation S n S(SUPP#, PART#, SNAME, QUANTITY) with the following assumptions: n (1) SUPP# is unique for every supplier. (2) SNAME is unique for every supplier. (3) QUANTITY is the accumulated quantities of a part supplied by a supplier. (4) A supplier can supply more than one part. (5) A part can be supplied by more than one supplier. n We can find the following nontrivial functional dependencies: n (1) SUPP# --> SNAME (2) SNAME --> SUPP# (3) SUPP# PART# --> QUANTITY (4) SNAME PART# --> QUANTITY n n The candidate keys are: (1) SUPP# PART# (2) SNAME PART# n The relation is in 3 NF.
The table in 3 NF SUPP# S 1 SNAME Yues PART# QTY P 1 100 S 1 Yues P 2 200 S 2 Yues P 3 250 S 2 Jones P 1 300
Example with first three forms Suppose we have this Invoice Table First Normal Form: No repeating groups. • The above table violates 1 NF because it has columns for the first, second, and third line item. • Solution: you make a separate line item table, with it's own key, in this case the combination of invoice number and line number
Table now in 1 NF
Second Normal Form: Each column must depend on the *entire* primary key.
Third Normal Form: Each column must depend on *directly* on the primary key.
Boyce-Codd Normal Form (BCNF) Boyce-Codd normal form (BCNF) A relation is in BCNF, if and only if, every determinant is a candidate key. The difference between 3 NF and BCNF is that for a functional dependency A B, 3 NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
Client. Interview Client. No interview. Date interview. Time staff. No room. No CR 76 13 -May-02 10. 30 SG 5 G 101 CR 76 13 -May-02 12. 00 SG 5 G 101 CR 74 13 -May-02 12. 00 SG 37 G 102 CR 56 1 -Jul-02 10. 30 SG 5 G 102 n FD 1 client. No, interview. Date interview. Time, staff. No, room. No (Primary Key) n FD 2 staff. No, interview. Date, interview. Time client. No (Candidate key) n FD 3 room. No, interview. Date, interview. Time client. No, staff. No (Candidate key) n FD 4 staff. No, interview. Date room. No n As a consequece the Client. Interview relation may suffer from update anmalies. n For example, two tuples have to be updated if the room. No need be changed for staff. No SG 5 on the 13 -May-02. (not a candidate key)
Example of BCNF(2) To transform the Client. Interview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and Staff. Room as shown below, Interview (client. No, interview. Date, interview. Time, staff. No) Staff. Room(staff. No, interview. Date, room. No) Interview Client. No interview. Date interview. Time staff. No CR 76 CR 74 CR 56 13 -May-02 1 -Jul-02 10. 30 12. 00 10. 30 SG 5 SG 37 SG 5 Staff. Room staff. No interview. Date room. No SG 5 SG 37 SG 5 13 -May-02 1 -Jul-02 G 101 G 102 BCNF Interview and Staff. Room relations
BCNF Example R = (B, C, E) F = {{E} {B}, {B, C} {E}} Two candidate keys: BC and EC n {B, C} {E} does not violate BCNF because BC is a key n {E} {B} violates BCNF because E is not a key n In order to achieve BCNF we have to decompose the table but how? n Since the decomposition must be lossless, we only have one option: R 1(B, E), and R 2(C, E). The common attribute E should be key of one fragment, here R 1.
BCNF Example (cont) Bank-schema = (Branch B, Customer C, Employee E) F = {{E} {B}, {B, C} {E}} Decompose into R 1(B, E), and R 2(C, E) Branch Customer Employee HKUST Wong Au HKUST Chin Au Central Wong Jones Central null Cheng We have avoided the problems of redundancy and null values of 3 NF Branch Employee HKUST Au Central Jones Central Cheng Customer Employee Wong Au Chin Au Wong Jones
BCNF Example (cont) We can generate the original table by joining the two fragments (however, but we must use an outer join -an outer join fills null values for tuples that do not have join partners) Branch Employee Customer Employee HKUST Au Wong Au Central Jones Chin Au Central Cheng Wong Jones = Branch Cust. Empl. HKUST Wong Au HKUST Chin Au Central Wong Jones Central null Cheng Is the decomposition dependency preserving? No. We loose {B, C} {E} Can we have a dependency preserving decomposition? No. No matter how we break we loose {B, C} {E} since it involves all attributes
Observations about BCNF n n n Best Normal Form Avoids the problems of redundancy and all anomalies There is always a lossless decomposition that generates BCNF tables However, we may not be able to preserve all dependencies Next step: an algorithm for automatically generating BCNF tables.
Sources: http: //www. troubleshooters. com/littstip/ltnorm. html n http: //www. cs. jcu. edu. au/Subjects/cp 1500/1998/Lecture _Notes/normalisation/3 nf. html n
- Slides: 31