Chapter 8 Relational Database Design Normalization in Databases
Chapter 8: Relational Database Design Normalization in Databases Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See www. db-book. com for conditions on re-use
Chapter 8: Relational Database Design n Features of Good Relational Design n Atomic Domains and First Normal Form (1 NF) n Second Normal Form (2 NF) n Third Normal Form (3 NF)
Combine Schemas? n Suppose we combine instructor and department into inst_dept l (No connection to relationship set inst_dept) n Result is possible repetition of information
A Combined Schema Without Repetition n Consider combining relations l sec_class(sec_id, building, room_number) and l section(course_id, sec_id, semester, year) into one relation l section(course_id, sec_id, semester, year, building, room_number) n No repetition in this case
What About Smaller Schemas? n Suppose we had started with inst_dept. How would we know to split up (decompose) it into instructor and department? n Write a rule “if there were a schema (dept_name, building, budget), then dept_name would be a candidate key” n Denote as a functional dependency: dept_name building, budget n In inst_dept, because dept_name is not a candidate key, the building and budget of a department may have to be repeated. l This indicates the need to decompose inst_dept n Not all decompositions are good. Suppose we decompose employee(ID, name, street, city, salary) into employee 1 (ID, name) employee 2 (name, street, city, salary) n The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.
A Lossy Decomposition
Example of Lossless-Join Decomposition n Lossless join decomposition n Decomposition of R = (A, B, C) R 1 = (A, B) A B C A B B C 1 2 1 2 A B A, B(r) r A (r) R 2 = (B, C) B (r) A B C 1 2 A B B, C(r)
Normal Forms n 1 NF n 2 NF n 3 NF n Other… l Not covered.
Normal Forms: Review n Unnormalized – There are multivalued attributes or repeating groups n 1 NF – No multivalued attributes or repeating groups. n 2 NF – 1 NF plus no partial dependencies n 3 NF – 2 NF plus no transitive dependencies
First Normal Form n Domain is atomic if its elements are considered to be indivisible units l Examples of non-atomic domains: 4 Set of names, composite attributes 4 Identification numbers like CS 101 that can be broken up into parts n A relational schema R is in first normal form if the domains of all attributes of R are atomic n Non-atomic values complicate storage and encourage redundant (repeated) storage of data l Example: Set of accounts stored with each customer, and set of owners stored with each account
First Normal Form (Cont’d) n Atomicity is actually a property of how the elements of the domain are used. l Example: Strings would normally be considered indivisible l Suppose that students are given roll numbers which are strings of the form CS 0012 or EE 1127 l If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. l Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
Example 1: Table Violating 1 NF Instructor First Name Last Name Phone Number 123 Ali Baba 215 -000 -1212 222 Mikey Mouse 215 -000 -1212 215 -111 -1212 215 -222 -1212 555 Donald Duck 312 -000 -1212 312 -111 -1212 312 -222 -1212
Example 1: Table Not Violating 1 NF Instructor First Name Last Name Phone Number 123 Ali Baba 215 -000 -1212 222 Mikey Mouse 215 -111 -1212 222 Mikey Mouse 215 -222 -1212 555 Donald Duck 312 -000 -1212 555 Donald Duck 312 -111 -1212 It violates other normal forms, though.
Example 2: Table Violating 1 NF Product ID Color Price 1 Black, Red $15 2 Yellow, Purple $20 5 White, Green $40
Example 2: Table Not Violating 1 NF Product ID Color Price 1 Black $15 1 Red $15 2 Yellow $20 2 Purple $20 5 White $40 5 Green $40 It violates other normal forms, though.
Types of Normalization n First Normal Form l each field contains the smallest meaningful value l the table does not contain repeating groups of fields or repeating data within the same field 4 Create a separate field/table for each set of related data. 4 Identify each set of related data with a primary key
Tables Violating First Normal Form PART (Primary Key) WAREHOUSE P 0010 Warehouse A, Warehouse B, Warehouse C P 0020 Warehouse B, Warehouse D Really Bad Set-up! Better, but still flawed! PART (Primary Key) WAREHOUSE A WAREHOUSE B WAREHOUSE C P 0010 Yes No Yes P 0020 No Yes
Table Conforming to 1 NF PART WAREHOUSE (Primary Key) QUANTITY P 0010 Warehouse A 400 P 0010 Warehouse B 543 P 0010 Warehouse C 329 P 0020 Warehouse B 200 P 0020 Warehouse D 278
Second Normal Form – 2 NF n usually used in tables with a multiple-field primary key (composite key) n each non-key field relates to the entire primary key n any field that does not relate to the primary key is placed in a separate table n MAIN POINT – l eliminate redundant data in a table l Create separate tables for sets of values that apply to multiple records
Table Violating 2 NF Where is the problem? PART WAREHOUSE (Primary Key) QUANTITY WAREHOUSE ADDRESS P 0010 Warehouse A 400 1608 New Field Road P 0010 Warehouse B 543 4141 Greenway Drive P 0010 Warehouse C 329 171 Pine Lane P 0020 Warehouse B 200 4141 Greenway Drive P 0020 Warehouse D 278 800 Massey Street
Table Violating 2 NF PART WAREHOUSE (Primary Key) QUANTITY WAREHOUSE ADDRESS P 0010 Warehouse A 400 1608 New Field Road P 0010 Warehouse B 543 4141 Greenway Drive P 0010 Warehouse C 329 171 Pine Lane P 0020 Warehouse B 200 4141 Greenway Drive P 0020 Warehouse D 278 800 Massey Street
Tables Conforming to 2 NF PART_STOCK TABLE PART (Primary Key) WAREHOUSE (Primary Key) QUANTITY P 0010 Warehouse A 400 P 0010 Warehouse B 543 P 0010 Warehouse C 329 P 0020 Warehouse B 200 P 0020 Warehouse D 278 WAREHOUSE TABLE 1 ∞ WAREHOUSE (Primary Key) WAREHOUSE_ADDRESS Warehouse A 1608 New Field Road Warehouse B 4141 Greenway Drive Warehouse C 171 Pine Lane Warehouse D 800 Massey Street
Third Normal Form – 3 NF n Usually used in tables with a single- field primary key n Records do not depend on anything other than a table's primary key n Each non-key field is a fact about the key n Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.
Table Violating 3 NF EMPLOYEE_DEPARTMENT TABLE EMPNO (Primary Key) FIRSTNAME LASTNAME WORKDEPTNAME 000290 John Parker E 11 Operations 000320 Ramlal Mehta E 21 Software Support 000310 Maude Setright E 11 Operations The underlying problem is the transitive dependency to which the Dept. Name attribute is subject. Dept. Name actually depends on WORKDEPT, which in turn depends on the key Emp. NO.
Tables Conforming to Third Normal Form EMPLOYEE TABLE EMPNO (Primary Key) FIRSTNAME LASTNAME WORKDEPT 000290 John Parker E 11 000320 Ramlal Mehta E 21 000310 Maude Setright E 11 DEPARTMENT TABLE ∞ 1 DEPTNO (Primary Key) DEPTNAME E 11 Operations E 21 Software Support
A Note on 2 NF n A table may have multiple candidate key. n A functional dependency on part of any candidate key is a violation of 2 NF. n It is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.
Example Candidate Key PK Manufacturer Model Full Name Manufacturer Country Forte X-Prime Italy Forte Ultraclean Italy Dent-o-Fresh EZbrush USA Kobayashi ST-60 Japan Hoch Toothmaster Germany Hoch X-Prime Germany Example taken from Wikipedia: http: //en. wikipedia. org/wiki/Second_normal_form
Example Electric Toothbrush Manufacturers Manufacturer Country Forte Italy Dent-o-Fresh USA Kobayashi Japan Hoch Germany Electric Toothbrush Models Manufacturer Model Full Name Forte X-Prime Forte Ultraclean Dent-o-Fresh EZbrush Kobayashi ST-60 Hoch Toothmaster Hoch X-Prime
More Examples
Example 1 n Un-normalized Table: Student# Advisor Adv-Room Class 1 1022 10 Susan Jones 4123 12 Anne Smith 216 Class 2 Class 3 101 -07 143 -01 159 -02 101 -07 159 -02 214 -01
n Table in First Normal Form l No Repeating Fields l Data in Smallest Parts Student# Advisor. FName Advisor. LName Adv. Room Class# 1022 10 Susan Jones 412 101 -07 1022 10 Susan Jones 412 143 -01 1022 10 Susan Jones 412 159 -02 4123 12 Anne Smith 216 101 -07 4123 12 Anne Smith 216 159 -02 4123 12 Anne Smith 216 214 -01
n Is table in 2 NF? l What is the key? Student# Advisor. FName Advisor. LName Adv. Room Class# 1022 10 Susan Jones 412 101 -07 1022 10 Susan Jones 412 143 -01 1022 10 Susan Jones 412 159 -02 4123 12 Anne Smith 216 101 -07 4123 12 Anne Smith 216 159 -02 4123 12 Anne Smith 216 214 -01 2011 10 Susan Jones 412 101 -07
n Is table in 2 NF? l What is the key? Student# Advisor. FName Advisor. LName Adv. Room Class# 1022 10 Jones 412 101 -07 1022 10 Susan Jones What do we notice? 412 143 -01 1022 10 • Advisor fields depend on Student# Susan Jones 412 159 -02 4123 12 Anne Smith 216 101 -07 4123 12 Anne Smith 216 159 -02 4123 12 Anne Smith 216 214 -01 2011 10 Susan Jones 412 101 -07 Susan
n Tables in Second Normal Form l Redundant Data Eliminated Table: Registration Table: Students Student# Advisor# Adv. First. Name Adv. Last. Name Adv. Room 1022 10 Susan Jones 4123 12 Anne Smith 216 2011 10 Susan Jones 412 Student# Class# 1022 101 -07 1022 143 -01 1022 159 -02 4123 201 -01 4123 211 -02 4123 214 -01
n Tables Registration in 2 NF n Who about the Students? Table: Registration Table: Students Student# Advisor# Adv. First. Name Adv. Last. Name Adv. Room 1022 4123 2011 10 12 10 Susan Anne Susan Jones Smith Jones 216 412 What is the candidate key for Students? Student# Class# 1022 101 -07 1022 143 -01 1022 159 -02 4123 201 -01 4123 211 -02 4123 214 -01
n Tables in 2 NF. Table: Advisors Advisor# Adv. First. Name Adv. Last. Name Room 10 Susan Jones 412 12 Anne Smith 216 Table: Students Table: Registration Student# Class# 1022 101 -07 1022 143 -01 1022 159 -02 4123 201 -01 Student# Advisor# 1022 10 4123 211 -02 4123 12 4123 214 -01 2011 10
Relationships for Example 1 Registration Students Advisors Student# Advisor# Class# Advisor# Adv. First. Name Adv. Last. Name Adv-Room
Example 2 n Un-normalized Table: Emp. ID Name Dept Code Dept Name Proj 1 Time Proj 1 Proj 2 Time Proj 2 Proj 3 Time Proj 3 EN 1 -26 Sean Breen TW Technical Writing 30 -T 3 25% 30 -TC 40% 31 -T 3 30% EN 1 -33 Amy Guya TW Technical Writing 30 -T 3 50% 30 -TC 35% 31 -T 3 60% EN 1 -36 Liz Roslyn AC Accounting 35 -TC 90%
Table in First Normal Form Emp. ID Project Time on Number Project Last Name First Name Dept Code Dept Name EN 1 -26 30 -T 3 25% Breen Sean TW Technical Writing EN 1 -26 30 -TC 40% Sean TW Technical Writing EN 1 -26 31 -T 3 30% Breen th Sean Technical Writing EN 1 -33 30 -T 3 50% Guya ca n Amy d TW EN 1 -33 30 -TC 35% Guya Amy EN 1 -33 31 -T 3 60% Guya Amy TW Technical Writing EN 1 -36 35 -TC 90% Roslyn Liz AC Accounting WBreen ha t is e Technical Writing ida. TW te TW ke Technical Writing y?
Tables in Second Normal Form Table: Employees and Projects Table: Employees Time on Project Emp. ID Last Name First Name Dept Code Dept Name EN 1 -26 30 -T 3 25% EN 1 -26 Breen Sean TW Technical Writing EN 1 -26 30 -T 3 40% EN 1 -33 Guya Amy TW Technical Writing EN 1 -26 31 -T 3 30% EN 1 -36 Roslyn Liz AC Accounting EN 1 -33 30 -T 3 50% EN 1 -33 30 -TC 35% EN 1 -33 31 -T 3 60% EN 1 -36 35 -TC 90% Emp. ID Project Number Are they in 3 NF? The underlying problem is the transitive dependency to which the Dept Name attribute is subject. Dept Name actually depends on Dept Code, which in turn depends on the key Emp. ID.
Tables in Third Normal Form Table: Employees_and_Projects Emp. ID Project Number Time on Project EN 1 -26 30 -T 3 25% EN 1 -26 30 -T 3 40% EN 1 -26 31 -T 3 30% EN 1 -33 30 -T 3 50% EN 1 -33 30 -TC 35% EN 1 -33 31 -T 3 60% EN 1 -36 35 -TC 90% Table: Employees Emp. ID Last Name First Name Dept Code EN 1 -26 Breen Sean TW EN 1 -33 Guya Amy TW EN 1 -36 Roslyn Liz AC Table: Departments Dept Code Dept Name TW Technical Writing AC Accounting
Relationships for Example 2 Employees_and_Projects Emp. ID Project. Number Timeon. Project Employees Departments Emp. ID Dept. Code First. Name Dept. Name Last. Name Dept. Code
Example 3 • Un-normalized Table: Emp. ID Name Manager Dept Sector Spouse/Children 285 Carlson Smithers Engineering 6 G 365 Lenny Smithers Marketing 458 Homer Simpson Mr. Burns Safety 8 G 7 G Marge, Bart, Lisa, Maggie
Table in First Normal Form Fields contain smallest meaningful values Emp. ID FName LName Manager Dept Sector Spouse Child 1 Child 2 Child 3 285 Carlson Smithers Eng. 6 G 365 Lenny Smithers Marketing 8 G 458 Homer Simpson Mr. Burns Safety 7 G Marge Bart Lisa Maggie
Table in First Normal Form No more repeated fields Emp. ID FName LName Manager 285 Carlson 365 Is Lenny th e ha Smithers t 458 458 Department Sector Dependent Engineering 6 G th 8 G W Smithers is Marketing e ca ta nd Simpson Mr. Burns Safety ble ida 7 G in te k 7 G Homer Simpson 2 N Mr. Burns Safety ey F? ? Homer Marge Bart Homer Simpson Mr. Burns Safety 7 G Lisa Homer Simpson Mr. Burns Safety 7 G Maggie
Second/Third Normal Form Remove Repeated Data From Table Step 1 Emp. ID FName LName Manager Department 285 Carl Smithers Engineering 6 G 365 Lenny Smithers Marketing 458 Homer Simpson Mr. Burns Safety Carlson Emp. ID Dependent 458 Marge 458 Bart 458 Lisa 458 Maggie Sector 8 G 7 G
Tables in Second Normal Form Removed Repeated Data From Table Step 2 Emp. ID FName LName Manager. ID Dept 285 Carl 2 Engineering 6 G 365 Lenny 2 Marketing 8 G 458 Homer Simpson 1 Safety 7 G Carlson Emp. ID Dependent Sector We look for the transitive dependency. 458 Marge Manager. ID Manager 458 Bart 1 Mr. Burns 458 Lisa 458 Maggie 2 Smithers
Tables in Second Normal Form How about 3 NF? Step 3 Emp. ID FName LName Manager. ID Dept 285 Carl 2 Engineering 6 G 365 Lenny 2 Marketing 8 G 458 Homer Simpson 1 Safety 7 G Carlson Emp. ID Dependent 458 Marge 458 Bart 458 Lisa 458 Maggie Sector We look the transitive dependency. If I know Dept, then I know Manager. ID and Sector. If I 1 Mr. Burns know Emp. ID then I know Dept. 2 Smithers
Tables in Third Normal Form Employees Table Manager Table Emp. ID FName LName Dept. Code Manager. ID Manager 285 Carl EN 1 Mr. Burns 365 Lenny MK 2 Smithers 458 Homer Simpson SF Carlson Dependents Table Department Table Emp. ID Dependent 458 Marge EN Engineering 6 G 2 458 Bart MK Marketing 8 G 2 458 Lisa SF Safety 7 G 1 458 Maggie Dept. Code Department Sector Manager. ID
Example 4 Table Violating 1 st Normal Form Rep ID Representative Client 1 Time 1 Client 2 Time 2 Client 3 Time 3 TS-89 Gilroy Gladstone US Corp. 14 hrs Taggarts 26 hrs Kilroy Inc. 9 hrs RK-56 Mary Mayhem Linkers Italiana 67 hrs 2 hrs Table in 1 st Normal Form Rep ID Rep First Name Rep Last Name Client ID* Client Time With Client TS-89 Gilroy Gladstone 978 US Corp 14 hrs TS-89 Gilroy Gladstone 665 Taggarts 26 hrs TS-89 Gilroy Gladstone 782 Kilroy Inc. 9 hrs RK-56 Mary Mayhem 221 Italiana 67 hrs RK-56 Mary Mayhem 982 Linkers 2 hrs
Tables in 2 nd and 3 rd Normal Form Rep ID* Client ID* Time With Client TS-89 978 14 hrs TS-89 665 26 hrs TS-89 782 RK-56 Rep ID* First Name Last Name TS-89 Gilroy Gladstone RK-56 Mary Mayhem 9 hrs Client ID* Client Name 221 67 hrs 978 US Corp RK-56 982 2 hrs 665 Taggarts RK-56 665 4 hrs 782 Kilroy Inc. 221 Italiana 982 Linkers This example comes from a tutorial from http: //www. devhood. com/tutorials/tutorial_details. aspx? tutorial_id=95 and http: //www. devhood. com/tutorials/tutorial_details. aspx? tutorial_id=104 Please check them out, as they are very well done.
Example 5 Table in 1 st Normal Form Supplier. ID Status City Part. ID Quantity S 1 20 London P 1 300 S 1 20 London P 2 200 S 2 10 Paris P 1 300 S 2 10 Paris P 2 400 S 3 10 Paris P 2 200 S 4 20 London P 4 300 Although this table is in 1 NF it contains redundant data. For example, information about the supplier's location and the location's status have to be repeated for every part supplied. Redundancy causes what are called update anomalies. Update anomalies are problems that arise when information is inserted, deleted, or updated. For example, the following anomalies could occur in this table: INSERT. The fact that a certain supplier (s 5) is located in a particular city (Athens) cannot be added until they supplied a part. DELETE. If a row is deleted, then not only is the information about quantity and part lost but also information about the supplier. UPDATE. If supplier s 1 moved from London to New York, then two rows would have to be updated with this new information.
Tables in 2 NF Suppliers Parts Supplier. ID Status City Supplier. ID Part. ID Quantity S 1 20 London S 1 P 1 300 S 2 10 Paris S 1 P 2 200 S 3 10 Paris S 4 20 London S 2 P 1 300 S 5 30 Athens S 2 P 2 400 S 3 P 2 200 S 4 P 4 300 S 4 P 5 400 Tables in 2 NF but not in 3 NF still contain modification anomalies. In the example of Suppliers, they are: INSERT. The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city. DELETE. Deleting any row in SUPPLIER destroys the status information about the city as well as the association between supplier and city.
- Slides: 53