Normalization in Databases What is Normalization Unnormalized data
Normalization in Databases
What is Normalization? • Unnormalized data exists in flat files • Normalization is the process of moving data into related tables
Why Normalize Tables? • Save typing of repetitive data • Increase flexibility to query, sort, summarize, and group data. • Reduce disk space
A Typical Spreadsheet File Emp No Employee Name Time Card No Time Card Date Dept No Dept Name 10 Thomas Arquette 106 11/02/2002 20 Marketing 10 Thomas Arquette 115 11/09/2002 20 Marketing 99 Janice Smitty 10 Accounting 500 Alan Cook 107 11/02/2002 50 Shipping 700 Ernest Gold 108 11/02/2002 50 Shipping 700 Ernest Gold 116 11/09/2002 50 Shipping
Employee, Department, and Time Card Data in Three Tables Table: Employees Emp. No Emp. First. Name Emp. Last. Name Dept. No 10 Thomas Arquette 20 500 Alan Cook 50 700 Ernest Gold 50 99 Janice Smitty 10 Table: Departments Table: Time Card Data Primary Key Time. Card. No Emp. No Time. Card. Date 106 10 11/02/2002 107 500 11/02/2002 108 700 11/02/2002 115 10 11/09/2002 116 700 11/09/2002 Dept. No Dept. Name 10 Accounting 20 Marketing 50 Shipping
Types of Normalization • First Normal Form • A database is in first normal form if it satisfies the following conditions: q. Contains only atomic values q. There are no repeating groups • An atomic value is a value that cannot be divided. • A repeating group means that a table contains two or more columns that are closely related.
1 st Normal Form Example • How do we bring an unnormalized table into first normal form? Consider the following example: • This table is not in first normal form because the [Color] column can contain multiple values. For example, the first row includes values "red" and "green. "
1 st Normal Form Example……. . cont • To bring the previous table in to first normal form, we split the table into two tables and now we have the resulting tables: • Now first normal form is satisfied, as the columns on each table all hold just one value.
Second Normal Form v. A database is in second normal form if it satisfies the following conditions: q. It is in first normal form q. All non-key attributes are fully functional dependent on the primary key ØIn a table, if attribute B is functionally dependent on A, but is not functionally dependent on a proper subset of A, then B is considered fully functional dependent on A ØIn a Second Normal Form all non-key attributes cannot be dependent on a subset of the primary key
Note • If the primary key is not a composite key, all non-key attributes are always fully functional dependent on the primary key. • A table that is in 1 st normal form and contains only a single key as the primary key is automatically in 2 nd normal form.
2 nd Normal Form Example • Consider the following example: • This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy second normal form.
2 nd Normal Form Example……. . cont • To bring the previous table into second normal form, we break the table into two tables, and now we have the following: • What we have done is to remove the partial functional dependency that we initially had. Now, in the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store ID].
Third Normal Form v. A database is in third normal form if it satisfies the following conditions: q. It is in second normal form q. There is no transitive functional dependency ØBy transitive functional dependency, we mean we have the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, C is transitively dependent on A via B.
3 rd Normal Form Example • Consider the following example: • In the table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive functional dependency, and this structure does not satisfy third normal form.
3 rd Normal Form Example……. . cont • To bring the previous table to third normal form, we split the table into two as follows: • Now all non-key attributes are fully functional dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] are only dependent on [Book ID]. In [TABLE_GENRE], [Genre Type] is only dependent on [Genre ID].
Example 2 • Un-normalized Table: Emp. ID Name Dept Code Dept Name Proj 1 Time Proj 1 Proj 2 Time Proj 2 Proj 3 Time Proj 3 EN 1 -26 Sean Breen TW Technical Writing 30 -T 3 25% 30 -TC 40% 31 -T 3 30% EN 1 -33 Amy Guya TW Technical Writing 30 -T 3 50% 30 -TC 35% 31 -T 3 60% EN 1 -36 Liz Roslyn AC Accounting 35 -TC 90%
Table in First Normal Form Emp. ID Project Time on Number Project Last Name First Name Dept Code Dept Name EN 1 -26 30 -T 3 25% Breen Sean TW Technical Writing EN 1 -26 30 -TC 40% Breen Sean TW Technical Writing EN 1 -26 31 -T 3 30% Breen Sean TW Technical Writing EN 1 -33 30 -T 3 50% Guya Amy TW Technical Writing EN 1 -33 30 -TC 35% Guya Amy TW Technical Writing EN 1 -33 31 -T 3 60% Guya Amy TW Technical Writing EN 1 -36 35 -TC 90% Roslyn Liz AC Accounting
Tables in Second Normal Form Table: Employees and Projects Emp. ID Project Number EN 1 -26 30 -T 3 EN 1 -26 31 -T 3 Time on Project Table: Employees Emp. ID Last Name First Name Dept Code Dept Name EN 1 -26 Breen Sean TW Technical Writing EN 1 -33 Guya Amy TW Technical Writing EN 1 -36 Roslyn Liz AC Accounting 25% 40% 30% EN 1 -33 30 -T 3 50% EN 1 -33 30 -TC 35% EN 1 -33 31 -T 3 60% EN 1 -36 35 -TC 90%
Tables in Third Normal Form Table: Employees_and_Projects Emp. ID Project Number Time on Project EN 1 -26 30 -T 3 25% EN 1 -26 30 -T 3 40% EN 1 -26 31 -T 3 30% EN 1 -33 30 -T 3 50% EN 1 -33 30 -TC 35% EN 1 -33 31 -T 3 60% EN 1 -36 35 -TC 90% Emp. ID Last Name First Name Dept Code EN 1 -26 Breen Sean TW EN 1 -33 Guya Amy TW EN 1 -36 Roslyn Liz AC Table: Departments Dept Code Dept Name TW Technical Writing AC Accounting
Example Table Violating 1 st Normal Form Rep ID Representative Client 1 Time 1 Client 2 Time 2 Client 3 TS-89 Gilroy Gladstone US Corp. 14 hrs Taggarts 26 hrs RK-56 Mary Mayhem Linkers Italiana 67 hrs Time 3 Kilroy Inc. 9 hrs 2 hrs Table in 1 st Normal Form Rep ID Rep First Name Rep Last Name Client ID* Client Time With Client TS-89 Gilroy Gladstone 978 US Corp 14 hrs TS-89 Gilroy Gladstone 665 Taggarts 26 hrs TS-89 Gilroy Gladstone 782 Kilroy Inc. 9 hrs RK-56 Mary Mayhem 221 Italiana 67 hrs RK-56 Mary Mayhem 982 Linkers 2 hrs
Tables in 2 nd and 3 rd Normal Form Rep ID* Client ID* Time With Client TS-89 978 14 hrs TS-89 665 TS-89 Rep ID* First Name Last Name TS-89 Gilroy Gladstone RK-56 Mary Mayhem 26 hrs Client ID* Client Name 782 9 hrs 978 US Corp RK-56 221 67 hrs 665 Taggarts RK-56 982 2 hrs 782 Kilroy Inc. RK-56 665 4 hrs 221 Italiana 982 Linkers
- Slides: 24