Normalization Overview Earliest formalized database design technique and
Normalization
Overview • Earliest formalized database design technique and at one time was the starting point for logical database design. • Today is used more as check on database structures produces from E-R diagrams. • Data normalization process is another way of demonstrating and learning about such important topics as data redundancy, foreign key, and other ideas that are so central to a solid of database management.
• In 1972, Dr E. F. Codd developed the technique of normalization to support the design of databases based on the relational model. • Normalization is often performed as a series of tests on a table to determine whether it satisfies or violates the rules for a given normal form. • There are several normal forms, although the most commonly used ones are called first normal form (1 NF), second normal form (2 NF), and third normal form (3 NF). • All these normal forms are based on rules about relationships among the columns of a table.
Definition Normalization • A technique for producing a set of tables with desirable properties that support the requirements of a user or company. [Connolly, Thomas M. ] • A methodology for organizing attributes into tables so that redundancy among the nonkey attributes is eliminated. [Gillenson, Mark L. ] With Normalization • Each resultant tables will describe a single entity type or a single many to many relationship. • Foreign key will appear exactly where they needed. • The output of the data normalization process is a properly structured relational database.
Data redundancy and update anomalies • A major aim of relational database design is to group columns into tables to minimize data redundancy and reduce the file storage space required by the implemented base tables.
For Example: The structure of these tables is described using a Database Definition Language (DDL): • Staff (staff. No, name, position, salary, branch. No) Primary Key staff. No Foreign Key branch. No references Branch(branch. No) • Branch (branch. No, branch. Address, tel. No) Primary Key branch. No • Staff. Branch (staff. No, name, position, salary, branch. No, branch. Address, tel. No) Primary Key staff. No
The Staff. Branch table
• In the Staff. Branch table there is redundant data: – the details of a branch are repeated for every member of staff located at that branch. – In contrast, the details of each branch appear only once in the Branch table and only the branch number (branch. No) is repeated in the Staff table, to represent where each member of staff is located. • Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies.
Insertion anomalies There are two main types of insertion anomalies: 1. To insert the details of a new member of staff located at a given branch into the Staff. Branch table, we must also enter the correct details for that branch. For example, to insert the details of a new member of staff at branch B 002, we must enter the correct details of branch B 002 so that the branch details are consistent with values for branch B 002 in other records of the Staff. Branch table. 2. To insert details of a new branch that currently has no members of staff into the Staff. Branch table, it’s necessary to enter nulls into the staff-related columns, such as staff. No. However, as staff. No is the primary key for the Staff. Branch table, attempting to enter nulls for staff. No violates entity integrity, and is not allowed.
Deletion anomalies • If we delete a record from the Staff. Branch table that represents the last member of staff located at a branch, the details about that branch are also lost from the database. • For example, if we delete the record for staff Art Peters (S 0415) from the Staff. Branch table, the details relating to branch B 003 are lost from the database.
Modification anomalies • If we want to change the value of one of the columns of a particular branch in the Staff. Branch table, for example the telephone number for branch B 001, we must update the records of all staff located at that branch. • If this modification is not carried out on all the appropriate records of the Staff. Branch table, the database will become inconsistent. • In this example, branch B 001 would have different telephone numbers in different staff records.
Introduction to the Data Normalization Technique • The input required by the data normalization process : 1. A List of all attributes that must be incorporate into the database all of the attribute in all the entities involved in the business environment under discussion plus all of the intersection data attributes in all of the many to many relationship between these entities. 2. A list of all the defining associations between the attributes functional dependencies.
Functional Dependencies • A means of expressing that the value of one particular attribute is associated with a single, specific value of another attribute. • If one of these attributes has a particular value, then the other attribute must have some other value.
Example of Functional Dependencies • For a particular Salesperson number, 137, there is exactly one Salesperson Name, Baker, associated with it. • Why is this true? a Salesperson Number uniquely identifies a salesperson, and a person can have only one name true for every person! • These defining associations are written with a right-pointing arrow: Salesperson Number Salesperson Name determinant functionally dependent
First normal form (1 NF) • A table in which the intersection of every column and record contains only one value. • Only first normal form (1 NF) is critical in creating appropriate tables for relational databases. • All the subsequent normal forms are optional. • However, to avoid the update anomalies, it’s normally recommended that you proceed to third normal form (3 NF).
Converting to 1 NF • To convert this version of the Branch table to 1 NF: – create a separate table called Branch. Telephone to hold the telephone numbers of branches, by removing the tel. Nos column from the Branch table along with a copy of the primary key of the Branch table (branch. No). – The primary key for the new Branch. Telephone table is the new tel. No column. • The Branch and Branch. Telephone tables are in 1 NF as there is a single value at the intersection of every column with every record for each table.
partial dependency • Full functional dependency indicates that if A and B are columns of a table, B is fully functionally dependent on A, if B is not dependent on any subset of A. • If B is dependent on a subset of A, this is referred to as a partial dependency. • If a partial dependency exists on the primary key, the table is not in 2 NF. The partial dependency must be removed for a table to achieve 2 NF.
Second normal form (2 NF) • Definition: A table that is in first normal form and every nonprimary-key column is fully functionally dependent on the primary key. • A table that is already in 1 NF • The values in each non-primary-key column can be worked out from the values in all the columns that make up the primary key.
Second normal form (2 NF) • Second normal form applies only to tables with composite primary keys, that is tables with a primary key composed of two or more columns. • A 1 NF table with a single column primary key is automatically in at least 2 NF. • A table that is not in 2 NF may suffer from the update anomalies.
Temp. Staff. Allocation table is not in 2 NF.
Converting to 2 NF (1) • Remove the non-primary-key columns that can be worked out using only part of the primary key. Remove the columns that can be worked out from either the staff. No or the branch. No column but do not require both. • Remove the branch. Address, name, and position columns and place them in new tables. Create two new tables called Branch and Temp. Staff. – The Branch table will hold the columns describing the details of branches – The Temp. Staff table will hold the columns describing the details of temporary staff.
Converting to 2 NF (2) 1. The Branch table is created by removing the branch. Address column from the Temp. Staff. Allocation table along with a copy of the part of the primary key that the column is related to, which in this case is the branch. No column. 2. In a similar way, the Temp. Staff table is created by removing the name and position columns from the Temp. Staff. Allocation table along with a copy of the part of the primary key that the columns are related to, which in this case is the staff. No column.
Converting to 2 NF (3) • It’s not necessary to remove the hours. Per. Week column as the presence of this column in the Temp. Staff. Allocation table does not break the rules of 2 NF. • To ensure that we maintain the relationship between a temporary member of staff and the branches at which he or she works for a set number of hours leave a copy of the staff. No and branch. No columns to act as foreign keys in the Temp. Staff. Allocation table. • The primary key for the new Branch table is branch. No and the primary key for the new Temp. Staff table is staff. No. • The Temp. Staff and Branch tables must be in 2 NF because the primary key for each table is a single column. • The altered Temp. Staff. Allocation table is also in 2 NF because the non-primary-key column hours. Per. Week is related to both the staff. No and branch. No columns
Third normal form (3 NF) • A table that is already in 1 NF and 2 NF, and in which the values in all non-primarykey columns can be worked out from only the primary key column(s) and no other columns.
Transitively dependent • The formal definition for third normal form (3 NF) is a table that is in first and second normal forms and in which no non-primary-key column is transitively dependent on the primary key. • Transitive dependency is a type of functional dependency that occurs when a particular type of relationship holds between columns of a table. • For example, consider a table with columns A, B, and C. If B is functionally dependent on A (A → B) and C is functionally dependent on B (B → C), then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). • If a transitive dependency exists on the primary key, the table is not in 3 NF. • The transitive dependency must be removed for a table to achieve 3 NF.
Converting to 3 NF (1) • Remove the non-primary-key columns that can be worked out using another non-primary- key column. • remove the columns that describe the branch at which the member of staff works. • Remove the branch. Address and tel. No columns and take a copy of the branch. No column. • Create a new table called Branch to hold these columns and nominate branch. No as the primary key for this table. • The branch. Address and tel. No columns are candidate keys in the Branch table as these columns can be used to uniquely identify a given branch. • The relationship between a member of staff and the branch at which he or she works is maintained as the copy of the branch. No column in the Staff. Branch table acts as a foreign key.
Converting to 3 NF (1) • The new Branch table is in 3 NF as all of the non-primary-key columns can be worked out from the primary key, branch. No. • Although the other two non-primary-key columns in this table, branch. Address and tel. No, can also be used to work out the details of a given branch, this does not violate 3 NF because these columns are candidate keys for the Branch table. • This example illustrates that the definition for 3 NF can be generalized to include all candidate keys of a table, if any exist. • Therefore, for tables with more than one candidate key can use the generalized definition for 3 NF, which is a table that is in 1 NF and 2 NF, and in which the values in all the non-primary-key columns can be worked out from only candidate key column(s) and no other columns. • Furthermore, this generalization is also true for the definition of 2 NF, which is a table that is in 1 NF and in which the values in each nonprimary-key column can be worked out from
summary • Normalization is a technique for producing a set of tables with desirable properties that supports the requirements of a user or company. • Tables that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies. • The definition for first normal form (1 NF) is a table in which the intersection of every column and record contains only one value. • The definition for second normal form (2 NF) is a table that is already in 1 NF and in which the values in each non-primary-key column can be worked out from the values in all the column(s) that make up the primary key. • The definition for third normal form (3 NF) is a table that is already in 1 NF and 2 NF, and in which the values in all non-primary-key columns can be worked out from only the primary key column(s) and no other columns.
- Slides: 34