Normalization A logical design method which minimizes data

  • Slides: 23
Download presentation
Normalization • A logical design method which minimizes data redundancy and reduces design flaws.

Normalization • A logical design method which minimizes data redundancy and reduces design flaws. • Consists of applying various “normal” forms to the database design. • The normal forms break down large tables into smaller subsets.

Anomalies Relations that have redundant data may have problems called update anomalies, which are

Anomalies Relations that have redundant data may have problems called update anomalies, which are classified as: Insertion anomalies Deletion anomalies Modification anomalies

Example of Anomalies To insert a new staff with branch. No B 007 into

Example of Anomalies To insert a new staff with branch. No B 007 into the Staff. Branch relation; To delete a tuple that represents the last member of staff located at a branch B 007; To change the address of branch B 003. Staff. Branch staff. No s. Name position salary branch. No b. Address SL 21 John White Manager 30000 B 005 22 Deer Rd, London SG 37 Ann Beech Assistant 12000 B 003 163 Main St, Glasgow SG 14 David Ford Supervisor 18000 B 003 163 Main St, Glasgow SA 9 Mary Howe Assistant 9000 B 007 16 Argyll St, Aberdeen SG 5 Susan Brand Manager 24000 B 003 163 Main St, Glasgow SL 41 Julie Lee Assistant 9000 B 005 22 Deer Rd, London

Example of Anomalies (Contd. . . ) Staff staff. No s. Name position salary

Example of Anomalies (Contd. . . ) Staff staff. No s. Name position salary brance. No SL 21 John White Manager 30000 B 005 SG 37 Ann Beech Assistant 12000 B 003 SG 14 David Ford Supervisor 18000 B 003 SA 9 Mary Howe Assistant 9000 B 007 SG 5 Susan Brand Manager 24000 B 003 SL 41 Julie Lee Assistant 9000 B 005 Branch brance. No b. Address B 005 22 Deer Rd, London B 007 16 Argyll St, Aberdeen B 003 163 Main St, Glasgow

Relationship of Normal Forms

Relationship of Normal Forms

Stages of Normalisation Remove repeating groups First normal form (1 NF) Remove partial dependencies

Stages of Normalisation Remove repeating groups First normal form (1 NF) Remove partial dependencies Second normal form (2 NF) Remove transitive dependencies Third normal form (3 NF) Boyce-Codd normal form (BCNF) Remove remaining functional dependency anomalies Remove multivalued dependencies Fourth normal form (4 NF) Remove remaining anomalies Fifth normal form (5 NF)

Normalization No transitive dependency between nonkey attributes All determinants are candidate keys - Single

Normalization No transitive dependency between nonkey attributes All determinants are candidate keys - Single multivalued dependency Boyce. Codd and Higher Functional dependency of nonkey attributes on the primary key - Atomic values only Full Functional dependency of nonkey attributes on the primary key

First Normal Form (1 NF) Each attribute must be atomic: • First Normal Form

First Normal Form (1 NF) Each attribute must be atomic: • First Normal Form is a relation in which the intersection of each row and column contains one and only one value. • No repeating columns within a row. • No multi-valued columns.

1 NF Employee (unnormalized) Employee (1 NF)

1 NF Employee (unnormalized) Employee (1 NF)

Second Normal Form (2 NF) Each attribute must be functionally dependent on the primary

Second Normal Form (2 NF) Each attribute must be functionally dependent on the primary key. • Functional dependence - the property of one or more attributes that uniquely determines the value of other attributes. • Any non-dependent attributes are moved into a smaller (subset) table. 2 NF improves data integrity. • Prevents update, insert, and delete anomalies.

Functional Dependencies Functional dependency describes the relationship between attributes in a relation. For example,

Functional Dependencies Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, and B is functionally dependent on A ( denoted A B), if each value of A is associated with exactly one value of B. ( A and B may each consist of one or more attributes. ) A Determinant B is functionally dependent on A B Refers to the attribute or group of attributes on the left -hand side of the arrow of a functional dependency

Functional dependency Full functional dependency indicates that if A and B are attributes of

Functional dependency Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. A functional dependency A B is partially dependent if there is some attributes that can be removed from A and the dependency still holds.

Functional Dependencies (Contd. . . ) Trival functional dependency means that the right-hand side

Functional Dependencies (Contd. . . ) Trival functional dependency means that the right-hand side is a subset ( not necessarily a proper subset) of the lefthand side. For example: staff. No, s. Name staff. No They do not provide any additional information about possible integrity constraints on the values held by these attributes. We are normally more interested in nontrivial dependencies because they represent integrity constraints for the relation.

Identifying the primary key Functional dependency is a property of the meaning or semantics

Identifying the primary key Functional dependency is a property of the meaning or semantics of the attributes in a relation. When a functional dependency is present, the dependency is specified as a constraint between the attributes. An important integrity constraint to consider first is the identification of candidate keys, one of which is selected to be the primary key for the relation using functional dependency.

Identifying the primary key (Contd. . . ) • Examples • • STORE(SNAME, ADDR,

Identifying the primary key (Contd. . . ) • Examples • • STORE(SNAME, ADDR, ZIP, ITEM, PRICE) FDs: SNAME ADDR ZIP SNAME, ITEM PRICE Finding a key: • • • SNAME does not appear in RHS, so SNAME must be a part of the key. Since SNAME ADDR ZIP, we know SNAME ADDR, ZIP But SNAME alone cannot determine any more. • • • How can we determine ITEM and PRICE? If we have ITEM, then we can determine PRICE So, SNAME, ITEM SNAME, ADDR, ZIP, ITEM, PRICE. So it satisfies the definition of the key

Inference Rules Armstrong’s axioms 1. 2. 3. 4. 5. 6. 7. Relfexivity: If B

Inference Rules Armstrong’s axioms 1. 2. 3. 4. 5. 6. 7. Relfexivity: If B is a subset of A, them A B Augmentation: If A B, then A, C B Transitivity: If A B and B C, then A C Self-determination: A A Decomposition: If A B, C then A B and A C Union: If A B and A C, then A B, C Composition: If A B and C D, then A, C B, D

Functional Dependence Employee (1 NF) Name, dept_no, and dept_name are functionally dependent on emp_no.

Functional Dependence Employee (1 NF) Name, dept_no, and dept_name are functionally dependent on emp_no. (emp_no -> name, dept_no, dept_name) Skills is not functionally dependent on emp_no since it is not unique to each emp_no.

2 NF Employee (1 NF) Employee (2 NF) Skills (2 NF)

2 NF Employee (1 NF) Employee (2 NF) Skills (2 NF)

Data Integrity Employee (1 NF) • Insert Anomaly - adding null values. eg, inserting

Data Integrity Employee (1 NF) • Insert Anomaly - adding null values. eg, inserting a new department does not require the primary key of emp_no to be added. • Update Anomaly - multiple updates for a single name change, causes performance degradation. eg, changing IT dept_name to IS • Delete Anomaly - deleting wanted information. eg, deleting the IT department removes employee Barbara Jones from the database

Third Normal Form (3 NF) Remove transitive dependencies. • Transitive dependence - two separate

Third Normal Form (3 NF) Remove transitive dependencies. • Transitive dependence - two separate entities exist within one table. • Any transitive dependencies are moved into a smaller (subset) table. 3 NF further improves data integrity. • Prevents update, insert, and delete anomalies.

Transitive Dependence Employee (2 NF) dept_name is functionally dependent on dept_no is functionally dependent

Transitive Dependence Employee (2 NF) dept_name is functionally dependent on dept_no is functionally dependent on emp_no, so via the middle step of dept_no, dept_name is functionally dependent on emp_no. (emp_no -> dept_no , dept_no -> dept_name, thus emp_no -> dept_name)

3 NF Employee (2 NF) Employee (3 NF) Department (3 NF)

3 NF Employee (2 NF) Employee (3 NF) Department (3 NF)

Other Normal Forms Boyce-Codd Normal Form (BCNF) • Strengthens 3 NF by requiring the

Other Normal Forms Boyce-Codd Normal Form (BCNF) • Strengthens 3 NF by requiring the keys in the functional dependencies to be superkeys (a column or columns that uniquely identify a row) Fourth Normal Form (4 NF) • Eliminate trivial multivalued dependencies. Fifth Normal Form (5 NF) • Eliminate dependencies not determined by keys.