Functional Dependencies and Normalization Normalization n n Normalization






























- Slides: 30
Functional Dependencies and Normalization
Normalization n n Normalization is a formalized procedure to eliminating redundancy from data by the progressive use of ‘non-lose decomposition’, which involves splitting records without losing information. In reducing the data model to the state where each bit of information is only held in one place, the update process is much simpler, more efficient and inconsistencies in the database are impossible.
Normalization (cont. ) 5 NF 4 NF 3 NF 2 NF 1 NF Redundancy
Normalization (cont. ) n n Normalization is based on the idea that an attribute may depend on another attribute in some way. There are 2 different kinds of dependencies involved up to 5 NF n n Functional dependency Multivalued dependence
Functional Dependence S#, P# S# CITY P# QTY S 1 Khon Kaen P 1 100 S 1 Khon Kaen P 2 100 S 2 Saraburii P 1 200 S 2 Saraburii P 2 200 S 3 Saraburii P 2 300 S 4 Bangkok P 2 400 S 4 Bangkok P 4 400 S 4 Bangkok P 5 400 QTY is functionally dependent on S#and. P# S# and P# are the determinant of QTY
Functional Dependence n n In a relation including attribute A and B, B is functional dependent on A if, for every valid occurrence, the value A determines the value B An occurrence can not be used to show that a dependency is true, only that it is false n n n A and B can be composite If B is ‘Functional Dependent on’ A, then A ‘is the determinant of B’ All fields are functionally dependent on the primary key – or indeed any candidate key – be definition.
First Normal Form n A relation is in First Normal form if, and only if, it contains no multivalue or no repeating groups.
First NF (cont. ) NO Name Province Pay. Date 1 Amount 1 Pay. Date 2 E 001 Somchai Khon Kaen 15/04/2004 5, 000. 00 30/04/2004 5, 000. 00 E 002 Sompong Sarakham 15/04/2004 4, 500. 00 30/04/2004 4, 500. 00 E 003 Somchay Ubon 15/04/2004 5, 200. 00 30/04/2004 5, 200. 00 Repeat Amount 2
Problem Multi-value Staff ENO Name Dno Dept. Name Proj. No E 001 Somchai D 01 Physic P 01, P 02 NMR, Laser E 002 Sompong D 01 Physic E 003 Somchay D 02 Computer Science P 04, P 05 Voice ordering, Speech Coding E 004 Som. Siri D 02 Computer Science P 04, P 06 Voice ordering, Speech Synthesis Problem 1. Difficult to manipulate data 2. Redundancy • UPDATE ANOMALIES P 03 Proj. Name Medical Image processing
Insert P 06 Speech Corpus Staff ENO Name Dno Dept. Name Proj. No E 001 Somchai D 01 Physic P 01, P 02 NMR, Laser E 002 Sompong D 01 Physic E 003 Somchay D 02 Computer Science P 04, P 05 Voice ordering, Speech Coding E 004 Som. Siri D 02 Computer Science P 04, P 06 Voice ordering, Speech Synthesis P 03 Proj. Name Medical Image processing We can not insert new project if the project has not assigned to any employee yet.
UPDATE ANOMALIES Staff ENO Name Dno Dept. Name Proj. No E 001 Somchai D 01 Physic P 01, P 02 NMR, Laser E 002 Sompong D 01 Physic E 003 Somchay D 02 Computer Science P 04, P 05 Voice ordering, Speech Coding E 004 Som. Siri D 02 Computer Science P 04, P 06 Voice ordering, Speech Synthesis P 03 Proj. Name Medical Image processing Change Proj. Name from Voice Ordering to Speech Ordering need to change all in Database
DELETE Problem Staff ENO Name Dno Dept. Name Proj. No E 001 Somchai D 01 Physic P 01, P 02 NMR, Laser E 002 Sompong D 01 Physic E 003 Somchay D 02 Computer Science P 04, P 05 Voice ordering, Speech Coding E 004 Som. Siri D 02 Computer Science P 04, P 06 Voice ordering, Speech Synthesis P 03 Delete Employee E 003 Somchay Project P 03 Medical Image Processing was deleted also Proj. Name Medical Image processing
Solution n n Remove the repeating group In case of multi-valued n n n Create new relation Columns = Key + multi-valued Take its determinant with it
Repeating group Employee ENO Name Province Pay. Date 1 E 001 Somchai Khon Kaen 15/04/2004 5, 000. 00 30/04/2004 5, 000. 00 E 002 Sompong Sarakham 15/04/2004 4, 500. 00 30/04/2004 4, 500. 00 E 003 Somchay Ubon 15/04/2004 5, 200. 00 30/04/2004 5, 200. 00 Amount 2 Pay. Check Employee ENO Amount 1 Pay. Date 2 Name Province ENO Pay. Date Amount E 001 15/04/2004 5, 000. 00 E 001 Somchai Khon Kaen E 001 30/04/2004 5, 000. 00 E 002 Sompong Sarakham E 002 15/04/2004 4, 500. 00 E 003 Somchay Ubon E 002 30/04/2004 4, 500. 00 E 003 15/04/2004 5, 200. 00 E 003 30/04/2004 5, 200. 00
Multi-Valued Staff ENO Name E 001 Somchai E 002 Dno Dept. Name Proj. No Proj. Name D 01 Physic P 01, P 02 NMR, Laser Sompong D 01 Physic P 03 E 003 Somchay D 02 Computer Science P 04, P 05 Voice ordering, Speech Coding E 004 Som. Siri D 02 Computer Science P 04, P 06 Voice ordering, Speech Synthesis Medical Image processing
Multi-Valued ENO Name Dno Dept. Name Proj. No Proj. Name E 001 Somchai D 01 Physic P 01 NMR E 001 Somchai D 01 Physic P 02 Laser E 002 Sompong D 01 Physic P 03 Medical Image processing E 003 Somchay D 02 Computer Science P 05 Voice ordering E 003 Somchay D 02 Computer Science P 04 Speech Coding E 004 Som. Siri D 02 Computer Science P 04 Voice ordering E 004 Som. Siri D 02 Computer Science P 06 Speech Synthesis Insert Project still has problem
Second Normal Form (2 NF) n A relation is in first normal form if and only if n n It is in 1 NF Every non-key attribute is dependent on all parts of the primary key.
Staff 2 NF ? ENO Name Dno E 001 Somchai D 01 Physic P 01 NMR E 001 Somchai D 01 Physic P 02 Laser E 002 Sompong D 01 Physic P 03 Medical Image processing E 003 Somchay D 02 Computer Science P 05 Voice ordering E 003 Somchay D 02 Computer Science P 04 Speech Coding E 004 Som. Siri D 02 Computer Science P 04 Voice ordering E 004 Som. Siri D 02 Computer Science P 06 Speech Synthesis KEY = ENO + Proj. No Dept. Name Proj. No Proj. Name Answer is No. Because Proj. No is dependent on Proj. No. (not all part of Key)
Problem ENO Name Dno Dept. Name Proj. No Proj. Name E 001 Somchai D 01 Physic P 01 NMR E 001 Somchai D 01 Physic P 02 Laser E 002 Sompong D 01 Physic P 03 Medical Image processing E 003 Somchay D 02 Computer Science P 05 Voice ordering E 003 Somchay D 02 Computer Science P 04 Speech Coding E 004 Som. Siri D 02 Computer Science P 04 Voice ordering E 004 Som. Siri D 02 Computer Science P 06 Speech Synthesis We can not insert Project if have not yet assigned project to any employee
Solution n n Remove the attribute involved Take its determinant with it
Normalize ENO Name Dno Dept. Name Proj. No Proj. Name E 001 Somchai D 01 Physic P 01 NMR E 001 Somchai D 01 Physic P 02 Laser E 002 Sompong D 01 Physic P 03 Medical Image processing E 003 Somchay D 02 Computer Science P 05 Voice ordering E 003 Somchay D 02 Computer Science P 04 Speech Coding E 004 Som. Siri D 02 Computer Science P 04 Voice ordering E 004 Som. Siri D 02 Computer Science P 06 Speech Synthesis
Result Project PERSON ENO Name Dno Dept. Na me E 001 Somchai D 01 Physic E 003 Somchay D 02 Computer Science E 004 Som. Siri D 02 Computer Science Proj. Name No PERSON_Proj ENO Proj No P 01 NMR E 001 P 02 Laser E 001 P 02 P 03 Medical Image processing E 002 P 03 E 003 P 04 E 004 P 05 E 004 P 06 P 04 Speech Coding P 05 Voice ordering P 06 Speech Synthesis PERSON(ENO, NAME, Dno, Dept. Name) PROJECT(Proj. No, Proj. Name) PERSON_PROJ(ENO, Proj. No)
Third Normal Form n A relation is in 3 NF if, and only if: n n n It is in 2 NF Every non-key attribute is functionally dependent upon the key. (No non-key attribute is functional dependent on another non-key attribute) Or non-key attribute no transitive dependent on key
Transitive dependent n n R(A, B, C, D) ; A is Key, others are non- key If A → B and B → C can say A → B → C (C transitive dependent on A)
3 NF? Project PERSON ENO Name Dno Dept. Na me E 001 Somchai D 01 Physic E 003 Somchay D 02 Computer Science E 004 Som. Siri D 02 Computer Science Answer is No Because Dept. Name is dependent on Dno (has transitive dependent on key) Proj. Name No PERSON_Proj ENO Proj No P 01 NMR E 001 P 02 Laser E 001 P 02 P 03 Medical Image processing E 002 P 03 E 003 P 04 E 004 P 05 E 004 P 06 P 04 Speech Coding P 05 Voice ordering P 06 Speech Synthesis
Solution n n Remove the offending attributes Take the determinant along
Result PERSON ENO Name E 001 Somchai D 01 E 003 Somchay D 02 E 004 Som. Siri D 02 Department Dno Project Proj. Name No PERSON_Proj ENO Proj No P 01 NMR E 001 P 02 Laser E 001 P 02 P 03 Medical Image processing E 002 P 03 E 003 P 04 E 004 P 05 E 004 P 06 Dept. Na me P 04 Speech Coding D 01 Physic P 05 Voice ordering D 02 Computer Science P 06 Speech Synthesis D 02 Computer Science
Note n n n The third normal form is often reached in practice by inspection, in a single step. Its meaning seems intuitively clear; it represents a formalization of designer’s common sense. This level of normalization is widely accepted as the initial target for a design which eliminates redundancy. However, there are higher normal forms which, although less frequently invoked, highlight further redundancy problems which may affect the designer
Boyce-Codd Normal Form (BCNF) n n A relation is in BCNF if, and only if, every determinant is a candidate key. BCNF is a refinement to third normal form, and tightens its duration.
Multivalued Dependence n In a relation including attribute A, B and C, B is multivalued dependent on A if the set of B values matching a given A+C value pair, depends only on the A value.