NORMALIZATION FIRST NORMAL FORM 1 NF A relation

  • Slides: 23
Download presentation
NORMALIZATION FIRST NORMAL FORM (1 NF): A relation R is in 1 NF if

NORMALIZATION FIRST NORMAL FORM (1 NF): A relation R is in 1 NF if all attributes have atomic value = one value for an attribute = no repeating groups = no multivalued attributes = no composite attributes

Example Non-1 NF EMP (E#, ENAME, SKILL). Here SKILL is a multi-valued attribute. EMP(

Example Non-1 NF EMP (E#, ENAME, SKILL). Here SKILL is a multi-valued attribute. EMP( E#, ENAME, SKILL 1, SKILL 2, SKILL 3, SKILL 4, . . ). Skill as a repeating group attribute.

NON-1 NF There are two methods of converting a NON 1 NF into a

NON-1 NF There are two methods of converting a NON 1 NF into a 1 NF relation. Method 1 mapps out the multi-valued (or repeating group) attribute into another table, while method 2 keeps the multi-valued attribute but simply uses a composite PK.

Method 1: Conversion to 1 NF 1. Create one relation for repeating groups by

Method 1: Conversion to 1 NF 1. Create one relation for repeating groups by adding the key of original relation. 2. Remove the attributes of repeating groups from the original relation.

Example SKILL (E#, SKILL) EMP (E#, ENAME) Note the composite PK of SKILL relation.

Example SKILL (E#, SKILL) EMP (E#, ENAME) Note the composite PK of SKILL relation.

Method 2: We can also flatten the table as follows: EMP (E#, Skill, Ename)

Method 2: We can also flatten the table as follows: EMP (E#, Skill, Ename) - Elmasri's book uses this method. - This method repeats the repeating group value in a separate tuple. - Note the composite PK.

What are advantages and disadvantages of the two methods? DEPT (D#, DNAME, DMGRSSN, DLOC),

What are advantages and disadvantages of the two methods? DEPT (D#, DNAME, DMGRSSN, DLOC), where DLOC is a multi-valued attribute EMP_PROJ (SSN, ENAME, PROJS(PNUMBER, HOURS)), where PROJS is a composite attribute. Typically, most relational database systems assume your DB should be 1 NF

SECOND NORMAL FORM (2 NF) A relation R is in 2 NF if (a)

SECOND NORMAL FORM (2 NF) A relation R is in 2 NF if (a) R is in 1 NF, and (b) each attribute is fully functionally dependent on the whole key of R

Example INVENTORY (WH, PART, QTY, WH_ADDR) WH, PART --> QTY WH --> WH_ADDR (This

Example INVENTORY (WH, PART, QTY, WH_ADDR) WH, PART --> QTY WH --> WH_ADDR (This is not in 2 NF, since WH is a part of a key) Key: WH+PART

Problem of non-2 NF (update anomaly): • Warehouse address is repeated for every part

Problem of non-2 NF (update anomaly): • Warehouse address is repeated for every part stored • If the address is changed, needs multiple updates • If no parts in a warehouse, can't keep the warehouse address

2 NF Decomposition: • Create a separate relation for each PD • Remove the

2 NF Decomposition: • Create a separate relation for each PD • Remove the RHS of the PD from the original relation.

 • The above Non-2 NF can be transformed into the following 2 NF

• The above Non-2 NF can be transformed into the following 2 NF relations. INVENTORY(WH, PART, QTY) WAREHOUSE(WH, WH_ADDR)

Example • NOTE that Non-2 NF occurs only when we have a composite key.

Example • NOTE that Non-2 NF occurs only when we have a composite key. • EMP_PROJ (SSN, P#, HOURS, ENAME, PLOC) SSN, P# --> HOURS SSN --> ENAME (* Violate 2 NF; SSN is a part of a key*) P# --> PNAME, PLOC (* Violate 2 NF; P# is a part of a key *)

2 NF decomposition R 1 (SSN, P#, HOURS) R 2 (SSN, ENAME) R 3

2 NF decomposition R 1 (SSN, P#, HOURS) R 2 (SSN, ENAME) R 3 (P#, PNAME, PLOC)

THIRD NORMAL FORM (3 NF) A relation R is in 3 NF if a)

THIRD NORMAL FORM (3 NF) A relation R is in 3 NF if a) it is in 2 NF and b) it has no transitive dependencies. That is, each nonkey attribute must be functionally dependent on the key and nothing else. If you have any FD whose LHS is not a PK (or CK), then R is not in 3 NF.

Example WORK (EMP#, DEPT, LOC) 2 NF (1) EMP# --> DEPT Y (2) DEPT

Example WORK (EMP#, DEPT, LOC) 2 NF (1) EMP# --> DEPT Y (2) DEPT --> LOC Y KEY: EMP# 3 NF Y N WORK is in 2 NF, but not in 3 NF because of FD (2).

Problem of Non-3 NF • Dept. location is repeated for every employee • If

Problem of Non-3 NF • Dept. location is repeated for every employee • If the location is changed, needs multiple updates • If you forget to change all records, can cause inconsistency • If a dept. has no employees, can't keep dept location

3 NF DECOMPOSITION Algorithm for a given minimal cover: 1) Combine the RHS of

3 NF DECOMPOSITION Algorithm for a given minimal cover: 1) Combine the RHS of FDs if they have common LHS 2) Create a separate table for each FD. 3) Check for Lossless decomposition. (Check whether a CK of the original realtion appears in any of the decomposed relation). IF not lossless, then add a table consisting of a CK.

Example • R 1 (EMP#, DEPT), R 2 (DEPT, LOC) • The original relation

Example • R 1 (EMP#, DEPT), R 2 (DEPT, LOC) • The original relation WORK is not in 3 NF, but R 1 and R 2 are in 3 NF. • Note that the LHS of a FD becomes the PK of each decomposed table.

 • Our 3 NF definition we used above is an informal one used

• Our 3 NF definition we used above is an informal one used by many industry designers. Some DB text books, including Elmasri's book use a more rigorous definition that is shown below.

Formal Def. of 3 NF A relation R is in 3 NF if, for

Formal Def. of 3 NF A relation R is in 3 NF if, for all X --> A in R (1) X is a super key or (2) A is a prime attribute (where X and A could be a set of attributes) In other words, all attributes, except prime attributes, must be dependent on any candidate keys.

 • The only difference between the informal definition and the formal definition is

• The only difference between the informal definition and the formal definition is the secondition in the formal definition. That is, the formal definition allows transitive dependency whose RHS is a prime attribute, where a prime attribute is an attribute that belongs to any candidate key. The difference between these two definition is very minor and many real-world DB designers just use the informal definition. For you reference, we showed the formal definition of 3 NF.

SUMMARY OF NORMALIZATION - As we go to higher normal forms, we create a

SUMMARY OF NORMALIZATION - As we go to higher normal forms, we create a more number of relations. - Each higher normal form removes a certain type of dependency that causes redundancy. - As a relation becomes a higher normal form: - We have a more number of relations - That increases more number of joins in query forming - Which increases more number of join processings - And also more referential integrity constraints need to be maintained - And thus schema is complicated and performance is drcreased. So, many real-world DB designers stop at 3 NF, which reasonably removes typical redundnacy and still maintains performance. So, strive to achive 3 NF in your real-world RDB!