NORMALIZATION Normalization l l Normalization is process of

  • Slides: 55
Download presentation
NORMALIZATION

NORMALIZATION

Normalization l l Normalization is process of removing all redundancy form database. It is

Normalization l l Normalization is process of removing all redundancy form database. It is the application of a set of simple rules called first, second & third normal form. When database design is fully normalized , there is no repletion of data across tables The advantages of data normalization can be very large in terms of storage space as well as increased efficiency with which data can be updated & maintained.

l l The rules of normalization or normal forms define exactly what kind of

l l The rules of normalization or normal forms define exactly what kind of information can be placed in each table & and how this information relates to the fields. When considering normalization some user related consideration include: • • What data should be stored in database? How will the user access the database? What privileges does the user require? How should data be grouped in database? What data most commonly accessed? How is all data related in database? What steps should be taken to ensure accurate data?

ADVANTAGES OF NORMLIZATION l l l Greater overall database organization will be gained. The

ADVANTAGES OF NORMLIZATION l l l Greater overall database organization will be gained. The amount of unnecessary redundant data reduced. Data integrity is easily maintained within the database. The database & application design processes are much for flexible. Security is easier to maintain or manage.

DISADVANTAGES OF NORMLIZATION l l The disadvantage of normalization is that it produces a

DISADVANTAGES OF NORMLIZATION l l The disadvantage of normalization is that it produces a lot of tables with a relatively small number of columns. These column then have to be joined using their primary/foreign key relation ships This has two disadvantages. • Performance: all the joins required to merge data slow processing & place additional stress on your hardware. • Complex queries: developers have to code complex queries in order to merge data from different tables.

FIRST NORMAL FORM l l The objective of first normal form is that the

FIRST NORMAL FORM l l The objective of first normal form is that the table should contain no repeating groups of data. Data is divided into logical units called entities or tables • Note: when each entity has been designed, a primary key is assigned to it. l l All attributes (column) in the entity (table) must be single valued. Repeating or multi valued attributes are moved into a separate entity (table) & a relationship is established between the two tables or entities.

l Example of the first normal form • Consider the instruction_schedule table with the

l Example of the first normal form • Consider the instruction_schedule table with the following attributes (columns) Ø instruction_schedule Ø #instructor_id Ø Fname Ø mid name Ø lname Ø year Ø semester

Ø department 1 Ø course 1 Ø section_id 1 Ø department 2 Ø course

Ø department 1 Ø course 1 Ø section_id 1 Ø department 2 Ø course 2 Ø section_id 2 Ø department 3 Ø course 3 Ø section_id 3

l l Because the instructor teaches multiple classes each semester, the class attributes and

l l Because the instructor teaches multiple classes each semester, the class attributes and its corresponding department and section attributes can take more than one value or in other words, there are multi valued attributes. The class attributes can take values class 1, class 2, class 3 & so on. This is a violation of first normal form To bring the instructor_schedule table in first normal form the department, course & section attributes are placed in a separate entity called courses_taught.

SECOND NORMAL FORM l l l The objective of second normal form is that

SECOND NORMAL FORM l l l The objective of second normal form is that every field in a table should relate to the primary key field in its entity. It means that data that is only partly dependent on the primary key is stored into another table To bring table into a second normal form, it should be first in first normal form.

l l l Now if add department _address to the course_taught table, it would

l l l Now if add department _address to the course_taught table, it would be dependent only on the department attribute not on the whole composite primary key. This is a violation in second normal form. To get the table courses_taught in second normal form, the attributes are moved into another table department

THIRD NORMAL FORM • The objective of third normal form is to remove field

THIRD NORMAL FORM • The objective of third normal form is to remove field /data in a table that is not dependent on the primary key • It means that, any non-key field in a table must relate to the primary key of the table& not to any other field. • The entity is in second normal form and non-key attributes cannot depend on another non-key attributes

ØThe primary key in this case may be either composite or single. ØAll non-key

ØThe primary key in this case may be either composite or single. ØAll non-key attributes should depend directly on the whole primary key and not on each other. ØExample consider the employee entity/table: employee #emp_id last_name first_name mid_name category

l l l In above table employee entity has an attributes called category Category

l l l In above table employee entity has an attributes called category Category has initially values technical, management, administration, proffessional. Let’s, we add an attribute called category_level. Which have following values: • for beginner • for middle level • for an expert

ØSo our entity becomes l l l l Employee #emp_id last_name first_name mid_name category_level

ØSo our entity becomes l l l l Employee #emp_id last_name first_name mid_name category_level

l l But here, category _level is only dependent on category. And category is

l l But here, category _level is only dependent on category. And category is dependent on emp_id attributes (primary_key). It is means that, the attributes category_level is dependent on the primary key emp_id through the category attributes but not directly. An attribute dependency on primary key which is not directly but only passes through another attributes that is dependent on primary key is called transitive dependency. This is a violation of third normal form

l l To solve this violation, category & category_level from employee entity are moved

l l To solve this violation, category & category_level from employee entity are moved to another entity. So, we have

Boyce-Codd Normal Form (BCNF) l l l To eliminate the problems and redundancy of

Boyce-Codd Normal Form (BCNF) l l l To eliminate the problems and redundancy of 3 NF, R. F Boyce proposed a normal form known as Boyce-Codd normal form (BCNF). Relation R is said to be in BCNF if for every nontrivial FD: X Y between attributes X and Y holds in R. That means: • X is super key of R, • X Y is a trivial FD, that is, YX

l l A relation is in BCNF, if and only if, every determinant is

l l A relation is in BCNF, if and only if, every determinant is a candidate key. The difference between 3 NF and BCNF is that for a functional dependency A B, 3 NF allows this dependency in a relation if B is a primarykey attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.

Client. No interview. Date interview. Time staff. No room. No CR 76 13 -May-02

Client. No interview. Date interview. Time staff. No room. No CR 76 13 -May-02 10. 30 SG 5 G 101 CR 76 13 -May-02 12. 00 SG 5 G 101 CR 74 13 -May-02 12. 00 SG 37 G 102 CR 56 1 -Jul-02 10. 30 SG 5 G 102

l l l l FD 1 client. No, interview. Date interview. Time, staff. No,

l l l l FD 1 client. No, interview. Date interview. Time, staff. No, room. No (Primary Key) FD 2 staff. No, interview. Date, interview. Time client. No (Candidate key) FD 3 room. No, interview. Date, interview. Time client. No, staff. No (Candidate key) FD 4 staff. No, interview. Date room. No (not a candidate key)

l l As a consequece the Client. Interview relation may suffer from update anmalies.

l l As a consequece the Client. Interview relation may suffer from update anmalies. For example, two tuples have to be updated if the room. No need be changed for staff. No SG 5 on the 13 -May-02.

MULTI-VALUED DEPENDENCIES (MVD) AND FOURTH NORMAL FORM (4 NF) l l l A multi-valued

MULTI-VALUED DEPENDENCIES (MVD) AND FOURTH NORMAL FORM (4 NF) l l l A multi-valued dependency (MVD) is a functional dependency where the dependency may be to a set and not just a single value. It is defined as X Y in relation R (X, Y, Z), if each X value is associated with a set of Y values in a way that does not depend on the Z values. Here X and Y are both subsets of R.

l The notation X Y is used to indicate that a set of attributes

l The notation X Y is used to indicate that a set of attributes of Y shows a multi-valued dependency (MVD) on a set of attributes of X l l Example of a database of teaching courses, the books recommended for the course, and the lecturers who will be teaching the course:

Ø Because the lecturers attached to the course and the books attached to the

Ø Because the lecturers attached to the course and the books attached to the course are independent of each other, this database design has a multivalued dependency; if we were to add a new book to the DBMS course, we would have to add one record for each of the lecturers on that course, and vice versa. Ø Put formally, there are two multivalued dependencies in this relation: {course} {book} and equivalently {course} {lecturer}.

l l Databases with multivalued dependencies thus exhibit redundancy. In database normalization, fourth normal

l l Databases with multivalued dependencies thus exhibit redundancy. In database normalization, fourth normal form requires that for every nontrivial multivalued dependency X Y, X is a superkey.

Properties of MVDs

Properties of MVDs

l l These four axioms can be used to derive the closure of a

l l These four axioms can be used to derive the closure of a set D+, of D of multi-valued dependencies. It can be noticed that there are similarities between the Armstrong's axioms for FDs and Berri's axioms for MVDs. Both have reflexivity, augmentation, and transitivity rules. But, the MVD set also has a complementation rule.

Fourth Normal Form (4 NF) l l l A relation R is said to

Fourth Normal Form (4 NF) l l l A relation R is said to be in fourth normal form (4 NF) if it is in BCNF and for every non-trivial MVD(X Y) in F+, X is a super key for R. The fourth normal form (4 NF) is concerned with dependencies between the elements of compound keys composed of three or more attributes. The 4 NF eliminates the problems of 3 NF. 4 NF is violated when a relation has undesirable MVDs and hence can be used to identify and decompose such relations

l l Relation EMPLOYEE, as shown in Fig. A tuple in this relation represents

l l Relation EMPLOYEE, as shown in Fig. A tuple in this relation represents the fact that an employee (EMP-NAME) works on the project (PROJ-NAME) and has a dependent (DEPENDENT-NAME). This relation is not in 4 NF because in the nontrivial MVDs EMP-NAME PROJ-NAME And EMP-NAME DEPENDENT-NAME, EMP-NAME is not a super key of EMPLOYEE.

l l Now the relation EMPLOYEE is decomposed into EMP_PROJ and EMP_DEPENDENTS. Thus, both

l l Now the relation EMPLOYEE is decomposed into EMP_PROJ and EMP_DEPENDENTS. Thus, both EMP_PROJ and EMP_DEPENDENT are in 4 NF, because the MVDs EMP-NAME PROJ-NAME in EMP_PROJ

l l And EMP-NAME DEPENDENT-NAME in EMP_DEPENDENTS are trivial MVDs. No other non-trivial MVDs

l l And EMP-NAME DEPENDENT-NAME in EMP_DEPENDENTS are trivial MVDs. No other non-trivial MVDs hold in either EMP_PROJ or EMP_DEPENDENTS. No FDs hold in these relation schemas either.

Problems with MVDs and 4 NF l l l FDs, MVDs and 4 NF

Problems with MVDs and 4 NF l l l FDs, MVDs and 4 NF are not sufficient to identify all data redundancies. Let us consider a relation PERSONS_ON_JOB_SKILLS, as shown in Table. This relation stores information about people applying all their skills to the jobs to which they are assigned. But, they use particular or all skills only when the job needs that skill

l The relation PERSONS_ON_JOB_SKILLS of Table is in BCNF and 4 NF. l For

l The relation PERSONS_ON_JOB_SKILLS of Table is in BCNF and 4 NF. l For example, person "Thomas" who possesses skills "Analyst" and "DBA" applies them to job J 2, as J-2 needs both these skills.

l l The same person "Thomas" applies skill "Analyst" only to job J-1, as

l l The same person "Thomas" applies skill "Analyst" only to job J-1, as job J-1 needs only skill "Analyst" and not skill "DBA". Thus, if we delete <Thomas, DBA, J-2>, we must also delete <Thomas, Analyst, J-2>, because persons must apply all their skills to a job if that requires those skills.

JOIN DEPENDENCIES AND FIFTH NORMAL FORM (5 NF) Ø The anomalies of MVDs and

JOIN DEPENDENCIES AND FIFTH NORMAL FORM (5 NF) Ø The anomalies of MVDs and are eliminated by join dependency (JD) and 5 NF.

Join Dependencies (JD) Ø A join dependency (JD) can be said to exist if

Join Dependencies (JD) Ø A join dependency (JD) can be said to exist if the join of R 1 and R 2 over C is equal to relation R. Ø Ø Where, R 1 and R 2 are the decompositions R 1(A, B, C), and R 2 (C, D) of a given relations R (A, B, C, D). Ø R 1 and R 2 is a lossless decomposition of R.

Ø In other words, *(A, B, C, D), (C, D) will be a join

Ø In other words, *(A, B, C, D), (C, D) will be a join dependency of R if the join of the join's attributes is equal to relation R. Ø *( R 1, R 2, R 3. . ) indicates that relations R 1, R 2, R 3 and soon are a join dependency (JD) of R. Ø Relation R to satisfy a JD *( R 1, R 2, …. , RN) is that Ø R = R 1 U R 2 U …. . U RN.

Ø relation PERSONS_ON_JOB_SKILLS, as shown in above fig. Ø This relation can be decomposed

Ø relation PERSONS_ON_JOB_SKILLS, as shown in above fig. Ø This relation can be decomposed into three relations namely, HAS_SKILL, NEEDS_SKILL cind ASSIGNED_TO_JOBS. Ø Fig. the join dependencies of decomposed relations. Ø if we join decomposed relations HAS_SKILL and NEEDS_SKILL, a relation CAN_USE_JOB_SKILL is obtained, as shown in above Fig.

Ø This relation stores the data about persons who have skills applicable to a

Ø This relation stores the data about persons who have skills applicable to a particular job. But, each person who has a skill required for a particular job need not be assigned to that job. Ø Thus, redundant tuples (rows) that show unnecessary SKILL-TYPE and JOB combinations are removed by joining with relation NEEDS_SKILL.

Fifth Normal Form (5 NF) Ø A relation is said to be in fifth

Fifth Normal Form (5 NF) Ø A relation is said to be in fifth normal form (5 NF) if every join dependency is a consequence of its relation (candidate) keys. Ø For every non-trivial join dependency *(R 1 R 2 R 3) each decomposed relation Ri is a super key of the main relation R. Ø 5 NF is also called project-join normal form (PJNM).

Ø There are some relations, who cannot be decomposed into two or higher normal

Ø There are some relations, who cannot be decomposed into two or higher normal form relations by means of projections as discussed in 1 NF, 2 NF, 3 NF and BCNF. Ø Relations are decomposed into three or more relations, which can be reconstructed by means of a three-way or more join operation. This is called fifth normal form (5 NF).

Ø Any relation that is in 5 NF, is also in other normal forms

Ø Any relation that is in 5 NF, is also in other normal forms namely 2 NF, 3 NF and 4 NF. Ø 5 NF is mainly USED FROM theoretical point of view and not for practical database design

Ø Example 1 Ø LET US CONSIDER THE RELATION PER SONS_ON_JOB_S KILLS OF FIG.

Ø Example 1 Ø LET US CONSIDER THE RELATION PER SONS_ON_JOB_S KILLS OF FIG. Ø THE THREE RELATIONS ARE l l l HAS_SKILL (PERSON, SKILL-TYPE) NEEDS_SKILL (SKILL-TYPE, JOB) JOB_ASSIGNED (PERSON, JOB))

Ø NOW BY APPLYING THE DEFINITION OF 5 NF, THE JOIN DEPENDENCY IS GIVEN

Ø NOW BY APPLYING THE DEFINITION OF 5 NF, THE JOIN DEPENDENCY IS GIVEN AS: Ø *((PERSON, SKILL-TYPE), (SKILL-TYPE, JOB), (PERSON, JOB)) Ø THE ABOVE STATEMENT IS TRUE BECAUSE A JOIN RELATION OF THESE THREE RELATIONS IS EQUAL TO THE ORIGINAL RELATION

Ø THE CONSEQUENCE OF THESE JOIN DEPENDENCIES IS THAT THE SKILLTYPE, JOB OR PERSON,

Ø THE CONSEQUENCE OF THESE JOIN DEPENDENCIES IS THAT THE SKILLTYPE, JOB OR PERSON, IS NOT RELATION KEY, AND HENCE THE RELATION IS NOT IN 5 NF. Ø NOW SUPPOSE, THE SECOND TUPLE (ROW 2) IS REMOVED FORM RELATION PERSONS_ON_JOB_SKILLS, A NEW RELATION IS CREATED THAT NO LONGER HAS ANY JOIN DEPENDENCIES.