CHAPTER 10 Functional Dependencies and Normalization for Relational






























































- Slides: 62

CHAPTER 10 Functional Dependencies and Normalization for Relational Databases

Informal Design Guidelines for Relational Databases q. Semantics of the Relation Attributes q. Reducing the redundant values in tuples q. Reducing the null values in tuples q. Disallowing the possibility of generating spurious tuples

Semantics of the Relation Attributes q. GUIDELINE 1: Each tuple in a relation should represent one entity or relationship instance. Attributes of different entities should not be mixed in the same relation. Only foreign keys should be used to refer to other entities Entity and relationship attributes should be kept apart as much as possible.

Semantics of the Relation Attributes q. Example:

Redundant Information in Tuples and Update Anomalies q. Mixing attributes of multiple entities may cause problems q. Information is stored redundantly wasting storage q. Problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Redundant Information in Tuples and Update Anomalies

Redundant Information in Tuples and Update Anomalies q. Consider the relation: EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours) q. Update Anomaly: Changing the name of project number P 1 from “Billing” to “Customer. Accounting” may cause this update to be made for all 100 employees working on project P 1. q. Insert Anomaly: Cannot insert a project unless an employee is assigned to. q. Inversely - Cannot insert an employee unless an he/she is assigned to a project

Redundant Information in Tuples and Update Anomalies q. Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project

Redundant Information in Tuples and Update Anomalies q. GUIDELINE 2: Guideline to Redundant Information in Tuples and Update Anomalies Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

Null Values in Tuples GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible q. Attributes that are NULL frequently could be placed in separate relations q Reasons for nulls: Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist, but unavailable

Spurious Tuples q. Bad designs for a relational database may result in erroneous results for certain JOIN operations q. The "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Functional dependencies q. Consider the following relation schema q{Plane}→ Start. Time q{Pilot, Start. Date, Start. Time}→ Plane q{Plane, Start. Date}→ Pilot

Functional dependencies q. A functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A 1, A 2, … An}, X and Y are subset of R. q. Definition. A functional dependency between two sets of attributes X and Y: X Y, specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t 1 and t 2 in r that have t 1[X]=t 2[X] t 1[Y] = t 2[Y].

Functional dependencies q. Example:

Functional dependencies X → Y ⇔ (t 1. X = t 2. X ⇒ t 1. Y = t 2. Y) We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f. d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.

Functional dependencies q. Example: Social security number determines employee name: SSN -> ENAME Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Functional dependencies If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that X Y for any subset of attributes Y of R If X Y in R, this does not say whether or not Y X in R If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t 1[K]=t 2[K])

Functional dependencies(FDs) q. A functional dependency is a property of the semantics or meaning of the attributes. q. Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. q. A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R

Inference Rules for FDs q. Given a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F. q. The following six inference rules for functional dependencies (Armstrong's inference rules) IR 1 (reflexive : phản xạ): If X Y, then X Y. IR 2. (Augmentation) If X Y, then XZ YZ. IR 3. (Transitive) If X Y and Y Z, then X Z

Inference Rules for FDs IR 4(Decomposition): If X YZ, then X Y and X Z IR 5(Union): If X Y and X Z, then X YZ IR 6(pseudo-transitive) If X Y and WY Z, then WX Z q The last three inference rules, as well as any other inference rules, can be deduced from IR 1, IR 2, and IR 3 (completeness property)

Inference Rules for FDs q. Closure : the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+. Closure of a set of attributes X with respect to F is the set X + of all attributes that are functionally determined by X. X + can be calculated by repeatedly applying IR 1, IR 2, IR 3 using the FDs in F

Inference Rules for FDs q. Algorithm: Determining X+, the Closure of : the set of attribute X under F X+= X; repeat old. X+= X+; for each functional dependency Y Z in F do If X+ Y then X += X+ ∪Z; until (X+ = old. X+);

Inference Rules for FDs q. Example 1: Let Q be a relation with attributes (A, B, C, D, E, G, H) and let the following functional dependencies hold F={f 1: B A; f 2: DA CE; f 3: D H; f 4: GH C; f 5: AC D} q. Find the closure X+ of X = {AC}. q. Answer: X+=ACDEH

Inference Rules for FDs q. Example 2: Let Q be a relation with attributes (A, B, C, D, E, G) and let the following functional dependencies hold F = { f 1: A → C; f 2: A → EG; f 3: B → D; f 4: G → E} q. Find the closure X+ and Y+ of X = {A, B}; Y = {C, G, D} q. Answer: X+ = {ABCDEG} , Y+= {CGDE}

Inference Rules for FDs q. Example 3: F = {SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER} HOURS} Calculate the following closure sets with respect to F; {SSN }+ = {SSN, ENAME} {PNUMBER }+ = {PNUMBER, PNAME, PLOCATION} {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PLOCATION, HOURS}

Inference Rules for FDs q. Algorithm: Determining F+ Find all subsets of relation Q+ Find all possible functional dependencies of Q Find all subset closure of Q Based on the closure of all subsets sought to determine which the functional dependencies belong F+

Inference Rules for FDs q. Example: Q(A, B, C) F = {AB C, C B} F+ ? All subset of attributes Closure of all subsets A+ = A B+ = B C+ = BC AC+ = ABC AB+ = ABC BC+ = BC A B C {A} {B} {C} {A, B} {A, C} {B, C} {A, B, C}

Inference Rules for FDs q. All functional dependencies: A BC B C AB C F C A A ABC B AC AB AC F+ C B F A C B A B BC AB BC F+ C AB C BC F+ AC BC F+ BC AC C ABC AC B F+ A AC B ABC AB ABC F C AC AC AB F+ + AC ABC F+ BC ABC BC AB

Inference Rules for FDs q. Result: F+ = { AB C, AB AC, AB BC, AB ABC, C BC, AC B, AC AB, AC BC, AC ABC}

Equivalence of Sets of FDs q. Two sets of FDs F and G are equivalent if: Every FD in F can be inferred from G, and Every FD in G can be inferred from F Hence, F and G are equivalent if F + =G + q. Definition: F covers G if every FD in G can be inferred from F (i. e. , if G + subset-of F +) F and G are equivalent if F covers G and G covers F There is an algorithm for checking equivalence of sets of FDs

Minimal Sets of FDs q A set of FDs is minimal if it satisfies the following conditions: Every dependency in F has a single attribute for its right-hand side. Example: F = {A → B, A → C , B → C, AB → D} Cannot replace any dependency X A in F with a dependency Y A, where Y is a subset of X, and still have a set of dependencies that is equivalent to F. Example: F= {AB→C; B→C}

Minimal Sets of FDs Cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. Example: F = {A→BC, B→D, AB→D} redundant because: F ≡ F’= {A→BC, B→D}

Minimal Sets of FDs q. Algorithm 10. 2: Finding a Minimal Cover (phủ tối thiểu)for a set of Functional Dependencies F Remove all FDs have left hand side redundant Converse all FDs with the right hand side of more than one attribute to FDs having the right hand side of one attribute Remove all redundant FDs

Minimal Sets of FDs q. Example: Let Q be a relation with attributes Q(A, B, C, D) and FD F={AB → CD, B → C, C → D} q Finding a Minimal Cover of F AB→CD is FD with the left hand side redundant A because B+=BCD ⇒ B → CD ∈ F+ =>F≡{B → CD; B → C; C → D} F≡{B → D; B → C; C → D}=F 1 mc , In F 1 tt , B → D is redundant. F≡{B → C; C → D}=Fmc

Minimal Sets of FDs q Example: Q(MSCD, MSSV, CD, HG) F = {MSCD → CD; CD → MSCD; CD, MSSV → HG; MSCD, HG → MSSV; MSCD, MSSV → HG} q Find minimal cover for FD F Result: Fmc = { MSCD → CD; CD → MSCD; CD, HG → MSSV; MSCD, MSSV → HG}

Normalization of Relations q. Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations q. Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations q 2 NF, 3 NF, BCNF based on keys and FDs of a relation schema q 4 NF based on keys, multi-valued dependencies : MVDs; 5 NF based on keys, join dependencies : JDs (Chapter 11) q. Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Practical Use of Normal Forms q. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties q. The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect q. The database designers need not normalize to the highest possible normal form. (usually up to 3 NF, BCNF or 4 NF) q. Denormalization: the process of storing the join of higher normal form relations as a base relation —which is in a lower normal form

Definitions of Keys and Attributes Participating in Keys q. A super key of a relation schema R = {A 1, A 2, . . , An} is a set of attributes S subset-of R with the property that no two tuples t 1 and t 2 in any legal relation state r of R will have t 1[S] = t 2[S] q. A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.

Definitions of Keys and Attributes Participating in Keys q. Algorithm 10. 3: Finding all key for a relation Q and set of Functional Dependencies F Determine all subset of Q+: X 1, X 2, …, Xn Find closure of the Xi. Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S 1, S 2, …Sn} Create the set contain all key of Q from S by consider subset Si, Sj of S S (i ≠j), if Si Sj then remove Sj (i, j=1. . n).

Definitions of Keys and Attributes Participating in Keys q. Example:

Definitions of Keys and Attributes Participating in Keys q. Example:

Definitions of Keys and Attributes Participating in Keys q. Algorithm 10. 4: Finding one key K for a relation Q and set of Functional Dependencies F: Step 1: Assign K = Q+ Step 2: A is an attribute of K, K’=K-A. If K’+=Q+ then K=K’.

Definitions of Keys and Attributes Participating in Keys q. Example: U={A, B, C, D, E} , F={AB->C, AC->B, BC->DE} Find K Step 1: K=U > K=ABCDE Step 2: (KA)+=>(BCDE)+=BCDE ≠ U+ K=ABCDE Step 3: (KB)+=>(ACDE)+= ABCDE = U+ K=ACDE Step 4: (KC)+ =>(ADE)+ = ADE ≠ U+ K=ACDE Step 5: (KD)+ => (ACE)+ = ACEBD=U+ K=ACE Step 6: (KE)+ =>(AC)+ = ACBDE =U+ K=AC

Definitions of Keys and Attributes Participating in Keys q. If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. q. A Prime attribute (thuộc tính khóa) must be a member of some candidate key q. A Nonprime attribute is not a prime attribute— that is, it is not a member of any candidate key.

First Normal Form q. First normal form (INF): Disallow multivalued attributes, composite attributes, and their combinations. It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

First Normal Form

Second Normal Form q. Second normal form (2 NF) is based on the concept of full functional dependency. A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Examples: {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds

Second Normal Form q. Definition. A relation schema R is in 2 NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. q. A relation schema R is in second normal form (2 NF) if every non-prime attribute A in R is fully functionally dependent on the primary key q. R can be decomposed into 2 NF relations via the process of 2 NF normalization

Second Normal Form q. Algorithm 10. 4 : The test for 2 NF Give a relational Q, a set FD F Determine Q belong 2 FN ? Step 1: Find all key of Q Step 2: With each key K, Find closure of all the subsets S of K Step 3: If there are S+ contain Nonprime attribute then Q is not 2 NF else Q is 2 NF

Second Normal Form q. Example: Q(A, B, C, D) F={AB→C; B→D; BC→A}. Is Q 2 NF? q. D is nonprime attribute Q is not 2 NF

Third Normal Form q. Third normal form (3 NF) is based on the concept of transitive dependency. q. Transitive functional dependency A functional dependency X Z that can be derived from two FDs X Y and Y Z q. Examples: SSN DMGRSSN is a transitive FD since SSN DNUMBER and DNUMBER DMGRSSN hold

Third Normal Form q Definition: A relation schema R is in 3 NF if it satisfies 2 NF and no nonprime attribute of R is transitively dependent on the primary key. q. Example:

Third Normal Form q. NOTE: In X Y and Y Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. EMP (SSN, Emp#, Salary ). SSN Emp# Salary and Emp# is a candidate key.

General Normal Form Definitions (For Multiple Keys) q. The above definitions consider the primary key only q. The following more general definitions take into account relations with multiple candidate keys q. Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation.

General Definition of 2 NF q. Definition: A relation schema R is in second normal form (2 NF) if every nonprime attribute A in R is not partially dependent on any key of R. q. Example:

General Definition of 2 NF q In LOT relation (above example), Suppose that there are two candidate keys: PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only within each county, but PROPERTY_ID numbers are unique across counties for the entire state. Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know that functional dependencies FD 1 and FD 2 hold. We choose PROPERTY_ID# as the primary key,

General Definition of 3 NF q. Definition: A relation schema R is in third normal form (3 NF) if, whenever a nontrivial functional dependency X A holds in R, either X is a super key of R, or A is a prime attribute of R.

BCNF (Boyce-Codd Normal Form) q. Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3 NF, but it was found to be stricter than 3 NF. That is, every relation in BCNF is also in 3 NF; however, a relation in 3 NF is not necessarily in BCNF. q. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X A holds in R, then X is a super key of R

BCNF (Boyce-Codd Normal Form) q. Each normal form is strictly stronger than the previous one Every 2 NF relation is in 1 NF Every 3 NF relation is in 2 NF Every BCNF relation is in 3 NF q. There exist relations that are in 3 NF but not in BCNF The goal is to have each relation in BCNF (or 3 NF)

BCNF (Boyce-Codd Normal Form) q. Example: Q(A, B, C, D, E, I) F={ACD→EBI; CE→AD}. Q is a BCNF? q. F ≡ Ftt={ACD→E, ACD→B, ACD→I, CE→A, CE→D} The feft hand side of Ftt are super key Q is BCNF

BCNF (Boyce-Codd Normal Form) q. Example: