Looking Back and Thinking Forward 1 Rethinking Dependency

Looking Back and Thinking Forward 1

Rethinking Dependency Preservation 2

Definition of Dependency Preservation • Let R be a relation that is decomposed into R 1, …, Rn • Let F be a set of FDs over R • The projection of F on Ri (denoted as FRi ) is the set of FDs in F+ that involve only attributes in Ri. • The decomposition is dependency preserving if (FR 1[ … [ FRn)+ = F+ 3

Why is Dependency Preservation Useful? • If the decomposition to R 1, …, Rn preserves all dependencies, then – when inserting into Ri, we check that we are not contradicting any dependency in FRi – If this is the case, then the join is guaranteed not to contradict F • We don’t need to compute a join on insert! (or do we…) 4

Easy and Hard • It is easy (polynomial) to check if a decomposition is dependency preserving – Remember the algorithm with Z: = Z ((Z Ri)+ R i) • It is hard to check that we are not contradicting any dependency in FRi, when inserting into Ri • Remember that computing FRi is an exponential problem! 5

Conclusion • The whole goal of a decomposition that preserves dependencies is to make insertion efficient (i. e. , by avoiding the join computation on insert) • However, even if we know that the decomposition preserves dependencies, this does not mean that we can efficiently insert! • On the other hand, (1) computing FRi can be more efficient than computing a join, if the tables are large; (2) We only have to compute Fri once 6

Is Dependency Preservation Useful? • Suppose we have (ABCD) with A B, B D, A C, C D • A dependency preserving decomposition: (AB), (BD), (AC), (CD) • Observe: – We know FAB, FBD, FAC, FCD, – Decomposition is lossless – All schemas are in BCNF • Sounds great, right? Wrong! 7

Consider the following series of legal inserts into the tables A B, B D, A C, C D A B B D A C C D 8

Consider the following series of legal inserts into the tables A B, B D, A C, C D A 1 B 2 B D A C C D 9

Consider the following series of legal inserts into the tables A B, B D, A C, C D A 1 B 2 D 3 A C C D 10

Consider the following series of legal inserts into the tables A B, B D, A C, C D A 1 B 2 D 3 A 1 C 4 C D 11

Consider the following series of legal inserts into the tables A B, B D, A C, C D A 1 B 2 D 3 A 1 C 4 D 5 12

What is in the natural join of the relations? A A 1 B 2 D 3 A 1 C 4 D 5 B C D 13

Conclusion • The natural join of the result is guaranteed to contain no contradictions to the functional dependencies • However, some tuples may be lost during the process of computing a natural join – since they contain contradictions! 14

Conclusion (cont) • Suppose someone asks a query involving only (AB) • Even though AB appear together in a schema, we have to compute a natural join of all schemas before answering the query! • Even if the decomposition is dependency preserving, we have to either: – Compute a natural join before querying or – Compute a natural join before inserting • Stronger notations than dependency preservation have been developed that can avoid joins 15

Thoughts about the 3 NF Decomposition Algorithm 16

The Algorithm for Computing a Decomposition to 3 NF • Given a relation R with functional dependencies F • Step 1: Find a minimal cover G of F • Step 2: For each FD X A in G, create a schema XA • Step 3: If no schema created so far contains a key, add a key as a schema • Step 4: Remove schemas that are contained in other schemas • The result is a decomposition into 3 NF that is dependency preserving and has a lossless join 17

Rethinking Step 2: For each FD X A in G, create a schema XA • Suppose we have (ABC) with A BC. • The minimal cover: A B, A C • The decomposition (AB), (AC) • An alternative option (algorithm remains correct): after finding a minimal cover, combine functional dependencies that have the same left side. Then create a schema from each of these dependencies. • What would be get in this case? 18

Rethinking Step 2 a: Combine dependencies with the same left hand Step 2 b: For each FD X Y create a schema XY • What about more complicated cases? • (ABCD) with A B, B A, A C, B D • Might want to combine all four dependencies since A and B are equivalent • The theory for this has been developed, not discussed further in course. 19

Rethinking Step 4: Remove schemas that are contained in other schemas • Suppose we have (ABC) with AB C, C B. • The minimal cover: AB C, C B • We create the decomposition (ABC), (CB) • Step 4 removes CB 20

Rethinking Step 4: Remove schemas that are contained in other schemas • From (ABC) with AB C, C B, we end up with the decomposition (ABC) • What happens if we want to store values for CB, and we do not know A? – We cannot store this information, since A is part of a key and cannot be null!!! • Conclusion: Think twice before removing contained schemas 21

Do we always get the best result? • Suppose we have (ABC) with A B, B C, C A • What will the algorithm for decomposing to 3 NF give us? • But the original relation was in BCNF! • Clearly ABC should be together in the same relation! 22

Other Types of Redundancy 23

Other Types of Redundancy • There are types of redundancy in the real world that are not properly modeled by functional dependencies • Choosing a decomposition only based on functional dependencies will leave this redundancy in our relations 24

Example • Suppose we have Students, Courses, Teachers • Students can study several courses • Courses can have several teachers • All teachers of the course teach the students of the course, e. g. , as happens in the DB course: – all students study with both teachers • What functional dependencies hold in (Students, Courses, Teachers)? • What is the normal form of the relation? • Is there redundancy in the relation? 25

Example (cont) • The red data is redundant, even though the relation is in BCNF!! • This type of a dependency is called a multivalued dependency, and it can cause redundancy Students Sam Mike Courses DB DB DB Teachers Sara Yehoshua Sara Mike DB Yehoshua 26

Solution • Decompose the table into two tables • Without knowing the multivalued dependency, this decomposition seems lossy • With the multivalued dependency, this decomposition is lossless Students Sam Mike Courses DB DB Teachers Sara Yehoshua 27

Conclusion • When designing tables, the result of the 3 NF decomposition algorithm may not be decomposed enough • We must also take multi-valued dependencies into consideration • Details in the course book… 28

Applying Design Theory To Real World Problems Uses and Limitations 29

So Far… • Given a relation, we know how to: – Check if it in BCNF or 3 NF – Decompose to BCNF with a lossless decomposition – Decompose to 3 NF with a lossless and dependency preserving decomposition • Practically speaking, how do we use this? 30

Option 1 • Use entity relationship diagrams to model your data • Translate the diagram into tables • Identify functional dependencies that should hold over tables • Decompose when necessary 31

city Example name street number zipcode Child Studies. At School phone name • What dependencies will hold in the relation for Child? • What is the normal form of the relation for Child? 32

Option 2 • Do not create an entity relationship diagram • Instead, identify all the attributes existing in the world being modeled • Image these as all being together in one large table (called the universal table) • Define all functional dependencies that should hold over all the entities and attributes • Decompose 33

Example • A university database contains info about professors (identified by teudat zehut) and courses (identified by courseid). Professors teach courses. • Every professor teaches exactly one course during a semester • Each course has a single professor, each time it is given • We store historical information about who taught courses during previous semesters • Model this using the method suggested on the previous slide. 34

Does this Method for Table Design Always Work? • Suppose that there are – employees – projects – departments – managers • Each employee belongs to a single department • Each employee works on a single project • Each department has a single manager • Each project has a single manager 35

Modeling Attempt 1 • The universal table: – (emp, proj, dept, mgr) • Functional dependencies holding: – emp proj – emp dept – proj mgr – dept mgr • Does this correctly model the given information? 36

Implied Dependencies • emp proj, proj mgr • emp dept, dept mgr • These imply that emp mgr • This was not part of the given information • What if employees work on a single project, but that project might not be in their department? • Then, employees have a department manager, and a project manager (who may be different!) 37

The Problem • If the same dependency can be derived in different ways, it must have the same meaning, otherwise we are modeling incorrectly • The only way to interpret our model is that an employee has the same project manager as his department manager – This was not part of the given • Source of the problem was cyclicity of the “dependency graph” E P D M 38

Modeling Attempt 2 • Break the cyclicity. There are 2 types of managers • The universal table: – (emp, proj, dept, pmgr, dmgr) E P PM D DM • Functional dependencies holding: – emp proj – emp dept – proj pmgr – dept dmgr • Does this model all the information? 39

Missing Information • Modeling attempt 2 does not make explicit the ISA relationships, i. e. , that – a pmgr is a manager and – a dmgr is a manager • What happens if we add the mgr attribute back in? 40

Modeling Attempt 3 • Add ISA information • The universal table: E – (emp, proj, dept, pmgr, dmgr, mgr) P PM D DM M • Functional dependencies holding: – emp proj, emp dept – proj pmgr, dept dmgr – pmgr mgr, dmgr • Does this correctly model the given information? • Do you see the problem here? 41

The Solution 1. Start by identifying all attributes that should be modeled 2. Then, identify all functional dependencies 3. If the functional dependencies form cycles, identify which implications are wrong, i. e. , where there are 2 paths that lead to the same attribute, but in reality may lead to different values 4. From 3, discover which attributes have 2 meanings and should be split. Split these 42

The Solution (cont) 5. Identify ISA relationships. 6. Do not model the functional dependencies implied by the ISA relationships. Instead, model ISA relationships using foreign key constraints 43

Modeling Attempt 4 • The universal table: – (emp, proj, dept, pmgr, dmgr) E P PM D DM • Functional dependencies holding: – emp proj, emp dept – proj pmgr, dept dmgr • Table is not in 3 NF. Decompose using algorithm from class: – (emp, proj), (emp, dept), (proj, pmgr), (dept, dmgr) • Add ISA information using foreign keys (mgr)44