CS 143 Normalization Professor Junghoo John Cho Relational

  • Slides: 42
Download presentation
CS 143 Normalization Professor Junghoo “John” Cho

CS 143 Normalization Professor Junghoo “John” Cho

Relational Design Theory • How do we design “good” tables for a relational database?

Relational Design Theory • How do we design “good” tables for a relational database? • Typically, we start with E/R or UML and convert it into tables • Still, there are many choices to make in E/R (or UML) that lead to different tables. Which one is better? Which design should we choose? • Relational design theory (Normalization theory) • Theory on what are “good” table designs • Tries to minimize “redundancy” in table design • Algorithms that convert “bad” design into “good” design automatically 2

Student. Class Table • Q: Is this a good table design? Student. Class sid

Student. Class Table • Q: Is this a good table design? Student. Class sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 Database 04 EE 128 Microelectronics 03 3

Redundancies in Student. Class Table • The same information is included multiple times •

Redundancies in Student. Class Table • The same information is included multiple times • Redundancy leads to potential “anomalies” down the road 1. Update anomaly: Information may be updated partially and inconsistently • Q: What if a student changes the address? 2. Insertion anomaly: We may not include some information at all • Q: What if a student does not take any class? 3. Deletion anomaly: While deleting some information, we may delete others • Q: What if the only class that a student takes gets cancelled? sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept CS EE ME CS EE cnum 143 284 143 128 title Database Signal Processing Mechanics Database Microelectronics unit 04 03 05 04 03 4

Student. Class Table • Q: Is there a better table design? What table(s) will

Student. Class Table • Q: Is there a better table design? What table(s) will you use? Student. Class sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 Database 04 EE 128 Microelectronics 03 5

Coming up with Better Tables • Q: Any way to arrive at the better

Coming up with Better Tables • Q: Any way to arrive at the better design more systematically? • Q: Where is the redundancy from? Student. Class sid 301 105 207 name James Elaine James ? Susan addr 11 West 84 East ? 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 ? ? EE 128 Microelectronics 03 6

Intuition behind Normalization Theory • sid 301 105 207 name James Elaine Susan addr

Intuition behind Normalization Theory • sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept CS EE ME CS EE cnum 143 284 143 128 title Database Signal Processing Mechanics Database Microelectronics unit 04 03 05 04 03 7

“Decomposing” Student. Class Table • 8

“Decomposing” Student. Class Table • 8

Overview • Functional dependency (FD) • • • Definition Trivial functional dependency Logical implication

Overview • Functional dependency (FD) • • • Definition Trivial functional dependency Logical implication Closure FD and key • Decomposition • Lossless decomposition • Boyce-Codd Normal Form (BCNF) • Definition • BCNF decomposition algorithm • Most theoretical part of the class. Pay attention! • If you can’t follow the lecture, you are unlikely to get it by reading textbook 9

Functional Dependency

Functional Dependency

Functional Dependency (FD) • 11

Functional Dependency (FD) • 11

Functional Dependency (FD) A B C 12

Functional Dependency (FD) A B C 12

Trivial Functional Dependency • X Y X Y 13

Trivial Functional Dependency • X Y X Y 13

Logical Implication • A B C G H I 14

Logical Implication • A B C G H I 14

Closure • 15

Closure • 15

Closure X+ Computation Algorithm • 16

Closure X+ Computation Algorithm • 16

Attribute Closure Example • 17

Attribute Closure Example • 17

Functional Dependency and Key • 18

Functional Dependency and Key • 18

Projecting Functional Dependency • 19

Projecting Functional Dependency • 19

Decomposition

Decomposition

Decomposition • 21

Decomposition • 21

General Decomposition • 22

General Decomposition • 22

Lossless Decomposition • 23

Lossless Decomposition • 23

Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan

Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan • Q: Decomposition into S 1(cnum, sid), S 2(cnum, name). Lossless? S 2 S 1 cnum sid cnum name cnum sid name 143 1 143 James 143 1 James 143 2 143 Elaine 143 1 Elaine 325 3 325 Susan 143 2 James 143 2 Elaine 325 3 Susan 24

Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan

Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan • Q: Decomposition into R 1(cnum, sid), R 2(sid, name). Lossless? R 2 R 1 cnum sid name 143 1 1 James 143 2 2 Elaine 143 2 Elaine 325 3 3 Susan 325 3 Susan 25

Lossless-Join Decomposition • Q: Why is S 1(cnum, sid), S 2(cnum, name) lossy, but

Lossless-Join Decomposition • Q: Why is S 1(cnum, sid), S 2(cnum, name) lossy, but R 1(cnum, sid), R 2(sid, name) is not? S 1 S 2 R 1 R 2 cnum sid cnum name cnum sid name 143 1 143 James 143 1 1 James 143 2 143 Elaine 143 2 2 Elaine 325 3 325 Susan 325 3 3 Susan cnum sid name 143 1 James 143 1 Elaine 143 2 James 325 3 Susan 143 2 Elaine 325 3 Susan 26

Lossless-Join Decomposition • 27

Lossless-Join Decomposition • 27

Boyce-Codd Normal Form

Boyce-Codd Normal Form

FD, Key, and Redundancy • 29

FD, Key, and Redundancy • 29

Boyce-Codd Normal Form (BCNF) • 30

Boyce-Codd Normal Form (BCNF) • 30

BCNF Example (1) • 31

BCNF Example (1) • 31

BCNF Example (2) • 32

BCNF Example (2) • 32

BCNF Violation and Table Decomposition • 33

BCNF Violation and Table Decomposition • 33

BCNF Decomposition Algorithm • 34

BCNF Decomposition Algorithm • 34

BCNF Decomposition Example (1) • 35

BCNF Decomposition Example (1) • 35

BCNF Decomposition Example (1) • 36

BCNF Decomposition Example (1) • 36

BCNF Decomposition Example (2) • 37

BCNF Decomposition Example (2) • 37

Revisiting BCNF Decomposition Algorithm • 38

Revisiting BCNF Decomposition Algorithm • 38

Uniqueness of BCNF Decomposition • 39

Uniqueness of BCNF Decomposition • 39

Checking BCNF Condition • 40

Checking BCNF Condition • 40

Good Table Design in Practice • Normalization splits tables to reduce redundancy. • However,

Good Table Design in Practice • Normalization splits tables to reduce redundancy. • However, splitting tables has negative performance implication • Example: Instructor: name, office, phone, fax name → office, office → (phone, fax) • (design 1) Instructor(name, office, phone, fax) • (design 2) Instructor(name, office), Office(offce, phone, fax) • Q: Retrieve (name, office, phone) from Instructor. Which design is better? • As a rule of thumb, start with normalized tables and merge them if performance is not good enough 41

What We Learned • Relational design theory • Functional dependency • Trivial functional dependency

What We Learned • Relational design theory • Functional dependency • Trivial functional dependency • Logical implication • Closure • Decomposition • Lossless-join decomposition • Boyce-Codd Normal Form (BCNF) • BCNF decomposition algorithm • There exist other definitions of “Normal forms” • Third normal form, Fourth normal form, … • BCNF is most useful and widely used 42