CS 143 Normalization Professor Junghoo John Cho Relational
- Slides: 42
CS 143 Normalization Professor Junghoo “John” Cho
Relational Design Theory • How do we design “good” tables for a relational database? • Typically, we start with E/R or UML and convert it into tables • Still, there are many choices to make in E/R (or UML) that lead to different tables. Which one is better? Which design should we choose? • Relational design theory (Normalization theory) • Theory on what are “good” table designs • Tries to minimize “redundancy” in table design • Algorithms that convert “bad” design into “good” design automatically 2
Student. Class Table • Q: Is this a good table design? Student. Class sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 Database 04 EE 128 Microelectronics 03 3
Redundancies in Student. Class Table • The same information is included multiple times • Redundancy leads to potential “anomalies” down the road 1. Update anomaly: Information may be updated partially and inconsistently • Q: What if a student changes the address? 2. Insertion anomaly: We may not include some information at all • Q: What if a student does not take any class? 3. Deletion anomaly: While deleting some information, we may delete others • Q: What if the only class that a student takes gets cancelled? sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept CS EE ME CS EE cnum 143 284 143 128 title Database Signal Processing Mechanics Database Microelectronics unit 04 03 05 04 03 4
Student. Class Table • Q: Is there a better table design? What table(s) will you use? Student. Class sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 Database 04 EE 128 Microelectronics 03 5
Coming up with Better Tables • Q: Any way to arrive at the better design more systematically? • Q: Where is the redundancy from? Student. Class sid 301 105 207 name James Elaine James ? Susan addr 11 West 84 East ? 84 East 12 North dept cnum title unit CS 143 Database 04 EE 284 Signal Processing 03 ME 143 Mechanics 05 CS 143 ? ? EE 128 Microelectronics 03 6
Intuition behind Normalization Theory • sid 301 105 207 name James Elaine Susan addr 11 West 84 East 12 North dept CS EE ME CS EE cnum 143 284 143 128 title Database Signal Processing Mechanics Database Microelectronics unit 04 03 05 04 03 7
“Decomposing” Student. Class Table • 8
Overview • Functional dependency (FD) • • • Definition Trivial functional dependency Logical implication Closure FD and key • Decomposition • Lossless decomposition • Boyce-Codd Normal Form (BCNF) • Definition • BCNF decomposition algorithm • Most theoretical part of the class. Pay attention! • If you can’t follow the lecture, you are unlikely to get it by reading textbook 9
Functional Dependency
Functional Dependency (FD) • 11
Functional Dependency (FD) A B C 12
Trivial Functional Dependency • X Y X Y 13
Logical Implication • A B C G H I 14
Closure • 15
Closure X+ Computation Algorithm • 16
Attribute Closure Example • 17
Functional Dependency and Key • 18
Projecting Functional Dependency • 19
Decomposition
Decomposition • 21
General Decomposition • 22
Lossless Decomposition • 23
Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan • Q: Decomposition into S 1(cnum, sid), S 2(cnum, name). Lossless? S 2 S 1 cnum sid cnum name cnum sid name 143 1 143 James 143 1 James 143 2 143 Elaine 143 1 Elaine 325 3 325 Susan 143 2 James 143 2 Elaine 325 3 Susan 24
Lossless-Join Decomposition cnum sid name 143 1 James 143 2 Elaine 325 3 Susan • Q: Decomposition into R 1(cnum, sid), R 2(sid, name). Lossless? R 2 R 1 cnum sid name 143 1 1 James 143 2 2 Elaine 143 2 Elaine 325 3 3 Susan 325 3 Susan 25
Lossless-Join Decomposition • Q: Why is S 1(cnum, sid), S 2(cnum, name) lossy, but R 1(cnum, sid), R 2(sid, name) is not? S 1 S 2 R 1 R 2 cnum sid cnum name cnum sid name 143 1 143 James 143 1 1 James 143 2 143 Elaine 143 2 2 Elaine 325 3 325 Susan 325 3 3 Susan cnum sid name 143 1 James 143 1 Elaine 143 2 James 325 3 Susan 143 2 Elaine 325 3 Susan 26
Lossless-Join Decomposition • 27
Boyce-Codd Normal Form
FD, Key, and Redundancy • 29
Boyce-Codd Normal Form (BCNF) • 30
BCNF Example (1) • 31
BCNF Example (2) • 32
BCNF Violation and Table Decomposition • 33
BCNF Decomposition Algorithm • 34
BCNF Decomposition Example (1) • 35
BCNF Decomposition Example (1) • 36
BCNF Decomposition Example (2) • 37
Revisiting BCNF Decomposition Algorithm • 38
Uniqueness of BCNF Decomposition • 39
Checking BCNF Condition • 40
Good Table Design in Practice • Normalization splits tables to reduce redundancy. • However, splitting tables has negative performance implication • Example: Instructor: name, office, phone, fax name → office, office → (phone, fax) • (design 1) Instructor(name, office, phone, fax) • (design 2) Instructor(name, office), Office(offce, phone, fax) • Q: Retrieve (name, office, phone) from Instructor. Which design is better? • As a rule of thumb, start with normalized tables and merge them if performance is not good enough 41
What We Learned • Relational design theory • Functional dependency • Trivial functional dependency • Logical implication • Closure • Decomposition • Lossless-join decomposition • Boyce-Codd Normal Form (BCNF) • BCNF decomposition algorithm • There exist other definitions of “Normal forms” • Third normal form, Fourth normal form, … • BCNF is most useful and widely used 42
- Hector garcia-molina junghoo cho
- Hector garcia-molina junghoo cho
- John cho ucla
- John cho ucla
- Junghoo cho
- Cs246
- Relational database design normalization
- Cho cho
- Cho cho
- Cho cho
- Cho cho
- Cho cho chon chonp
- Organic compound made by living things
- Relational algebra and calculus
- Relational algebra and relational calculus
- Relational calculus symbols
- Object relational and extended relational databases
- Relational query language
- Promotion from assistant to associate professor
- John cho ucla
- John cho ucla
- John cho ucla
- Professor john forsythe
- Professor john forsythe
- Professor john wood
- What is the self directed search
- Professor john hattie
- Professor john hughes
- John stanley temple
- 143 000 in scientific notation
- 11-1 areas of parallelograms and triangles answer key
- Hmn143
- Cse 143 anagram solver
- Air canada 143
- Ccc 143
- Divine intervention biostats
- Salmo bendito seja o senhor meu rochedo
- Agnieszka trepkowska
- Dean deluzio
- Salmo 143:2
- Hammurabi's code time period
- Hammurabis code 143
- 143 ent