Carnegie Mellon Univ Dept of Computer Science 15

  • Slides: 53
Download presentation
Carnegie Mellon Univ. Dept. of Computer Science 15 -415 - Database Applications C. Faloutsos

Carnegie Mellon Univ. Dept. of Computer Science 15 -415 - Database Applications C. Faloutsos Database design and normalization Carnegie Mellon 15 -415 - C. Faloutsos

Overview • Relational model – formal query languages – commercial query languages (SQL) •

Overview • Relational model – formal query languages – commercial query languages (SQL) • Integrity constraints – domain I. C. , foreign keys – functional dependencies • DB design and normalization Carnegie Mellon 15 -415 - C. Faloutsos 2

Overview - detailed • DB design and normalization – pitfalls of bad design –

Overview - detailed • DB design and normalization – pitfalls of bad design – decomposition – normal forms Carnegie Mellon 15 -415 - C. Faloutsos 3

Goal • Design ‘good’ tables – sub-goal#1: define what ‘good’ means – sub-goal#2: fix

Goal • Design ‘good’ tables – sub-goal#1: define what ‘good’ means – sub-goal#2: fix ‘bad’ tables • in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” • Let’s see why, and how: Carnegie Mellon 15 -415 - C. Faloutsos 4

Pitfalls takes 1 (ssn, c-id, grade, name, address) Carnegie Mellon 15 -415 - C.

Pitfalls takes 1 (ssn, c-id, grade, name, address) Carnegie Mellon 15 -415 - C. Faloutsos 5

Pitfalls ‘Bad’ - why? because: ssn->address, name Carnegie Mellon 15 -415 - C. Faloutsos

Pitfalls ‘Bad’ - why? because: ssn->address, name Carnegie Mellon 15 -415 - C. Faloutsos 6

Pitfalls • Redundancy – space – (inconsistencies) – insertion/deletion anomalies: Carnegie Mellon 15 -415

Pitfalls • Redundancy – space – (inconsistencies) – insertion/deletion anomalies: Carnegie Mellon 15 -415 - C. Faloutsos 7

Pitfalls • insertion anomaly: – “jones” registers, but takes no class - no place

Pitfalls • insertion anomaly: – “jones” registers, but takes no class - no place to store his address! Carnegie Mellon 15 -415 - C. Faloutsos 8

Pitfalls • deletion anomaly: – delete the last record of ‘smith’ (we lose his

Pitfalls • deletion anomaly: – delete the last record of ‘smith’ (we lose his address!) Carnegie Mellon 15 -415 - C. Faloutsos 9

Solution: decomposition • split offending table in two (or more), eg. : ? Carnegie

Solution: decomposition • split offending table in two (or more), eg. : ? Carnegie Mellon ? 15 -415 - C. Faloutsos 10

Overview - detailed • DB design and normalization – pitfalls of bad design –

Overview - detailed • DB design and normalization – pitfalls of bad design – decomposition • lossless join decomp. • dependency preserving – normal forms Carnegie Mellon 15 -415 - C. Faloutsos 11

Decompositions • there are ‘bad’ decompositions: we want • lossless and • dependency preserving

Decompositions • there are ‘bad’ decompositions: we want • lossless and • dependency preserving Carnegie Mellon 15 -415 - C. Faloutsos 12

Decompositions - lossy: R 1(ssn, grade, name, address) R 2(c-id, grade) ssn->name, address ssn,

Decompositions - lossy: R 1(ssn, grade, name, address) R 2(c-id, grade) ssn->name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 13

Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn,

Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 14

Decompositions example of non-dependency preserving S# -> address, status S# -> address S# ->

Decompositions example of non-dependency preserving S# -> address, status S# -> address S# -> status address -> status Carnegie Mellon 15 -415 - C. Faloutsos 15

Decompositions (drill: is it lossless? ) S# -> address, status S# -> address S#

Decompositions (drill: is it lossless? ) S# -> address, status S# -> address S# -> status address -> status Carnegie Mellon 15 -415 - C. Faloutsos 16

Decompositions - lossless Definition: consider schema R, with FD ‘F’. R 1, R 2

Decompositions - lossless Definition: consider schema R, with FD ‘F’. R 1, R 2 is a lossless join decomposition of R if we always have: An easier criterion? Carnegie Mellon 15 -415 - C. Faloutsos 17

Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey

Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally: Carnegie Mellon 15 -415 - C. Faloutsos 18

Decomposition - lossless example: R 2 R 1 ssn, c-id -> grade ssn->name, address

Decomposition - lossless example: R 2 R 1 ssn, c-id -> grade ssn->name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 19

Overview - detailed • DB design and normalization – pitfalls of bad design –

Overview - detailed • DB design and normalization – pitfalls of bad design – decomposition • lossless join decomp. • dependency preserving – normal forms Carnegie Mellon 15 -415 - C. Faloutsos 20

Decomposition - depend. pres. informally: we don’t want the original FDs to span two

Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables - counter-example: S# -> address, status S# -> address S# -> status address -> status Carnegie Mellon 15 -415 - C. Faloutsos 21

Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status

Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status Carnegie Mellon S# -> address -> status (but: S#->status ? ) 15 -415 - C. Faloutsos 22

Decomposition - depend. pres. informally: we don’t want the original FDs to span two

Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables. More specifically: … the FDs of the canonical cover. Carnegie Mellon 15 -415 - C. Faloutsos 23

Decomposition - depend. pres. why is dependency preservation good? S# -> address S# ->

Decomposition - depend. pres. why is dependency preservation good? S# -> address S# -> status S# -> address -> status (address->status: ‘lost’) Carnegie Mellon 15 -415 - C. Faloutsos 24

Decomposition - depend. pres. A: eg. , record that ‘Philly’ has status ‘A’ S#

Decomposition - depend. pres. A: eg. , record that ‘Philly’ has status ‘A’ S# -> address S# -> status S# -> address -> status (address->status: ‘lost’) Carnegie Mellon 15 -415 - C. Faloutsos 25

Decomposition - conclusions • decompositions should always be lossless – joining attribute -> superkey

Decomposition - conclusions • decompositions should always be lossless – joining attribute -> superkey • whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…) Carnegie Mellon 15 -415 - C. Faloutsos 26

Overview - detailed • DB design and normalization – pitfalls of bad design –

Overview - detailed • DB design and normalization – pitfalls of bad design – decomposition (-> how to fix the problem) – normal forms (-> how to detect the problem) • BCNF, • 3 NF • (1 NF, 2 NF) Carnegie Mellon 15 -415 - C. Faloutsos 27

Normal forms - BCNF We saw how to fix ‘bad’ schemas but what is

Normal forms - BCNF We saw how to fix ‘bad’ schemas but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, ie. , a set of rules. Typically: Boyce-Codd Normal form Carnegie Mellon 15 -415 - C. Faloutsos 28

Normal forms - BCNF Defn. : Rel. R is in BCNF wrt F, if

Normal forms - BCNF Defn. : Rel. R is in BCNF wrt F, if • informally: everything depends the full key, and nothing but the key • semi-formally: every determinant (of the cover) is a candidate key Carnegie Mellon 15 -415 - C. Faloutsos 29

Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade Carnegie

Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 30

Normal forms - BCNF Formally: for every FD a->b in F+ – a->b is

Normal forms - BCNF Formally: for every FD a->b in F+ – a->b is trivial (a superset of b) or – a is a superkey (or both) Carnegie Mellon 15 -415 - C. Faloutsos 31

Normal forms - BCNF Theorem: given a schema R and a set of FD

Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R 1, … Rn, so that – R 1, … Rn are in BCNF and – the decompositions are lossless. (but, some decomp. might lose dependencies) Carnegie Mellon 15 -415 - C. Faloutsos 32

Normal forms - BCNF How? algorithm in book - essentially, break off FDs of

Normal forms - BCNF How? algorithm in book - essentially, break off FDs of the cover eg. TAKES 1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 33

Normal forms - BCNF eg. TAKES 1(ssn, c-id, grade, name, address) ssn -> name,

Normal forms - BCNF eg. TAKES 1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name ssn address grade c-id Carnegie Mellon 15 -415 - C. Faloutsos 34

Normal forms - BCNF ssn, c-id -> grade ssn->name, address ssn, c-id -> grade

Normal forms - BCNF ssn, c-id -> grade ssn->name, address ssn, c-id -> grade Carnegie Mellon 15 -415 - C. Faloutsos 35

Normal forms - BCNF pictorially: we want a ‘star’ shape name ssn address grade

Normal forms - BCNF pictorially: we want a ‘star’ shape name ssn address grade : not in BCNF c-id Carnegie Mellon 15 -415 - C. Faloutsos 36

Normal forms - BCNF pictorially: we want a ‘star’ shape F B A or

Normal forms - BCNF pictorially: we want a ‘star’ shape F B A or C D G E H Carnegie Mellon 15 -415 - C. Faloutsos 37

Normal forms - BCNF or a star-like: (eg. , 2 cand. keys): STUDENT(ssn, st#,

Normal forms - BCNF or a star-like: (eg. , 2 cand. keys): STUDENT(ssn, st#, name, address) name ssn address st# Carnegie Mellon address = st# 15 -415 - C. Faloutsos 38

Normal forms - BCNF but not: F B A or D G E C

Normal forms - BCNF but not: F B A or D G E C Carnegie Mellon D H 15 -415 - C. Faloutsos 39

Normal forms - 3 NF consider the ‘classic’ case: STJ( Student, Teacher, sub. Ject)

Normal forms - 3 NF consider the ‘classic’ case: STJ( Student, Teacher, sub. Ject) T-> J S, J -> T is it BCNF? S T J Carnegie Mellon 15 -415 - C. Faloutsos 40

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T How to decompose it to BCNF? S T J Carnegie Mellon 15 -415 - C. Faloutsos 41

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T 1) R 1(T, J) R 2(S, J) (BCNF? - lossless? - dep. pres. ? ) 2) R 1(T, J) R 2(S, T) (BCNF? Carnegie Mellon - lossless? 15 -415 - C. Faloutsos 42

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T 1) R 1(T, J) R 2(S, J) (BCNF? Y+Y - lossless? N - dep. pres. ? N ) 2) R 1(T, J) R 2(S, T) (BCNF? Y+Y - lossless? Y - dep. pres. ? N ) Carnegie Mellon 15 -415 - C. Faloutsos 43

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T in this case: impossible to have both • BCNF and • dependency preservation Welcome 3 NF! Carnegie Mellon 15 -415 - C. Faloutsos 44

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T S T J Carnegie Mellon informally, 3 NF ‘forgives’ the red arrow in the can. cover 15 -415 - C. Faloutsos 45

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J

Normal forms - 3 NF STJ( Student, Teacher, sub. Ject) T-> J S, J -> T • it is trivial or S T J Carnegie Mellon Formally, a rel. R with FDs ‘F’ is in 3 NF if: for every a->b in F+: • a is a superkey or • each b-a attr. : part of a cand. key 15 -415 - C. Faloutsos 46

Normal forms - 3 NF how to bring a schema to 3 NF? algo

Normal forms - 3 NF how to bring a schema to 3 NF? algo in book: for each FD in the cover, put it in a table Carnegie Mellon 15 -415 - C. Faloutsos 47

Normal forms - 3 NF vs BCNF • If ‘R’ is in BCNF, it

Normal forms - 3 NF vs BCNF • If ‘R’ is in BCNF, it is always in 3 NF (but not the reverse) • In practice, aim for – BCNF; lossless join; and dep. preservation • if impossible, we accept – 3 NF; but insist on lossless join and dep. preservation Carnegie Mellon 15 -415 - C. Faloutsos 48

Normal forms - more details • why ‘ 3’NF? what is 2 NF? 1

Normal forms - more details • why ‘ 3’NF? what is 2 NF? 1 NF? • 1 NF: attributes are atomic (ie. , no set-valued attr. , a. k. a. ‘repeating groups’) not 1 NF Carnegie Mellon 15 -415 - C. Faloutsos 49

Normal forms - more details 2 NF: 1 NF and non-key attr. fully depend

Normal forms - more details 2 NF: 1 NF and non-key attr. fully depend on the key counter-example: TAKES 1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade ssn grade name address c-id Carnegie Mellon 15 -415 - C. Faloutsos 50

Normal forms - more details • 3 NF: 2 NF and no transitive dependencies

Normal forms - more details • 3 NF: 2 NF and no transitive dependencies • counter-example: D A in 2 NF, but not in 3 NF B C Carnegie Mellon 15 -415 - C. Faloutsos 51

Normal forms - more details • 4 NF, multivalued dependencies etc: IGNORE • in

Normal forms - more details • 4 NF, multivalued dependencies etc: IGNORE • in practice, E-R diagrams usually lead to tables in BCNF Carnegie Mellon 15 -415 - C. Faloutsos 52

Overview - conclusions DB design and normalization – pitfalls of bad design – decompositions

Overview - conclusions DB design and normalization – pitfalls of bad design – decompositions (lossless, dep. preserving) – normal forms (BCNF or 3 NF) “everything should depend on the key, the whole key, and nothing but the key” Carnegie Mellon 15 -415 - C. Faloutsos 53