Data Normalization Normal is not something to aspire

  • Slides: 26
Download presentation
Data Normalization Normal is not something to aspire to, it's something to get away

Data Normalization Normal is not something to aspire to, it's something to get away from. ~ Jodie Foster ~

Activity: Sample User Report Can you build the underlying data model from this? How

Activity: Sample User Report Can you build the underlying data model from this? How many tables? What are the relationships?

Understanding Functional Dependence l For attributes A and B, B is functionally dependent on

Understanding Functional Dependence l For attributes A and B, B is functionally dependent on A means each value in column A determines one and only one value in column B. ¡Written: A B ¡A determines B ¡B is the determinant ¡Ex: SSN Name (Name is functionally dependent on SSN)

Normalization Lingo l Prime attribute = Any attribute which is a primary key, or

Normalization Lingo l Prime attribute = Any attribute which is a primary key, or in the case of a composite key is part of a PK l Non-Prime Attribute = Any attribute which is not part of the PK. l Key Attribute = Prime Attribute l Non-Key Attribute = Non-Prime Attribute

Normalization and FD l Technically, normalization is just the analysis of Functional Dependencies of

Normalization and FD l Technically, normalization is just the analysis of Functional Dependencies of all columns with respect to the primary key. l There are three “levels” of analysis: 1. Functional Dependence – any non-prime attributes which as FD on the PK. 2. Partial Functional Dependence – any non-key attributes which are FD on part of the PK. 3. Transitive Functional Dependence – any non-key attributes which are FD on some other non-key attribute(s).

Activity: IYCDTYCN! Identify the: l Functional Dependencies (WRT the PK) l Primary Key? l

Activity: IYCDTYCN! Identify the: l Functional Dependencies (WRT the PK) l Primary Key? l Partial Functional Dependencies (WRT part of the PK) l Prime Attributes? l Non-Prime Attributes l Transitive Functional Dependencies (WRT some non-prime attribute)

The Dependency Diagram l The Dependency Diagram is a Very Useful Tool. It depicts

The Dependency Diagram l The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes.

Normal Forms l A Normal Form represents the current “state” of the data model.

Normal Forms l A Normal Form represents the current “state” of the data model. l There are 4 basic normal forms: ¡ Zero Normal Form (0 NF) l Non-key attributes exist which are not FD on PK. ¡ First Normal Form (1 NF) l All non-key attributes FD on entire PK. ¡ Second Normal Form (2 NF) l In 1 NF and l No partial functional dependencies exist. ¡ Third Normal Form (3 NF) l In 2 NF and l No transitive functional dependencies exist.

First Normal Form (1 NF) l Definition: ¡ All non-key attributes must be FD

First Normal Form (1 NF) l Definition: ¡ All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes. ) l Rule: ¡ Move each non-key FD column into its own new table. l How to Apply the Rule: For each non-key FD column: 1. Place non-FD column into a new table 2. Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table. 3. Assign a PK to the new table (typically a composite key of the original Non-FD column and the FK. )

1 NF: Example 1/2 l What’s wrong with this data model? ¡ What should

1 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Is there an attribute not FD on the PK? ¡ Is it in 1 NF already? ¡ What if Erin takes up bass fishing? ¡ I’m planning a ski trip, whom should I contact? l (How do I know Hobby 3, skiing and not Hobby 1)?

1 NF: Example 2/2 l What was done: ¡ Hobbies table created. Contains the

1 NF: Example 2/2 l What was done: ¡ Hobbies table created. Contains the originally non FD column, “hobby” ¡ The PK (FID) was copied into the hobbies table. ¡ The PK of the Hobbies table is the combination of FID and Hobby. l Questions: ¡ Is this in 1 NF? ¡ Can you reproduce the previous data model from this one? l Who likes skiing? Basketball?

Second Normal Form (2 NF) l Definition: ¡ The data model must be in

Second Normal Form (2 NF) l Definition: ¡ The data model must be in 1 NF AND ¡ No partial functional dependencies can exist. l Rule: ¡ Move each partially FD non-key column into its own new table. l How to Apply the Rule: For each partial dependency: 1. Move all partially FD columns into a new table 2. Copy the determinant into the new table. 3. Make the determinant of the partial dependency: 1. The PK for the new table, FK to the existing table.

2 NF: Example 1/2 l What’s wrong with this data model? ¡ What should

2 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Do any partial dependencies exist? l Where? l What is the determinant for each, if any? ¡ Is it in 1 NF already? 2 NF? ¡ I made a mistake, 81 HLV 3 is a Power edge 5500, not a 4400?

2 NF: Example 2/2 l What was done: ¡ Serial Num + SWID is

2 NF: Example 2/2 l What was done: ¡ Serial Num + SWID is the primary key. ¡ Servers, Software tables created from partial dependencies, where Serial Num, SWID are the determinants. ¡ Serial Num, is the PK for Servers, SWID is the PK for Software, each are also FK’s for the SWInstallation table l Questions: ¡ Is this in 2 NF? ¡ Can you reproduce the previous data model from this one?

Third Normal Form (3 NF) l Definition: ¡ The data model must be in

Third Normal Form (3 NF) l Definition: ¡ The data model must be in 2 NF AND ¡ No transitive functional dependencies can exist. l Rule: ¡ Move each transitive FD non-key column into its own new table. l How to Apply the Rule: For each transitive dependency: 1. Move all transitive FD columns into a new table. 2. Copy the determinant column into the new table. 3. Make the determinant of the transitive dependency: n The the PK for the new table. n The FK for the original table.

3 NF: Example 1/2 l What’s wrong with this data model? ¡ What should

3 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Do any transitive dependencies exist? l Where? l What is the determinant for each, if any? ¡ Is it in 1 NF already? 2 NF? 3 NF? ¡ I made a mistake, Koors phone number is 4905? l What’s wrong?

3 NF: Example 2/2 l What was done: ¡ Beer ID is the PK.

3 NF: Example 2/2 l What was done: ¡ Beer ID is the PK. ¡ All transitive dependencies moved into a new table, Distributors. ¡ Distrib ID is the determinant. PK of Distributors table, FK in original Beer table. l Questions: ¡ Is this in 3 NF? ¡ Can you reproduce the previous data model from this one?

Higher Normal Forms Yes, there IS more… … and it will blow your mind.

Higher Normal Forms Yes, there IS more… … and it will blow your mind.

Boyce-Codd Normal Form (BCNF) l Rule: Eliminate key-transitive dependencies l A table in BCNF

Boyce-Codd Normal Form (BCNF) l Rule: Eliminate key-transitive dependencies l A table in BCNF Means: ¡The table is in 3 NF ¡It includes no Non-Key attribute which determines a key attribute, or part of a key attribute.

BCNF: An Example

BCNF: An Example

Fourth Normal Form (4 NF) l RULE: Eliminate multiple sets of multivalued dependencies. l

Fourth Normal Form (4 NF) l RULE: Eliminate multiple sets of multivalued dependencies. l A table in 4 NF Means: ¡The table is in 3 NF ¡It includes no sets of attributes which contain multi-valued dependencies.

4 NF: An Example Figure 4. 15 Set of Tables in 4 NF Figure

4 NF: An Example Figure 4. 15 Set of Tables in 4 NF Figure 4. 14 Multivalued Dependencies

How “far” should one Normalize? l For relational databases: ¡ 1 NF is required,

How “far” should one Normalize? l For relational databases: ¡ 1 NF is required, at minimum for practical RDBMS implementations. ¡ The majority of the time data models are normalized to 3 NF. ¡ Sometimes certain tables are left in 1 NF or 2 NF, for performance or practical reasons. ¡ Higher normal forms BCNF, 4 NF are rare. l In General, the Higher the NF of your DM: ¡ The more complicated the internal DM ¡ The more “programming” required to reproduce the external DM. ¡ But, the lesser the chance for data anomalies!! l It’s a total trade-off: ¡ Database complexity vs. data anomalies.

Mike’s “Road To 3 NF” To normalize correctly, follow this process for each table

Mike’s “Road To 3 NF” To normalize correctly, follow this process for each table in the data model: Designate a candidate key Any partial dependencies? PKFD for all attributes? n Apply 1 NF Rule y 1 NF y Apply 2 NF Rule n Party Hard ! 2 NF Any transitive dependencies? y Apply 3 NF Rule n 3 NF

Normalization Summary Cheat Sheet l 0 NF O l 1 NF O l 2

Normalization Summary Cheat Sheet l 0 NF O l 1 NF O l 2 NF O 1 NF O (Resolve non FD) N 2 NF (Resolve Partial FD) N 1 O N 2 3 NF (Resolve Transitive FD) O N

Data Normalization Questions?

Data Normalization Questions?