Data Normalization Normal is not something to aspire
- Slides: 26
Data Normalization Normal is not something to aspire to, it's something to get away from. ~ Jodie Foster ~
Activity: Sample User Report Can you build the underlying data model from this? How many tables? What are the relationships?
Understanding Functional Dependence l For attributes A and B, B is functionally dependent on A means each value in column A determines one and only one value in column B. ¡Written: A B ¡A determines B ¡B is the determinant ¡Ex: SSN Name (Name is functionally dependent on SSN)
Normalization Lingo l Prime attribute = Any attribute which is a primary key, or in the case of a composite key is part of a PK l Non-Prime Attribute = Any attribute which is not part of the PK. l Key Attribute = Prime Attribute l Non-Key Attribute = Non-Prime Attribute
Normalization and FD l Technically, normalization is just the analysis of Functional Dependencies of all columns with respect to the primary key. l There are three “levels” of analysis: 1. Functional Dependence – any non-prime attributes which as FD on the PK. 2. Partial Functional Dependence – any non-key attributes which are FD on part of the PK. 3. Transitive Functional Dependence – any non-key attributes which are FD on some other non-key attribute(s).
Activity: IYCDTYCN! Identify the: l Functional Dependencies (WRT the PK) l Primary Key? l Partial Functional Dependencies (WRT part of the PK) l Prime Attributes? l Non-Prime Attributes l Transitive Functional Dependencies (WRT some non-prime attribute)
The Dependency Diagram l The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes.
Normal Forms l A Normal Form represents the current “state” of the data model. l There are 4 basic normal forms: ¡ Zero Normal Form (0 NF) l Non-key attributes exist which are not FD on PK. ¡ First Normal Form (1 NF) l All non-key attributes FD on entire PK. ¡ Second Normal Form (2 NF) l In 1 NF and l No partial functional dependencies exist. ¡ Third Normal Form (3 NF) l In 2 NF and l No transitive functional dependencies exist.
First Normal Form (1 NF) l Definition: ¡ All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes. ) l Rule: ¡ Move each non-key FD column into its own new table. l How to Apply the Rule: For each non-key FD column: 1. Place non-FD column into a new table 2. Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table. 3. Assign a PK to the new table (typically a composite key of the original Non-FD column and the FK. )
1 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Is there an attribute not FD on the PK? ¡ Is it in 1 NF already? ¡ What if Erin takes up bass fishing? ¡ I’m planning a ski trip, whom should I contact? l (How do I know Hobby 3, skiing and not Hobby 1)?
1 NF: Example 2/2 l What was done: ¡ Hobbies table created. Contains the originally non FD column, “hobby” ¡ The PK (FID) was copied into the hobbies table. ¡ The PK of the Hobbies table is the combination of FID and Hobby. l Questions: ¡ Is this in 1 NF? ¡ Can you reproduce the previous data model from this one? l Who likes skiing? Basketball?
Second Normal Form (2 NF) l Definition: ¡ The data model must be in 1 NF AND ¡ No partial functional dependencies can exist. l Rule: ¡ Move each partially FD non-key column into its own new table. l How to Apply the Rule: For each partial dependency: 1. Move all partially FD columns into a new table 2. Copy the determinant into the new table. 3. Make the determinant of the partial dependency: 1. The PK for the new table, FK to the existing table.
2 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Do any partial dependencies exist? l Where? l What is the determinant for each, if any? ¡ Is it in 1 NF already? 2 NF? ¡ I made a mistake, 81 HLV 3 is a Power edge 5500, not a 4400?
2 NF: Example 2/2 l What was done: ¡ Serial Num + SWID is the primary key. ¡ Servers, Software tables created from partial dependencies, where Serial Num, SWID are the determinants. ¡ Serial Num, is the PK for Servers, SWID is the PK for Software, each are also FK’s for the SWInstallation table l Questions: ¡ Is this in 2 NF? ¡ Can you reproduce the previous data model from this one?
Third Normal Form (3 NF) l Definition: ¡ The data model must be in 2 NF AND ¡ No transitive functional dependencies can exist. l Rule: ¡ Move each transitive FD non-key column into its own new table. l How to Apply the Rule: For each transitive dependency: 1. Move all transitive FD columns into a new table. 2. Copy the determinant column into the new table. 3. Make the determinant of the transitive dependency: n The the PK for the new table. n The FK for the original table.
3 NF: Example 1/2 l What’s wrong with this data model? ¡ What should be PK be? Why? ¡ Do any transitive dependencies exist? l Where? l What is the determinant for each, if any? ¡ Is it in 1 NF already? 2 NF? 3 NF? ¡ I made a mistake, Koors phone number is 4905? l What’s wrong?
3 NF: Example 2/2 l What was done: ¡ Beer ID is the PK. ¡ All transitive dependencies moved into a new table, Distributors. ¡ Distrib ID is the determinant. PK of Distributors table, FK in original Beer table. l Questions: ¡ Is this in 3 NF? ¡ Can you reproduce the previous data model from this one?
Higher Normal Forms Yes, there IS more… … and it will blow your mind.
Boyce-Codd Normal Form (BCNF) l Rule: Eliminate key-transitive dependencies l A table in BCNF Means: ¡The table is in 3 NF ¡It includes no Non-Key attribute which determines a key attribute, or part of a key attribute.
BCNF: An Example
Fourth Normal Form (4 NF) l RULE: Eliminate multiple sets of multivalued dependencies. l A table in 4 NF Means: ¡The table is in 3 NF ¡It includes no sets of attributes which contain multi-valued dependencies.
4 NF: An Example Figure 4. 15 Set of Tables in 4 NF Figure 4. 14 Multivalued Dependencies
How “far” should one Normalize? l For relational databases: ¡ 1 NF is required, at minimum for practical RDBMS implementations. ¡ The majority of the time data models are normalized to 3 NF. ¡ Sometimes certain tables are left in 1 NF or 2 NF, for performance or practical reasons. ¡ Higher normal forms BCNF, 4 NF are rare. l In General, the Higher the NF of your DM: ¡ The more complicated the internal DM ¡ The more “programming” required to reproduce the external DM. ¡ But, the lesser the chance for data anomalies!! l It’s a total trade-off: ¡ Database complexity vs. data anomalies.
Mike’s “Road To 3 NF” To normalize correctly, follow this process for each table in the data model: Designate a candidate key Any partial dependencies? PKFD for all attributes? n Apply 1 NF Rule y 1 NF y Apply 2 NF Rule n Party Hard ! 2 NF Any transitive dependencies? y Apply 3 NF Rule n 3 NF
Normalization Summary Cheat Sheet l 0 NF O l 1 NF O l 2 NF O 1 NF O (Resolve non FD) N 2 NF (Resolve Partial FD) N 1 O N 2 3 NF (Resolve Transitive FD) O N
Data Normalization Questions?
- Smart is something you become not something you are
- Aspire not to
- A part of something abstract
- Explanation purpose
- Literal vs figurative language worksheet
- What does concrete symbolize
- Symbol/symbolism definition
- Something old something new poem
- Something that describes something
- Data transformation by normalization
- Data transformation by normalization
- 3 weeks from today
- Microarray
- Sadlier vocabulary workshop level d unit 1 synonyms
- Iechs aspire
- Aspire lunch menu
- The aspire test helps to measure and career readiness
- What is a good act aspire score
- Www.actaspire.org
- Act aspire student sandbox
- Aspire together
- Aspire certification
- As a student leader i aspire
- Noun form of aspire
- Aspire as
- Aspire antonym
- What is basic human aspiration