More on normalization CSC 240 Blum 1 Normalization
More on normalization CSC 240 (Blum) 1
Normalization Review ¡ ¡ CSC 240 (Blum) In the normalization approach, one has data in a table-like format and begins to rearrange the data into smaller tables that minimize the data redundancy without losing any of the information in the relationships. This decomposition can be done in steps, each step called a Normal Form having more stringent conditions than the previous. 2
Normalization Starting Point: Un-normalized table ¡ The first step in normalization is to eliminate any multi-valued fields by flattening the table. l ¡ CSC 240 (Blum) A single row of the table with multivalued fields is replaced with many rows, one for each value in the multivalued field. When multi-valued fields are thus eliminated, the table is said to be in the First Normal Form. 3
Un-normalized table example PA Pennsylvania Harrisburgh Borski Brady Coyne … NJ New Jersey Trenton Andrews Robert E. Ferguso Mike n … … CSC 240 (Blum) Robert A. William J. … 4
Flattened table (now in the First Normal Form) PA Pennsylvania Harrisburgh Borski Robert A. PA Pennsylvania Harrisburgh Brady Robert A. PA Pennsylvania Harrisburgh Coyne William J. … … … NJ New Jersey Trenton Andrews Robert E. NJ New Jersey Trenton Ferguson Mike … … CSC 240 (Blum) … 5
A note on primary keys of flattened tables Recall that a primary key must uniquely identify each record in a table. ¡ When a table is flattened to eliminate multi-valued fields, one of those previously multi-valued fields must be the primary key or part of a composite primary key. ¡ CSC 240 (Blum) 6
Example Maj Student. ID Last. Name First. Name Middle Maj 1 9874923 Mc. Dermott Mary Margaret ENG 9840495 Jameson John James CSC 2 Dept INF Crse. No Course. Name Semester Grade ENG 108 Writing II Fall 2002 A- PHL 151 Critical Thinking Fall 2002 B+ MTH 150 Thinking Math Fall 2002 B ENG 230 American Lit Spring 2003 B+ REL 150 Rel in America Spring 2003 B- CSC T MTH 230 Programming Fall 2002 A- 160 Discrete Math Fall 2002 B- ENG 107 Writing I Fall 2002 B+ PHL 151 Critical Thinking Spring 2003 B REL 150 Rel in America Spring 2003 B+ When the table is flattened, something about the course must be part of the primary key – in this case we need Dept, Crse. No and Semester. CSC 240 (Blum) 7
The Gain: searchability ¡ Flattening the file will help with searching and querying. l l CSC 240 (Blum) We do not have to look for “Coyne” buried in the midst of a long list that must be parsed (broken into pieces), etc. If you can imagine that you might want to search on it, it should be “atomic” in a field of a record by itself. 8
The loss: data redundancy ¡ ¡ Data redundancy is the unnecessary repetition of information. Data redundancy makes it hard to maintain data integrity (correctness). l ¡ CSC 240 (Blum) E. g. when you change your address with one department in an organization, but the other departments still have your old address. Some repetition is necessary to maintain relationships. 9
Flattened table (now in the First Normal Form) PA Pennsylvania Harrisburgh Borski Robert A. PA Pennsylvania Harrisburgh Brady Robert A. PA Pennsylvania Harrisburgh Coyne William J. … … … • Repetition of PA is necessary to maintain the relationship between a House. Member and a State. • Repetition of Pennsylvania and Harrisburgh is unnecessary. CSC 240 (Blum) 10
Identifying and reducing redundancy ¡ ¡ ¡ CSC 240 (Blum) The reason Pennsylvania and Harrisburgh are unnecessarily repeated is because the relationship is fully realized by just using the state. Symbol. The state. Name and state. Capital are uniquely determined by the state. Symbol. state. Name and state. Capital are said to be functionally dependent on state. Symbol. 11
Uniquely!! ¡ ¡ CSC 240 (Blum) In Friday’s Billboard example, the song did not determine the artist uniquely. Usher and Alicia Keys both sing “My Boo, ” thus “My Boo” cannot be used to determine the artist uniquely. 12
Reducing Redundancy ¡ ¡ ¡ CSC 240 (Blum) Identifying functional dependencies is the key to reducing redundancy. There a few normal forms (Second Normal Form, Third Normal Form and Boyce-Codd Normal Form) which eliminate increasing degrees of redundancy. What distinguishes the various forms is the field(s) upon which something is functionally dependent and the type of functional dependence. 13
Determinant ¡ ¡ To avoid awkward phrases like “the field(s) upon which something is functionally dependent”, we introduce the term determinant. The determinant attribute determines some other attribute, i. e. this other attribute is functionally dependent upon the determinant field. l l CSC 240 (Blum) state. Name depends on state. Symbol is the determinant of state. Name 14
Types of functional dependence ¡ ¡ Attribute B is functionally dependent on Attribute A if knowing the value of A means one can in turn know the value of B uniquely. Attribute A (the determinant) may be a composite attribute – i. e. , made up of more than one field. If the full knowledge of A (all of its composite fields) is necessary to determine B, then B is fully dependent on A. If only partial knowledge of A (some of its composite fields) is necessary to determine B, then B is partially dependent on A. CSC 240 (Blum) 15
Not a two-way street ¡ Recall that B being functionally dependent on A does not mean A is functionally dependent on B. l CSC 240 (Blum) date. Of. Birth is functionally dependent on soc. Sec. Num, but soc. Sec. Num is not functionally dependent on date. Of. Birth 16
Transitive dependence ¡ If attribute B is functionally dependent on A and attribute C is functionally dependent on B, then C is said to transitively dependent on A provided that A is not functionally dependent on C. l CSC 240 (Blum) It seemed that the combination of catalog. No and Title determined Imprint and that Imprint determined Label. 17
Another Transitive Dependency Example ¡ ¡ CSC 240 (Blum) employee. Number is functionally dependent on soc. Sec. Num and salary is functionally dependent on employee. Number, then salary is “transitively functionally dependent” on soc. Sec. Num. Rephrased: soc. Sec. Num is a determinant of employee. Number and employee. Number is a determinant of salary, then soc. Sec. Num is a determinant of salary. 18
Yet Another Transitive Dependency Example ¡ CSC 240 (Blum) stock. ID is functionally dependent on transaction. ID and stock. Name is functionally dependent on stock. ID, then stock. Name is transitively dependent on transaction. ID. 19
Primary key ¡ Recall the primary key is an attribute or set of attributes that uniquely identify each row in a table. l l ¡ Thus every attribute that is not part of the primary key is functionally dependent on the primary key. Rephrased: The primary key is a determinant of any non-primary-key attribute. The level of decomposition (the Normal Form) is determined by the type of dependence the field has on the primary key. CSC 240 (Blum) 20
Second Normal Form ¡ Eliminating any field (via table decomposition) that partially depends on the primary key puts the table into Second Normal Form (provided the table was in the First Normal Form prior to decomposition). l Note that a table with a simple (noncomposite) primary key is necessarily in Second Normal Form. CSC 240 (Blum) 21
Second Normal Form Example ¡ Character. Featured. In. Episode(character. ID, episode. ID, first. Name, last. Name, character. Description, title, episode. Description, original. Air. Date) l ¡ In the notation above the underlined fields are serving as the primary key. The primary key is composite. l l The attributes first. Name, last. Name and character. Description are partially functionally dependent on the primary key because they are determined only by character. ID. The attributes title episode. Description and original. Air. Date are partially functionally dependent because they are determined only by episode. ID. CSC 240 (Blum) 22
Second Normal Form Example (Cont. ) Create tables having primary keys which are subsets (including a proper subset but not the empty set) of the primary keys of the original table. ¡ Place the non-primary-key attributes in the table in which they are fully dependent. ¡ CSC 240 (Blum) 23
Second Normal Form Example (Cont. ) Character(character. ID, last. Name, first. Name, character. Description) ¡ Episode(episode. ID, title, episode. Description, original. Air. Date) ¡ Episode. Feature(character. ID, episode. ID) ¡ l CSC 240 (Blum) Note that although no field depends on both character. ID and episode. ID, we must keep the table with both keys to maintain the relationship (the lossless join property). 24
Some Redundancy May Remain ¡ PA Pennsylvania Harrisburgh Borski Robert A. PA Pennsylvania Harrisburgh Brady Robert A. PA Pennsylvania Harrisburgh Coyne William J. … … … In the above example, the first. Name, last. Name combination may serve as the primary key, each of the other attributes state. Symbol, state. Name and state. Capital are fully dependent on the primary key. So it’s in Second Normal Form. But clearly there is still redundancy. CSC 240 (Blum) 25
Third Normal Form ¡ CSC 240 (Blum) Eliminating any field (via table decomposition) that transitively depends on the primary key puts the table into Third Normal Form (provided the table was in the Second Normal Form prior to decomposition). 26
¡ PA Pennsylvania Harrisburgh Borski Robert A. PA Pennsylvania Harrisburgh Brady Robert A. PA Pennsylvania Harrisburgh Coyne William J. … … … first. Name, last. Name determines state. Symbol which in turn determines state. Name and state. Capital. (transitive dependence) CSC 240 (Blum) 27
Decomposition into the Third Normal Form Create another table that has as a primary key the attribute which is the intermediate attribute in the transitive dependence. ¡ (last. Name, first. Name, state. Symbol) ¡ (state. Symbol, state. Name, state. Capital) ¡ CSC 240 (Blum) 28
Another transitive dependence example ¡ ¡ CSC 240 (Blum) Customer(customer. ID, last. Name, first. Name, street, city, state, zipcode, state. Tax, city. Tax) There is a simple primary key, so the table is in Second Normal Form (2 NF). But the city tax is dependent on the city and the state tax is dependent on the state. In fact city and state are dependent on zipcode. 29
Another transitive dependence example ¡ ¡ ¡ Customer(customer. ID, last. Name, first. Name, street, zipcode) Address. Info(zipcode, city, state, city. Tax, state. Tax) There could be further decomposition since state. Tax depends on state. l CSC 240 (Blum) For practical purposes, many draw the line at some of these decomposition even if they do reduce data redundancy. A question to ask is how likely is an update anomaly for the particular set of data. 30
Introducing primary keys ¡ If one has an awkward primary key l l ¡ ¡ CSC 240 (Blum) Perhaps it is composite, e. g. First. Name, Last. Name Perhaps it may change, Item. Name Then it is valid to introduce an ID to serve as a primary key. Just don’t let the introduction of simple key get in the way of eliminating data redundancy. This can be a problem with second normal form which is defined as having no partial dependence on the primary key. Thus the 2 NF decomposition can depend on one’s choice of primary key. 31
Example of key introduction ¡ Suppose in the library example l ¡ CSC 240 (Blum) Library(date. Borrowed, date. Due, date. Returned, borrower. ID, borrower. FName, borrower. LName, borrower. Status, book. ID, book. Title, (author. ID, author. FName, author. LName), book. Copy) One introduces a transaction. ID and makes it the primary key above, then we would be done as far as Second Normal Form goes. 32
Primary Candidate Key The definition of the Second Normal Form can be extended by replacing “partial dependence on the primary key” with “partial dependence on any candidate key. ” ¡ A candidate key is a field or set of fields that could be used to uniquely distinguish each record. ¡ CSC 240 (Blum) 33
Second Normal Form Extended ¡ CSC 240 (Blum) Even if we introduce a transaction. ID into the library example, if we use the extended version of the Second Normal Form, we would have some decomposition and the associated reduction in data redundancy. 34
Boyce-Codd Normal Form After Third Normal form, the next stricter form is called the Boyce. Codd Normal Form. ¡ A table is in Boyce-Codd Normal Form if the only determinants (attributes that determine other attributes) are candidate keys. ¡ CSC 240 (Blum) 35
References Database Systems, Rob and Coronel ¡ Database Systems, Connolly and Begg ¡ Fundamentals of Relational Databases, Mata-Toledo and Cushman ¡ Concepts of Database Management, Pratt and Adamski ¡ CSC 240 (Blum) 36
- Slides: 36