Third Normal Form And Database Design 1 Are
Third Normal Form And Database Design 1
Are you sick of this slide yet? • Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED. • Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage. • Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance. • When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
Three Anomalies • The three anomalies all are problems caused by trying to put data for more than one entity into a single table. • The anomalies are: – Insert – Update – Delete 3
Insert Anomaly • Above is a partial table of Major League Soccer players. • To save time, the database manager has put the team information into the players table. • When I try to add a Detroit expansion team, Access won’t let me because I don’t yet have any players associated with the team. • To add a record to this table, I must have a Player. ID to enter as the primary key. • This is the insert anomaly. 4
Insert Anomaly • Imagine you have a small store, and you are keeping all of your customer records in your orders table. • One day, Bill Gates (or some other billionaire) comes in and says “I like what you’ve got for sale here. I’d like to set up an account. ” • You reply: “What do want to buy? ” • He says: “Nothing yet. But I want to be able to get something quickly when I need it. ” • Are you going to tell him that you can’t enter him as a customer because your (Microsoft) database won’t allow it? • Putting customers and orders in different tables allows you to keep records of customers who don’t have any orders yet. • Not being able to store that information is the Insert Anomaly. 5
Update Anomaly • David Beckham’s wife decides to buy his team. • You duly record this in his player record. • But now you have Landon Donovan and David Beckham playing for the same team, but different owners! • Your data is now inconsistent. • This is the update anomaly. 6
Delete Anomaly • Cuatehemoc Blanco suffers a career-ending injury playing for Mexico in the World Cup. Que lastima! • You remove him from your MLS roster. • Oh-oh! You’ve removed all your information about the Chicago Fire team as well! • This is the delete anomaly. 7
One Entity Per Table • All three anomalies result from the same cause: Putting attributes of multiple entities into a single table. • Codd and others recognized this, and codified (Coddified? ) the principles of table and database design that prevent this from happening. • Implementing these principles is called “normalization, ” and the goal of normalization is, in most cases, Third Normal Form. • I will abbreviate Third Normal Form as “ 3 NF”. 8
Normalization • Most of the database books refer to normalization as a process—you start from some horribly messed-up database, put it through your normalizer, and out comes a proper database in 3 NF. • In general, I don’t think it works that way. • If you are designing a new database from scratch, you can stick to the one-entity-per-table rule (and a few other rules) and create a properly normalized database the first time. • I’ll outline this procedure later in the lecture. 9
Normal Forms • Nevertheless, you should have some familiarity with the first three normal forms, even if you don’t have to go through them step-by-step. • For online quiz 2 and the in-class quiz 3 next Tuesday, you will be expected to be able to identify whether a table or database is in 3 NF or not. • The distinctions of 1 NF and 2 NF will not be tested (since a table must be in both 1 NF and 2 NF to have a chance of being in 3 NF). 10
First Normal Form • First Normal Form is poorly defined in most books, and the books generally don’t agree with each other. • It is easiest to define negatively. • The next few slides will describe things that keep a table from being in 1 NF. 11
A table is NOT in 1 NF if it doesn’t have a primary key • A table without a primary key has no way to keep out duplicate entries. 12
A table is NOT in 1 NF if cells contain multiple data items • This is a fairly common beginner’s mistake. • Putting multiple data items into a cell, as is done in the Friends. IDs column here, is very bad database design. • It makes querying the table very difficult. 13
Putting Multiple Similar (Numbered) Fields in a Table Violates 1 NF • A very common mistake. • Wastes space with all of the empty cells. • And someone always comes along and signs up for six classes. • And it’s hard to search! (Answering “Which students are taking IOE 373? ” requires searching 5 columns. ) 14
Having Multiple Fields That Are Similar Violates 1 NF • This is a variation on the previous slide. • Probably a little better, but still wrong. • The course fields are all instances of the “course” entity. They don’t belong in a Students table. 15
So what’s wrong? • The last three slides are examples of trying to represent one-to-many or many-to-many relationships in a single table. • That violates 1 NF, not to mention 3 NF. • BTW, to mention 3 NF—a table must be in 1 NF and 2 NF before it has a possibility to be in 3 NF. 16
Second Normal Form • The definition for 2 NF is much clearer. • To be in 2 NF, a table must: – Be in 1 NF. – Have all non-key fields be attributes of the ENTIRE key. – 2 NF applies only to compound keys; tables with simple keys that are in 1 NF are automatically in 2 NF as well. 17
2 NF • 2 NF is only an issue with tables having compound primary keys. • 2 NF is especially important when you have a chain of tables linked by relationships between their primary keys. • The basic idea with 2 NF is to put an attribute into the proper table—the table in which it is a property of the entire key, no more, no less. • The following two examples will hopefully make this clear. 18
Example: University • In which table should each non-key field go?
University: Solution
Example: Invoices • Where should each non-key field go?
Invoices • Issues: – If you don’t have different product codes for items that differ only in color or size, those two attributes could be included in Order. Details instead. – Similarly, Cost depends on whether you have fixed prices (in which case cost belongs with Products), or if your prices change with time or if certain customers or orders get discounts.
2 NF Summary • Second Normal Form is only an issue with compound keys (more than one field in the primary key). • If a table is in 1 NF and has a simple key it is automatically in 2 NF. • All non-key fields in a table with a compound key should be properties of the entire key (in combination)—not simply attributes of a part of the key.
Third Normal Form • 3 NF is also fairly straightforward. A table in 3 NF must be: – In 2 NF; and – All attributes must be attributes of the key only; the table should include no attributes of attributes. – Attributes of attributes are technically referred to as “transitive dependencies”. Don’t put them in your tables! 24
The Golden Rule for 3 NF Every non-key field in a table should depend on the key, the whole key, and nothing but the key, so help me Codd. 25
Online Quiz 2 • It is easiest to learn 3 NF by example, and I’ve prepared plenty of examples for you. • There’s no outside reading for online quiz 2. • Review this slide show quickly before taking the quiz. • For each problem that is not in 3 NF, read the feedback which explains why. • Before long, you’ll be able to quickly recognize what is 3 NF, and what isn’t. • There a few gray areas, and these are discussed in the quiz as well. • I’ll now demonstrate the quiz and explain a few of the problems. 26
Last Time! • Good relational database design is about optimizing how the data is STORED, not how it is DISPLAYED. • Most “tables” you have seen—in books, in lectures, on the web—were probably optimized for display, not for storage. • Relational database tables are designed for consistency and to reduce redundancy. They are not designed for appearance. • When we learn SQL and Visual Basic, we will look at various ways to display the data stored in relational database tables.
Rarely Simple, Never Finished • A reminder of a couple of things I’ve said before: – Database design is as much art as science. – The only database designs that are completely finished are the ones that are no longer in use. • These things apply to programming as well. • However, there are guidelines, like 3 NF, which guide you towards creating good databases. • The programming guidelines that we will be following in this class fall under the heading of “Object-Oriented Programming. ” (OOP)
- Slides: 28