More On ERD and Concept of Normalization in

  • Slides: 41
Download presentation
More On ERD and Concept of Normalization in Relational Data Model Sahar Mosleh California

More On ERD and Concept of Normalization in Relational Data Model Sahar Mosleh California State University San Marcos Page 1

Another Example Of Changing ERD to tables Sahar Mosleh California State University San Marcos

Another Example Of Changing ERD to tables Sahar Mosleh California State University San Marcos Page 2

 • So far we learned how to develop an ERD for an organization

• So far we learned how to develop an ERD for an organization • We also found how ERD can be changed to initial tables • In developing a database, do all developers come up with the same ERD? • Depending on assumptions, and style of development, it is very likely for two developers of the same database to come up with two different ERDs • Your ERD does not have to be the best ERD that can possibly be created • However, you need to do your best to relate entities properly and place attributes where they may fit the best • No matter how well-structured your ERD is, your initial tables produced from your ERD can potentially contain redundancy. • Redundancy can cause insertion, deletion, and update problems. Sahar Mosleh California State University San Marcos Page 3

 • Assume a developer plans to develop a simple database for computer science

• Assume a developer plans to develop a simple database for computer science department. • The developer’s ERD looks as follows: a single entity with some attributes. Course Name Dept Emp. ID Salary Date. Course. Completed Computer. Science Sahar Mosleh California State University San Marcos Page 4

 • He changes his ERD to a table and inserts some data into

• He changes his ERD to a table and inserts some data into the table. Therefore, his database contains only one table and here it is. • What is the problem with this database? • What is the primary key? What is/are the foreign key(s)? • Can you see any redundancy in this table? Emp. ID Course Name Dept Salary Date. Completed 100 Database Margaret Marketing 42, 000 6/19/2000 100 VB Margaret Marketing 42, 000 10/7/2000 140 C++ Nancy Accounting 39, 000 12/8/2000 140 Java Nancy Accounting 39, 000 1/30/2001 110 Database Chris Info System 41, 500 5/7/2000 190 ASP Steve Finance 38, 000 6/19/2000 150 VB Joe Marketing 38, 000 8/12/2000 Sahar Mosleh California State University San Marcos Page 5

Well-Structured table • Definition: It is a table that contains a minimum amount of

Well-Structured table • Definition: It is a table that contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. • Consider this table: Emp. ID Course Name Dept Salary Date. Completed 100 Database Margaret Marketing 42, 000 6/19/2000 100 VB Margaret Marketing 42, 000 10/7/2000 140 C++ Nancy Accounting 39, 000 12/8/2000 140 Java Nancy Accounting 39, 000 1/30/2001 110 Database Chris Info System 41, 500 5/7/2000 190 ASP Steve Finance 38, 000 6/19/2000 150 VB Joe Marketing 38, 000 8/12/2000 Sahar Mosleh California State University San Marcos Page 6

Anomalies • Definition: Errors or inconsistencies that may result when a user attempts to

Anomalies • Definition: Errors or inconsistencies that may result when a user attempts to update a table that contains redundant data. There are three types of anomalies: insertion, deletion, and modification anomalies. • Insertion Anomaly: • Every time you try to input the personal information of an employee into the employee table, you also need to insert the information about the course he has completed. • What if we have an employee that has not assigned any course? • Can we add personal information and add the course information later? • No, because Emp. ID and Course (together) make the primary key and primary key cannot be NULL. Sahar Mosleh California State University San Marcos Page 7

 • Deletion Anomaly: • Suppose that ASP is no longer used by the

• Deletion Anomaly: • Suppose that ASP is no longer used by the company. They want to delete all information regarding ASP • If we delete ASP, we lose information about employee 190. • Modification Anomaly: • Suppose that employee number 100 gets a salary increase. You must record the increase in each of the rows for that employee (two occurrences in this case); otherwise, the data is inconsistent. Sahar Mosleh California State University San Marcos Page 8

Concept of Normalization • In order to take care of the anomalies, we need

Concept of Normalization • In order to take care of the anomalies, we need to normalize the tables. • Normalization: • Definition: The process of converting complex data structures into simple, stable data structures. • Normal Form: • Definition: A state of a table that can be determined by applying simple rules regarding dependencies to that table • In order to ensure that our tables have no serious anomalies, they all have to be in third normal form. • Before explaining different normal forms, we need to understand the concept of functional dependency. Sahar Mosleh California State University San Marcos Page 9

Functional Dependencies • Definition: If an attribute of a table (Employee Name) can be

Functional Dependencies • Definition: If an attribute of a table (Employee Name) can be determined from another attribute in the same table (Employee Number), the Name is functionally dependant on the Number. • Common examples of functional dependencies are: • SSN determines Name, Address and Birthday, • A person’s name, address, and birthday are functionally dependent on that person’s Social Insurance number • VIN determines Make, Model and Color • The make, model, and color of a vehicle are functionally dependent on the vehicle identification number, • ISBN determines Title • The title of the book is functionally dependent on the book’s international standard book number (ISBN) Sahar Mosleh California State University San Marcos Page 10

First Normal Form • Definition: A table is in first normal form if each

First Normal Form • Definition: A table is in first normal form if each field (intersection of a row and a column) contains only one instance • Having more than one data item in a field is called ‘Repeating Groups’ • For example the following table is not in first normal form. Emp. ID Course 100 Salary Date. Completed Database Margaret Marketing VB 42, 000 6/19/2000 10/7/2000 140 C++ Java Accounting 39, 000 12/8/2000 1/30/2001 110 Database Chris Info System 41, 500 5/7/2000 190 ASP Steve Finance 38, 000 6/19/2000 150 VB Joe Marketing 38, 000 8/12/2000 Sahar Mosleh Name Nancy Dept California State University San Marcos Page 11

 • The simplest method to make a table into the first normal form

• The simplest method to make a table into the first normal form is to expand the number of rows in the table to ensure that the value of each attribute is atomic. • Thus, to make our table into the first normal formal we have to create two extra rows Emp. ID Course Name Dept Salary Date. Completed 100 Database Margaret Marketing 42, 000 6/19/2000 100 VB Margaret Marketing 42, 000 10/7/2000 140 C++ Nancy Accounting 39, 000 12/8/2000 140 Java Nancy Accounting 39, 000 1/30/2001 110 Database Chris Info System 41, 500 5/7/2000 190 ASP Steve Finance 38, 000 6/19/2000 150 VB Joe Marketing 38, 000 8/12/2000 Sahar Mosleh California State University San Marcos Page 12

Second Normal Form • Definition: A table is in second normal form if it

Second Normal Form • Definition: A table is in second normal form if it is in first normal form and non-key attributes don’t have partial functional dependency. • Partial functional dependency: • Definition: A dependency in which one or more non-key attributes are functionally dependent on part, but not on all of the primary key. • Note that, a table in first normal form is also in second normal form if • The primary key consists of only one attribute (column), or • No non-key attribute exists, or • Every non-key attribute is functionally dependent on the full primary key. Sahar Mosleh California State University San Marcos Page 13

Emp. ID Course Name Dept Salary Date. Completed 100 Database Margaret Marketing 42, 000

Emp. ID Course Name Dept Salary Date. Completed 100 Database Margaret Marketing 42, 000 6/19/2000 100 VB Margaret Marketing 42, 000 10/7/2000 140 C++ Nancy Accounting 39, 000 12/8/2000 140 Java Nancy Accounting 39, 000 1/30/2001 110 Database Chris Info System 41, 500 5/7/2000 190 ASP Steve Finance 38, 000 6/19/2000 150 VB Joe Marketing 38, 000 8/12/2000 • In the above table the primary key is “Emp. ID and Course”. • However, the Emp. ID can functionally determine Name, Dept, and Salary. Thus, Name, Dept, and Salary are only dependent on Emp. ID that is part of the primary key. • Emp. ID and Course functionally determine the Date. Completed. • Therefore, the above table is not in second normal form. Sahar Mosleh California State University San Marcos Page 14

 • In order to solve this problem and making the table into second

• In order to solve this problem and making the table into second normal form we can split the table into two new tables • Note that, if you split a table into two tables or more, they still have to be connected through primary and foreign key • The following two tables are still related through the Emp. Id attribute Salary Course Date Completed 100 Database 6/19/2000 Emp ID Name 100 Margaret Marketing 42, 000 100 VB 10/7/2000 140 Nancy Accounting 39, 000 140 C++ 12/8/2000 110 Chris Info System 41, 500 140 Java 1/30/2001 190 Steve Finance 38, 000 110 Database 5/7/2000 150 Joe Marketing 38, 000 190 ASP 6/19/2000 150 VB 8/12/2000 Sahar Mosleh Dept Emp ID California State University San Marcos Page 15

Third Normal Form • Definition: A table is in third normal form if it

Third Normal Form • Definition: A table is in third normal form if it is in second normal form and neither Derived Dependencies nor Transitive Dependencies exist. • Derived Dependency: • Definition: A table has a Derived Dependency if one or more non-key attribute(s) in that table can be calculated (derived) from other non-key attributes in either the same table or in other tables in the database • Transitive dependency: • Definition: A table has a Transitive Dependency if one or more non-key attribute(s) depends on another non-key attribute (in the same table) Sahar Mosleh California State University San Marcos Page 16

Example of Derived Dependency Item Qty Price Number Ordered Per. Unit Total 12341 4

Example of Derived Dependency Item Qty Price Number Ordered Per. Unit Total 12341 4 $20 $80 12342 3 $15 $45 12343 5 $25 $125 • The “Total” column can be derived from multiplying “Qty. Ordered” and “Price. Per. Unit” column. Therefore, it can be removed Sahar Mosleh Item Qty Number Ordered Price Per. Unit 12341 4 $20 12342 3 $15 12343 5 $25 California State University San Marcos Page 17

Example of Transitive Dependency • The following is not in 3 rd normal form.

Example of Transitive Dependency • The following is not in 3 rd normal form. Why? Cust. ID Cust. Name Sales. Rep. ID Sales. Rep Region 8023 Margaret 101 Smith South 9167 Jim 102 Hicks West 7924 Nancy 101 Smith South 6837 Joe 103 Hernandez East 8596 Chris 102 Hicks West 7018 Steve 104 Faulb North • Answer: The reason is that the non-key attribute “Sales. Rep. ID” can determine the Region of the Sales. Rep. • Therefore, since transitive dependency exits, the above table is not in third normal form Sahar Mosleh California State University San Marcos Page 18

 • Note that this table is in second normal form because the primary

• Note that this table is in second normal form because the primary key is a single attribute(Cust. ID). • However, the following anomalies still exists: • Insertion Anomaly: A new salesperson assigned to the North region cannot be entered until a customer has been assigned to that sales person • Deletion Anomaly: If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region • Modification Anomaly: If salesperson Smith is assigned to the East region, several rows must be changed to reflect that fact. Sahar Mosleh California State University San Marcos Page 19

 • Since a Primary Key is unique, using a name is never a

• Since a Primary Key is unique, using a name is never a good idea. It is better to add a new attribute (numeric) and use it instead. Cust. ID Cust. Name ID 8023 Margaret 101 ID Sales. Rep. Region 9167 Jim 102 101 Smith South 7924 Nancy 101 6837 Joe 103 102 Hicks West 8596 Chris 102 103 Hernandez East 7018 Steve 104 Faulb North Foreign key and Primary key relationship Sahar Mosleh California State University San Marcos Page 20

Customer Number Order Number Customer Name Customer Address Orders 1 Order Date has Part.

Customer Number Order Number Customer Name Customer Address Orders 1 Order Date has Part. Number N Details Part Description Sahar Mosleh Part. Cost Per. Unit California State University San Marcos Quantity Ordered Total Price Page 21

Another Example • Consider the following tables that is generated from this ERD. The

Another Example • Consider the following tables that is generated from this ERD. The goal is to make the tables into third normal form. Orders Order Number Customer Name Customer Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 Details Order Part. Cost Number Description Per. Unit Quantity Total Ordered Price 12341 2, 5, 6 Sahar Mosleh X 1, X 2, X 3 Part 1, Part 2, Part 3 $12, $15, $16 California State University San Marcos $24, $75, $96 Page 22

First normal form -- Remove repeating groups. • Question: Are the two tables, Orders

First normal form -- Remove repeating groups. • Question: Are the two tables, Orders and Details, in first normal form? • Orders table is in first normal form because the value for each field is atomic • However, Details Table is not in first normal form because some fields have more than one value. • To put Details Table in first normal form, we need to remove repeating group in Details table as follows: Details Order Part. Cost Number Description Per. Unit Quantity Total Ordered Price 12341 X 1 Part 1 $12 2 $24 12341 X 2 Part 2 $15 5 $75 12341 X 3 Part 3 $16 6 $96 Sahar Mosleh California State University San Marcos Page 23

Second normal form - Remove partial dependencies • Question: Are the two tables, Orders

Second normal form - Remove partial dependencies • Question: Are the two tables, Orders and Details, in Second normal form? • The Orders table has a single attribute primary key, (order. Number) so it is in 2 nd normal form already Order Number Customer Name Customer Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 Sahar Mosleh California State University San Marcos Page 24

 • However, Details Table is not in Second Normal form. Why? Order Part.

• However, Details Table is not in Second Normal form. Why? Order Part. Cost Number Description Per. Unit Quantity Total Ordered Price 12341 X 1 Part 1 $12 2 $24 12341 X 2 Part 2 $15 5 $75 12341 X 3 Part 3 $16 6 $96 • Note that Details Table has a multiple Attribute primary key (Order Number; Part Number) and there is a partial dependency in this table: • “Part. Description” and “Part. Cost. Per. Unit” are dependant of Part Number • We can separate “Part. Description” and “Part. Cost. Per. Unit” from the Details table and place them in a new table. Sahar Mosleh California State University San Marcos Page 25

The old Details Table Order Part. Cost Number Description Per. Unit Quantity Total Ordered

The old Details Table Order Part. Cost Number Description Per. Unit Quantity Total Ordered Price 12341 X 1 Part 1 $12 2 $24 12341 X 2 Part 2 $15 5 $75 12341 X 3 Part 3 $16 6 $96 Details Parts Order Part Quantity Total Number Ordered Price Part. Cost Number Description Per. Unit 12341 X 1 2 $24 X 1 Part 1 $12 12341 X 2 5 $75 X 2 Part 2 $15 12341 X 3 6 $96 X 3 Part 3 $16 Foreign key and Primary key relationship Sahar Mosleh California State University San Marcos Page 26

Third Normal Form -- Remove Derived Dependency and Transitive Dependency • The Details table

Third Normal Form -- Remove Derived Dependency and Transitive Dependency • The Details table has a column with derived dependency • The “Total. Payable” can be derived from multiplying “Quantity. Ordered” in Details table by the “Part. Cost. Per. Unit” in the parts table • Total. Payable can be removed from the table. • Here is what we get: Total. Payable = Quantity. Ordered * Part. Cost. Per. Unit Details Parts Order Part Quantity Number Ordered Part. Cost Number Description Per. Unit 12341 X 1 2 X 1 Part 1 $12 12341 X 2 5 X 2 Part 2 $15 12341 X 3 6 X 3 Part 3 $16 Sahar Mosleh California State University San Marcos Page 27

 • Do you see anymore derived dependency or transitive dependency in the following

• Do you see anymore derived dependency or transitive dependency in the following tables? Details Order Number Part Number 12341 Quantity Ordered Parts Part Number Part Description Part. Cost Per. Unit X 1 2 X 1 Part 1 $12 12341 X 2 5 X 2 Part 2 $15 12341 X 3 6 X 3 Part 3 $16 • Is there any more derived dependency in any of the non-key attributes? • To have a transitive dependency in a table, we need to have at least two non-key attributes. Therefore, Details table cannot have transitive dependency either. • In the Parts table the description and cost columns cannot functionally determine each other. • Therefore, Details table and Parts table are in third normal form Sahar Mosleh California State University San Marcos Page 28

 • How about the Orders table? Orders Order Number Customer Name Customer Address

• How about the Orders table? Orders Order Number Customer Name Customer Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. 12/12/2000 • The Orders table contains Transitive data. Why? • The customer name and address depend on Customer Number. Note that “Customer. Number” is a non-key attribute • We can place a customer with its name and address in a separate table and remove it from the Orders table. Sahar Mosleh California State University San Marcos Page 29

The old Orders Table Order Number Customer Name Customer Address Order Date 12341 121

The old Orders Table Order Number Customer Name Customer Address Order Date 12341 121 Bob Smith 52 Clark Str. 12/10/2000 12342 122 Joe Clark 62 Park Ave 12/01/2001 12343 123 John Wong 72 Bay Str. Orders 12/12/2000 Customers Order Customer Order Number Date Customer Number Name Customer Address 12341 12/10/2000 121 Bob Smith 52 Clark Str. 12342 12/01/2001 122 Joe Clark 62 Park Ave 12343 12/12/2000 123 John Wong 72 Bay Str. Primary key and foreign key relationship Sahar Mosleh California State University San Marcos Page 30

 • What we are left with is 4 tables • These table can

• What we are left with is 4 tables • These table can be used to represent any order we want Customers Order Customer Order Number Date Customer Number Name Customer Address 12341 12/10/2000 121 Bob Smith 52 Clark Str. 12342 12/01/2001 122 Joe Clark 62 Park Ave 12343 12/12/2000 123 John Wong 72 Bay Str. Details Parts Order Number Part Number Quantity Ordered Part Number Part Description Part. Cost Per. Unit 12341 X 1 2 X 1 Part 1 $12 12341 X 2 5 X 2 Part 2 $15 12341 X 3 6 X 3 Part 3 $16 Sahar Mosleh California State University San Marcos Page 31

Revising ERD • Based on your normalization, you should revise your ERD Customer Number

Revising ERD • Based on your normalization, you should revise your ERD Customer Number Customer Address Order Number Customer Name 1 CUSTOMER M MAKES ORDER Quantity Ordered Part Number Part Description Order Date PART L M INCLUDES Part. Cost. Per. Unit Sahar Mosleh California State University San Marcos Page 32

Some Comments • The normalization process can be complicated by the ERD you start

Some Comments • The normalization process can be complicated by the ERD you start with • In the order example, you should know right away that a Customer should have a table of its own • How you do the normalization is not important. As long as the data reaches Third Normal Form is all that matters • When normalizing data, there is no ‘Right’ way to do it. Some say there is no ‘wrong’ way either. It depends on the data you are working with Sahar Mosleh California State University San Marcos Page 33

 • Sometimes, it even makes sense to ‘not’ fully normalize • EX. You

• Sometimes, it even makes sense to ‘not’ fully normalize • EX. You calculate a ‘total’ several times a day, and each time it takes 5 minutes. This total changes at most once a day. • It is faster to save the total, and update it when it changes than to recalculate it every time • You should go through the normalization process to be sure though Sahar Mosleh California State University San Marcos Page 34

Given This data, Derive normalized tables and ERD Cust ID Prod ID Order No

Given This data, Derive normalized tables and ERD Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc Price Date Order QTY Order Total Assumptions: • Each customer can make several orders; but a particular order is only make for one customer • Each product can be in many orders; and each order can contain many products • Several invoices can be issues for the same order but each invoice refers to one order Sahar Mosleh California State University San Marcos Page 35

Cust ID Prod ID Order No Prod. Id Inv No Cust Prod Order Name

Cust ID Prod ID Order No Prod. Id Inv No Cust Prod Order Name Add Desc Price Date Order QTY Order Total Prod. Price Prod. Desc Prod. Id Prod. Desc, Prod. Price PRODUCT (Prod. Id, Prod. Desc, Prod. Price, …. ) PRODUCT Sahar Mosleh California State University San Marcos Page 36

Cust. Id Cust. Name, Cust. Add Customer (Cust. Id, Cust. Name, Cust. Add, …)

Cust. Id Cust. Name, Cust. Add Customer (Cust. Id, Cust. Name, Cust. Add, …) Cust ID Prod ID Order No Prod. Id Inv No Prod. Price Prod. Desc PRODUCT Sahar Mosleh Cust Prod Order Name Add Desc Price Date Cust. Id Order QTY Order Total Cust. Add Cust. Name CUSTOMER California State University San Marcos Page 37

Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc

Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc Price Date Cust. Id Prod. Price PRODUCT N 1 M N ORDER CUSTOMER Order. No Order. Date, Order. Total Order. Date Sahar Mosleh Cust. Add IS-For Is-In Order. No Order Total Cust. Name Prod. Desc Prod. Id Order QTY Order (Order. No, Order. Date, Order. Total, … ) Order. Total California State University San Marcos Page 38

Invoice (Inv. No, …. ) Cust ID Prod ID Order No Inv No Cust

Invoice (Inv. No, …. ) Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc Price Date Cust. Id Prod. Price PRODUCT N 1 CUSTOMER N M ORDER Has 1 Order. Date Sahar Mosleh Cust. Add IS-For Is-In Order. No Order Total Cust. Name Prod. Desc Prod. Id Order QTY N INVOICE Inv. No Order. Total California State University San Marcos Page 39

Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc

Cust ID Prod ID Order No Inv No Cust Prod Order Name Add Desc Price Date PRODUCT N 1 Order. QTY Is-In Has 1 Order. Date Cust. Add CUSTOMER IS-For N M ORDER Sahar Mosleh Cust. Id Prod. Price Order. No Order Total Cust. Name Prod. Desc Prod. Id Order QTY 1 INVOICE Inv. No Order. Total California State University San Marcos Page 40

All Tables: Product. Ordered(Prod. Id, Order. No, Order. QTY) Invoice (Inv. No, Order. No)

All Tables: Product. Ordered(Prod. Id, Order. No, Order. QTY) Invoice (Inv. No, Order. No) Order (Order. No, Order. Date, Order. Total, Cust. Id) Customer (Cust. Id, Cust. Name, Cust. Add) Product (Prod. Id, Prod. Desc, Prod. Price) Sahar Mosleh California State University San Marcos Page 41