Database Processing Database Design Using Normalization Chapter 4
Database Processing Database Design Using Normalization Chapter 4 David M. Kroenke DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 1
Chapter Premise • How to assess structure of table? • Industry generally normalizes to 3 NF – Is normalized data always necessary? • How to address problems that often arise? DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 2
Assessing Table Structure DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 3
Counting Rows in a Table • To count the number of rows in a table use the SQL built-in function COUNT(*): SELECT FROM COUNT(*) AS Num. Rows SKU_DATA; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 4
Examining the Columns • To determine the number and type of columns in a table, use Describe: DESCRIBE SKU_DATA; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 5
Checking Validity of Assumed Referential Integrity Constraints • Given two tables with an assumed foreign key constraint: SKU_DATA (SKU, SKU_Description, Department, Buyer) BUYER (Buyer. Name, Department) Where SKU_DATA. Buyer must exist in BUYER. Buyer. Name DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 6
Checking Validity of Assumed Referential Integrity Constraints • To find any foreign key values that violate the foreign key constraint: SELECT Buyer FROM SKU_DATA WHERE Buyer NOT IN (SELECT Buyer FROM SKU_DATA, BUYER WHERE SKU_DATA. BUYER = BUYER. Buyer. Name; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 7
Type of Database • Updateable database or read-only database? • If updateable database, we normally want tables in 3 NF • If read-only database, we may not need 3 NF tables DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 8
Designing Updateable Databases DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 9
Normalization: Advantages and Disadvantages DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 10
Non-Normalized Table: EQUIPMENT_REPAIR Item. Number Type, Acquisition. Cost Item. Number, Repair. Number Repair. Date, Repair. Amount Key: Item. Number, Repair. Number DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 11
Normalized Tables: ITEM and REPAIR Relation DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 12
Copying Data to New Tables • To copy data from one table to another, use the SQL command INSERT INTO Table. Name command: INSERT INTO ITEM SELECT DISTINCT Item. Number, Type, Acquisition. Cost FROM EQUIPMENT_REPAIR; INSERT INTO REPAIR SELECT Item. Number, Repair. Date, Repair. Ammount FROM EQUIPMENT_REPAIR; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 13
Choosing Not to Use 3 NF • 3 NF is used to control anomalies from functional dependencies • There are times when 3 NF is not desirable • The classic example is ZIP codes: – ZIP codes almost never change – Any anomalies are likely to be caught by normal business practices – Not having to use SQL to join data in two tables will speed up application processing DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 14
Multivalued Dependencies • Anomalies from multivalued dependencies are very problematic – E. g. Employee Degree • Always place the columns of a multivalued dependency into a separate table (4 NF) DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 15
Read-Only Databases • Read-only databases are non-operational databases using data extracted from operational databases • They are used for querying, reporting and data mining applications • They are not updated (in the operational database sense) but have new data imported periodically DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 16
Denormalization • For read-only databases, normalization may not be an advantage – Application processing speed is more important • Denormalization is the joining of data in normalized tables prior to storing the data • The data is then stored in non-normalized tables DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 17
Normalized Tables DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 18
Denormalizing the Data INSERT INTO PAYMENT_DATA SELECT STUDENT. SID, Name, CLUB. Club, Cost, Amt. Paid FROM STUDENT, PAYMENT, CLUB WHERE STUDENT. SID = PAYMENT. SID AND PAYMENT. Club = CLUB. Club; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 19
Customized Tables • Read-only databases are often designed with many copies of the same data, but with each copy customized for a specific application • Consider the PRODUCT table: DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 20
Customized Tables One Table for Purchasing: PRODUCT_PURCHASING (SKU, SKU_Description, Vendor. Number, Vendor. Name, Vendor. Contact_1, Vendor. Contact_2, Vendor. Street, Vendor. City, Vendor. State, Vendor. Zip) Another for Sales: PRODUCT_USAGE (SKU, SKU_Description, Quantity. Sold. Past. Year, Quantity. Sold. Past. Quarter, Quantity. Sold. Past. Month) Another for Inventory: PRODUCT_INVENTORY (SKU, Part. Number, SKU_Description, Units. Code, Bin. Number, Production. Key. Code) DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 21
Common Design Problems DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 22
The Multivalue, Multicolumn Problem • The multivalue, multicolumn problem occurs when multiple values of an attribute are stored in more than one column: EMPLOYEE (Emp. Number, Name, Email, Auto 1_License. Number, Auto 2_License. Number, Auto 3_License. Number) • This is another form of a multivalued dependency • Solution: Like the 4 NF solution for multivalued dependencies, use a separate table to store the multiple values DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 23
Inconsistent Values • Inconsistent values occur when different users or different data sources use slightly different forms of the same data value: – Different codings: • SKU_Description = 'Corn, Large Can' • SKU_Description = 'Can, Corn, Large' • SKU_Description = 'Large Can Corn‘ – Different spellings: • Coffee, Coffeee DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 24
Inconsistent Values • Particularly problematic are primary or foreign key values • To detect: – Use referential integrity check already discussed for checking keys – Use the SQL GROUP BY clause on suspected columns SELECT FROM GROUP BY SKU_Description, COUNT(*) AS Name. Count SKU_DATA SKU_Description; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 25
Missing Values • A missing value or null value is a value that has never been provided DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 26
Null Values are Ambiguous May indicate that a value is inappropriate: • Date. Of. Last. Childbirth is inappropriate for a male May indicate that a value is appropriate but unknown • Date. Of. Last. Childbirth is appropriate for a female, but may be unknown May indicate that a value is appropriate and known, but has never been entered: • Date. Of. Last. Childbirth is appropriate for a female, and may be known but no one has recorded it in the database DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 27
Checking for Null Values • Use the SQL keyword IS NULL to check for null values: SELECT FROM WHERE COUNT(*) AS Quantity. Null. Count ORDER_ITEM Quantity IS NULL; DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 28
The General-Purpose Remarks Column • A general-purpose remarks column is a column with a name such as: – Remarks, Comments and Notes • It often contains important data stored in an inconsistent, verbal and verbose way – A typical use is to store data on a customer’s interests. • Such a column may: – Be used inconsistently – Hold multiple data items DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 29
Remarks Column Solution • Identify different uses for Remarks, then create a column for each use • Very difficult to do in practice – Requires agreement between all users – Requires consistent application DAVID M. KROENKE’S DATABASE PROCESSING, 10 th Edition © 2006 Pearson Prentice Hall 30
- Slides: 30