1 Week 10 October 31 Extended ERD Data
1 Week 10 October 31 • Extended ERD • Data Normalization R. Ching, Ph. D. • MIS • California State University, Sacramento
Problems with ER modeling • Fan traps - Pathway between two entities is ambiguous • Chasm traps - Pathway does not exist between certain entity occurrences • Inheritance - An entity receives its attributes from a class of attributes Extended Entity Relationship (ERR) modeling R. Ching, Ph. D. • MIS • California State University, Sacramento 2
Connection Trap: Fan Trap 3 What products belong to which product categories? Which products have restricted use aboard a plane? Merchandise Lines Merchandise_line Description Product Categories Classify Have Products Product_code Description Merchandise_line R. Ching, Ph. D. • MIS • California State University, Sacramento Product_category Merchandise_line
Connection Trap: Fan Trap 4 What products belong to which product categories? Which products have restricted use aboard a plane? Merchandise Lines Merchandise_line Description Product Categories Classify Product_category Merchandise_line Have Products Product_code Description Merchandise_line To satisfy these queries, we need to form a relationship R. Ching, Ph. D. • MIS • California State University, Sacramento
Connection Trap: Chasm Trap What products belong to the same merchandise line? Which products require a UL listing? Product Categories Product_category Merchandise_line Merchandise Lines Classify Merchandise_line Description UL_listing Have Products Product_code Description Product_category To satisfy these queries, we need to form a relationship Known: What merchandise lines are composed of what products R. Ching, Ph. D. • MIS • California State University, Sacramento 5
EER modeling Superclass and Subclass Entity Types 6 • Superclass - Higher order of classification or categorization • Subclass - A member of a superclass that provides specification Music and Videos Electronic Merchandise Audio CD Receiver Superclasses Visual Cassette attributes of CD attributes of cassette decks attributes of receivers R. Ching, Ph. D. • MIS • California State University, Sacramento
EER modeling Superclass and Subclass Entity Types • Specialization - top-down – Maximizing differences between members by identifying distinguishing characteristics • Generalization - bottom-up General Electronic Merchandise Audio Specific CD Receiver Visual Cassette attributes of CD attributes of cassette decks attributes of receivers R. Ching, Ph. D. • MIS • California State University, Sacramento 7
EER modeling Superclass and Subclass Entity Types • Specialization - top-down • Generalization - bottom-up – Minimizing differences between entities by identifying common features General Electronic Merchandise Audio Specific CD Receiver Visual Cassette attributes of CD attributes of cassette decks attributes of receivers R. Ching, Ph. D. • MIS • California State University, Sacramento 8
EER modeling Attribute Inheritance 9 CDP-325 (Sony CD Changer) Attributes common to all audio merchandise are inherited General Electronic Merchandise Audio Specific CD Receiver Visual Cassette attributes of CD attributes of cassette decks attributes of receivers R. Ching, Ph. D. • MIS • California State University, Sacramento
EER Diagram 10 Product_code Manufacturers 1 1 Produces M Sells d Prod_descip Products Disjoint constraint Superclass/subclass 1 CD Db range Receiver Watts R. Ching, Ph. D. • MIS • California State University, Sacramento Cassette Flutter
Constraints • Disjoint (d, o) – Entity can be a member of only one of the subclasses of specialization – Under non-disjoint, an entity can be a member of more than one subclass of specialization • Participation (partial or total) – Total - every entity in the superclass must be a member of a subclass in specialization – Partial - An entity need not belong to any of the subclasses of specialization R. Ching, Ph. D. • MIS • California State University, Sacramento 11
12 Data Normalization The process of decomposing complex data structures into simple relations according to a set of dependency rules. Mc. Fadden and Hoffer R. Ching, Ph. D. • MIS • California State University, Sacramento
Data Normalization 13 • The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise. – Achieve a design that is highly flexible – Reduce redundancy – Ensure that the design is free of certain update, insertion and deletion anomalies Catherine Richardo, 1990 R. Ching, Ph. D. • MIS • California State University, Sacramento
Normalization 14 1 NF 2 NF 3 NF BCNF 4 NF R. Ching, Ph. D. • MIS • California State University, Sacramento Progressively putting the relation into a higher normal form
Order No. Date: Stereos To Go 6 / 15 / 99 Address: Invoice John Smith 2036 -26 Street Sacramento CA 95819 City Date Shipped: Item Number Go, Hogs 0000 -0000 -0 Account No. Customer: Product Code State 0000 0 John Smith 1/05 Zip Code 6 / 18 / 99 Product Description/Manufacturer 1 SAGX 730 Pioneer Remote A/V Receiver 2 AT 10 CDPC 725 Cervwin Vega Loudspeakers Sony Disc-Jockey CD Changer 3 10001 Qty Price 1 56995 1 35995 1 39995 4 5 Subtotal Shipping & Handling Sales Tax Total R. Ching, Ph. D. • MIS • California State University, Sacramento 132985 10000 10306 153291 15
File-Based System Customer Orders Customer Accounts File Customer Mailing List File Invoice Program Customer Account Program Customer Mailings Program R. Ching, Ph. D. • MIS • California State University, Sacramento 16 Invoices Account Report Mailing List
Data Redundancy • Customer Order File – PO number – Customer account number – Customer name, address, city, state, zip code – Order date – Product code, product description, price, unit • Customer Account File – Account Number – Customer name, mailing address, city, state, zip code • Customer Mailing List File – Customer name, mailing address, city, state, zip code R. Ching, Ph. D. • MIS • California State University, Sacramento 17
Unnormalized Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item 1_descrip Item 1_qty Item 1_price, Item 2_descrip Item 2_qty Item 2_price, . . . , Item 7_descrip Item 7_qty Item 7_price) How would a program process the data to recreate the invoice? R. Ching, Ph. D. • MIS • California State University, Sacramento 18
First Normal Form (1 NF) • A relation is in first normal form if and only if every attribute is single-valued for each tuple. – Remove all repeating groups – Create a flat file R. Ching, Ph. D. • MIS • California State University, Sacramento 19
Unnormalized to 1 NF 20 (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item 1_descrip, Item 1_qty, Item 1_price, Item 2_descrip, Item 2_qty, Item 2_price, . . . , Repeating groups Item 7, Item 7_descrip, Item 7_qty, Item 7_price) A flat file places all the data of a transaction into a single record. This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement. R. Ching, Ph. D. • MIS • California State University, Sacramento
Unnormalized to 1 NF 21 (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item_descrip, Item_qty, Item_price) Nominated group of attributes to serve as the key (form a unique combination) • • • Eliminate the repeating groups. Each row retains data for one item. If a person bought 5 items, we would have five tuples R. Ching, Ph. D. • MIS • California State University, Sacramento
1 NF 22 r r e e e b m a Flat File m u n r n e nt e m c u i o to o s c v c Item Description In A Cu Item Quantity Price 10001 123456 John Smith • • • SAGX 730 Pioneer Remote A/V Rec 10001 123456 John Smith • • • AT 10 1 569. 95 Cerwin Vega Loudspeakers 1 359. 95 10001 123456 John Smith • • • CDPC 725 Sony Disc Jockey CD 1 399. 95 10001 123456 John Smith • • • S/H Shipping 1 100. 00 10001 123456 John Smith • • • Tax Sales Tax 1 103. 06 R. Ching, Ph. D. • MIS • California State University, Sacramento
Second Normal Form (2 NF) • A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key. Attribute A Attribute B Determinant • Full functional dependency – B is functionally dependent on A if each value of A is associated with exactly one value of B R. Ching, Ph. D. • MIS • California State University, Sacramento 23
2 NF • Second normal form applies to relations with composite keys (i. e. , a primary key composed of two or more attributes) • A relation with a single attribute primary key is automatically in at least 2 NF R. Ching, Ph. D. • MIS • California State University, Sacramento 24
From 1 NF to 2 NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item_descrip, Item_qty, Item_price) What attribute(s) can be used to uniquely identify a tuple? If the primary key consisted of invoice_number and item (i. e. , composite key), we would need to remove the partial dependencies. R. Ching, Ph. D. • MIS • California State University, Sacramento 25
From 1 NF to 2 NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item_descrip, Item_qty, Item_price) Using Invoice number and Item as the key. . . Some of the attributes are dependent upon invoice_number for their values and others on item. In either case, they are not functionally dependent on the entire key. R. Ching, Ph. D. • MIS • California State University, Sacramento 26
From 1 NF to 2 NF 27 Which attributes are functionally dependent on which keys? Invoice_number Vs. Item ? R. Ching, Ph. D. • MIS • California State University, Sacramento Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, tem_descrip, Item_qty, Item_price
From 1 NF to 2 NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Item, Item_descrip, Item_qty, Item_price) Is this unique by itself? What happens if the item is purchased more than once? R. Ching, Ph. D. • MIS • California State University, Sacramento 28
From 1 NF to 2 NF 29 (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) Partial dependency (Invoice_number, Item_descrip, Item_qty, Item_price) Composite key (forms a unique combination) R. Ching, Ph. D. • MIS • California State University, Sacramento
From 1 NF to 2 NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number, Item_qty, Item_price) (Item, Item_descrip) R. Ching, Ph. D. • MIS • California State University, Sacramento 30
From 1 NF to 2 NF 31 In contrast. . . (Invoice_number, Invoice_date, Date_delivered, Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code, Item_descrip Item_qty Item_price) If the primary key consisted of invoice_number and Invoice_date (i. e. , composite key), we would NOT have partial dependencies. Thus, the relation would be in 2 NF. R. Ching, Ph. D. • MIS • California State University, Sacramento
Third Normal Form (3 NF) 32 • A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key. – Remove transitive dependencies – “Each nonkey attribute must depend upon the key, the whole key, and nothing but key. ” Kent, 1978 R. Ching, Ph. D. • MIS • California State University, Sacramento
From 2 NF to 3 NF (Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code) (Invoice_number, Item_qty, Item_price) (Item, Item_descrip) Which attributes are dependent on others? Is there a problem? R. Ching, Ph. D. • MIS • California State University, Sacramento 33
Transitive Dependencies and Anomalies • Insertion anomalies – To add a new row, all customer (name, address, city, state, zip code, phone) and products (description) must be consistent with previous entries • Deletion anomalies – By deleting a row, a customer or product may cease to exist • Modification anomalies – To modify a customer’s or product’s data in one row, all modifications must be carried out to all others R. Ching, Ph. D. • MIS • California State University, Sacramento 34
Insertion and Modification Anomalies For example… 35 Insert a new Panasonic product Product_code Manufacturer_name DVD-A 110 PV-4250 Panasonic CT-32 S 35 PAN Inconsistency DVD-A 110 PV-4250 CT-32 S 35 Panasonic Pana. Sonic Pana Sonic PAN R. Ching, Ph. D. • MIS • California State University, Sacramento Change all Panasonic products’ manufacturer name to “Panasonic USA”
Deletion Anomaly For Example… 4377182 4398711 4578461 4873179 John Smith Arnold S Gray Davis Lisa Carr 36 Sacramento Davis Sacramento Reno CA CA CA NV 95831 95691 95831 89557 By deleting customer Arnold S, we would also be deleting Davis, California. R. Ching, Ph. D. • MIS • California State University, Sacramento
Transitive Dependencies A condition where A, B, C are attributes of a relation such that if A B and B C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). R. Ching, Ph. D. • MIS • California State University, Sacramento Invoice_number Invoice_date 37 Date_delivered Cust_account Cust_name Cust_addr Cust_city Cust_state Zip_code Item_descrip Invoice_number+Item_qty Item_price
Why Should City and State Be Separated from Customer Relation? • City and state are dependent on zip code for their values and not the customer’s identifier (i. e. , key). Zip_code City, State • Otherwise, Cust_account Cust_addr, Zip_code City, State In which case, you have transitive dependency. R. Ching, Ph. D. • MIS • California State University, Sacramento 38
3 NF Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item_qty, Item_price) Items Relation (Item, Item_descrip) R. Ching, Ph. D. • MIS • California State University, Sacramento 39
3 NF 40 Invoice Relation (Invoice_number, Invoice_date, Date_delivered, Cust_account) Customer Relation (Cust_account, Cust_name, Cust_addr, Zip_code) Zip_code Relation (Zip_code, City, State) Invoice_items Relation (Invoice_number, Item_qty, Item_price) Items Relation Manufacturers Relation (Item, Item_descrip) (Manuf_code, Manuf_name) Since the Items relation contains the manufacturer’s name in the description, a separate Manufacturers relation can be created R. Ching, Ph. D. • MIS • California State University, Sacramento
First to Third Normal Form (1 NF - 3 NF) • 1 NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-value attributes and create a flat file) • 2 NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies) • 3 NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies) R. Ching, Ph. D. • MIS • California State University, Sacramento 41
42 R. Ching, Ph. D. • MIS • California State University, Sacramento
- Slides: 42