Logical Data Modeling Normalization BINF 697 Outline l

Logical Data Modeling – Normalization BINF 697

Outline l Logical Data Modeling – Normalization l First Normal Form l Second Normal Form l Third Normal Form BINF 697 - Edwards 2

Normalization l Rules/Conditions that every entity and its attributes must satisfy: l l …plus manipulations to “fix” violations Violation fixes can be applied iteratively, until no violations remain. Fixes usually involve creating new entities “Algorithmic” strategy to achieve a good datamodel: l l Complete, non-redundant, “business” logic, etc… Elimination of update anomalies BINF 697 - Edwards 3

Data-Model Properties l l l l Completeness Non-redundancy “Business” Logic Data Reusability Stability and Flexibility Elegance Communication Integration BINF 697 - Edwards Data Modeling Essentials (§ 1. 6) 4

Normalization l Usually normalization is applied as a series of checks after an initial data model is proposed l l l Often, primary entities of a data model are clear Provides “good-practice patterns” for data-model manipulation and representation challenges Usually do not start with a “universal” table for the entire data-model l …but one or more entities in a logical data model may need to be fixed. BINF 697 - Edwards 5

Simple Example: Employee Database Figure 2. 1: Employee Record Data Modeling Essentials, Ch 2 BINF 697 - Edwards 6

Employee Database Figure 2. 2: Employee Table l Tabular form, one fact per attribute BINF 697 - Edwards 7

Employee Database l l l Repeated attributes become new entities Use original primary key to indicate original row Be careful of data-loss! BINF 697 - Edwards 8

Employee Database l l l Identify redundant, repeated values (down the rows) Create new entities for determined values Leave only the determinant (also the primary key of new entity). BINF 697 - Edwards 9

Employee Database EMPLOYEE (Employee Number, Employee Name, Department Number*) DEPARTMENT (Department Number, Department Name, Department Location) QUALIFICATION (Employee Number*, Qualification Description, Qualification Year) l l Transformed logical data model is now in 3 rd Normal Form Simple (relational) notation l PK is underlined, FK gets * BINF 697 - Edwards 10

Extended Example Data Modeling Essentials, Ch 2 BINF 697 - Edwards 11

Initial Drug Expenditure Model l … plus lots of business logic, that I’ll explain as we go… BINF 697 - Edwards 12

Normalization l 1 st Normal Form (1 NF): l l 2 nd Normal Form (2 NF): l l l One fact per attribute Entity instances are unique No repeated attributes 1 st Normal Form, plus No partial primary-key determinants 3 rd Normal Form (3 NF): l l 2 nd Normal Form, plus No non-primary-key determinants BINF 697 - Edwards 13

1 st Normal Form (1 NF) l One fact, one attribute l l l Entity instances are unique l l Compound attributes are exploded to new columns Clean-up representations, hidden attributes, dependent attributes Every instance is uniquely defined by its primarykey No repeated attributes l Repeated attributes become instances of new entity BINF 697 - Edwards 14

One fact per attribute BINF 697 - Edwards 15

One fact per attribute BINF 697 - Edwards 16

Identify Primary-Key l Hospital Number and Operation Number uniquely identify the operation “event” l l Operation Number is managed by each Hospital independently. Hospital Number and Operation Number are underlined. BINF 697 - Edwards 17

No repeated attributes l Identify and remove groups of repeated attributes: Make new entity with repeated groups as instances Include the primary-key of the original table in the new entity’s attributes Identify the new entity’s primary-key 1. 2. 3. l Original primary-key plus…? BINF 697 - Edwards 18

No repeated attributes BINF 697 - Edwards 19

1 st Normal Form (1 NF) l l l Entity instances are unique (PK) One fact per attribute No repeated attributes BINF 697 - Edwards 20

2 nd Normal Form (2 NF) l No partial primary-key determinants l What attributes are completely determined by part of a primary-key? BINF 697 - Edwards 21

Determinants l A determinant is any identifier attribute (or set of identifier attributes) of an entity that determines other attributes’ values. l This should be true conceptually, not just for the current set of instances BINF 697 - Edwards 22

2 nd Normal Form (2 NF) l 1. No partial primary-key determinants Create new table for determinant and determined attributes, determinant as PK BINF 697 - Edwards 23

2 nd Normal Form (2 NF) l No partial primary-key determinants l What other attributes are completely determined by part of a primary-key? BINF 697 - Edwards 24

2 nd Normal Form (2 NF) BINF 697 - Edwards 25

3 rd Normal Form (3 NF) l No non-primary-key determinants l What attributes are determined by something other than the primary key of an entity? BINF 697 - Edwards 26

3 rd Normal Form (3 NF) l No non-primary-key determinants l What attributes are determined by something other than the primary key of an entity? BINF 697 - Edwards 27

3 rd Normal Form (3 NF) l No non-primary-key determinants BINF 697 - Edwards 28

3 rd Normal Form (3 NF) BINF 697 - Edwards 29

Exercise l From the course data-folder: l l l Step-by-step, normalize this universal table to 3 rd normal form. l l Drug_Expenditure_Universal. Table. xlsx Small instantiation of the data for the example Use a separate sheet for each entity Reading: Chapter 2 (DME) BINF 697 - Edwards 30

Homework l Manipulate a new universal table to 3 NF l l Submit: l l l Class_Registration_Universal. Table. xlsx a compact description (see slide 28) of the entities and their attributes an entity relationship diagram (see slide 29), a statement of the relevant business logic for each entity and its relationships, and an excel file with each entity represented as a table in a separate worksheet, populated using the data from original universal table. Due Feb 19 th, 10 am BINF 697 - Edwards 31
- Slides: 31