Logical Data Modeling Attributes Primary Keys and Identity

Logical Data Modeling – Attributes, Primary Keys, and Identity BCHB 697

Outline l The Data Modeling Process l Entity Attributes & Types l Determinants and Primary Keys BCHB 697 - Edwards 2

The Data Modeling Process l Conceptual Data Modeling l l Logical Data Modeling l l Define the entities and their relationships Define each entities’ attributes, incl. types Choose primary key(s) for each entity Relationship details (cardinality, optional) Physical Data Modeling l Implementation BCHB 697 - Edwards 3

Data-Model Properties l l l l Completeness Non-redundancy “Business” Logic Data Reusability Stability and Flexibility Elegance Communication Integration BCHB 697 - Edwards Data Modeling Essentials (§ 1. 6) 4

Entity Attributes l Any (and all) details about an entity l l l Person: l l Will become columns in database table. Needs definition and type: string, integer, float, (date, . . . ) Optional? More than one value? One fact per attribute, no derivable values, implicit values name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone Course: l course number, department, meeting time(s), day(s), start date, end date, BCHB 697 director, syllabus, credits, - Edwards prerequisites, semester, year 5

Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 6

Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 7

Attribute Types l Basics: integer, float, string l l Sometimes: boolean, date, point (lat, long), … Missing values of optional attributes: NULL Multiple values → multiple attributes Semantic types: l l Identifier – unordered, test for equality only Category – few unordered, discrete values Numeric – ordered, arithmetic, precision (? ) Text – ordered (sortable), no arithmetic BCHB 697 - Edwards 8

Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 9

Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 10

Identifier Attributes l May be integers or strings: l l System generated: l l Sequential integer automatically generated by the database for each instance. Administrator assigned: l l How many identifiers do we need? How many digits/characters in an identifier? Manual designation (OK for a few) Externally defined: l l Explicitly provided as entity attribute value Managed by external organization / authority BCHB 697 - Edwards 11

Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 12

Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 13

Determinants l A determinant is any identifier attribute (or set of identifier attributes) of an entity that determines other attributes’ values. l l This should be true conceptually, not just for the current set of instances A candidate key of an entity is a determinant that determines all of the entity’s other attribute values. BCHB 697 - Edwards 14

Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 15

Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 16

Primary Keys l Every entity (that might be referenced) requires one candidate key be designated the primary key. l l l Primary key values are used to define entity / instance relationships l l Surrogate for instance identity Must be universal, unique, and stable. Foreign keys are identity attributes with an entity’s primary keys as values. Single attribute, integer, primary keys are usually best. BCHB 697 - Edwards 17

Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 18

Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 19

Logical data model BCHB 594 nje 5 rcf 57 sg 1386 kg 737 yk 625 yl 1009 bm 999 ls 1340 sls 358 ss 4218 zsw 6 BCHB 580 nje 5 ker 25 sg 1386 kg 737 bh 658 yk 625 yl 1009 bm 999 war 36 ls 1340 sls 358 ss 4218 zsw 6 mdw 83 yw 575 my 511 BCHB 697 nje 5 sg 1386 bh 658 xh 61 yk 625 yl 1009 bm 999 ls 1340 ss 4218 lmw 116 zsw 6 yw 575 Course course_id department number semester year l l Course. Participant 1 1 course_id person_id course person role Person person_id name dob GU ID # Net. ID Entities: Course, Person, Course. Participant Relationships: Course ← Course. Participant, Person ← Course. Participant BCHB 697 - Edwards 20

Exercise l Navigate to a bioinformatics knowledgebase l Uni. Prot, db. SNP, Clin. Var, PDB, … l Identify the entities and relationships Identify the attributes of each entity Choose a primary key for each entity l Reading: Chapters (4), 5, 6 (DME) l l BCHB 697 - Edwards 21
- Slides: 21