Logical Data Modeling Attributes Primary Keys and Identity
Logical Data Modeling – Attributes, Primary Keys, and Identity BCHB 697
Outline l The Data Modeling Process l Entity Attributes & Types l Determinants and Primary Keys BCHB 697 - Edwards 2
The Data Modeling Process l Conceptual Data Modeling l l Logical Data Modeling l l Define the entities and their relationships Define each entities’ attributes, incl. types Choose primary key(s) for each entity Relationship details (cardinality, optional) Physical Data Modeling l Implementation BCHB 697 - Edwards 3
Data-Model Properties l l l l Completeness Non-redundancy “Business” Logic Data Reusability Stability and Flexibility Elegance Communication Integration BCHB 697 - Edwards Data Modeling Essentials (§ 1. 6) 4
Entity Attributes l Any (and all) details about an entity l l l Person: l l Will become columns in database table. Needs definition and type: string, integer, float, (date, . . . ) Optional? More than one value? One fact per attribute, no derivable values, implicit values name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone Course: l course number, department, meeting time(s), day(s), start date, end date, BCHB 697 director, syllabus, credits, - Edwards prerequisites, semester, year 5
Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 6
Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 7
Attribute Types l Basics: integer, float, string l l Sometimes: boolean, date, point (lat, long), … Missing values of optional attributes: NULL Multiple values → multiple attributes Semantic types: l l Identifier – unordered, test for equality only Category – few unordered, discrete values Numeric – ordered, arithmetic, precision (? ) Text – ordered (sortable), no arithmetic BCHB 697 - Edwards 8
Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 9
Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 10
Identifier Attributes l May be integers or strings: l l System generated: l l Sequential integer automatically generated by the database for each instance. Administrator assigned: l l How many identifiers do we need? How many digits/characters in an identifier? Manual designation (OK for a few) Externally defined: l l Explicitly provided as entity attribute value Managed by external organization / authority BCHB 697 - Edwards 11
Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 12
Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 13
Determinants l A determinant is any identifier attribute (or set of identifier attributes) of an entity that determines other attributes’ values. l l This should be true conceptually, not just for the current set of instances A candidate key of an entity is a determinant that determines all of the entity’s other attribute values. BCHB 697 - Edwards 14
Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 15
Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 16
Primary Keys l Every entity (that might be referenced) requires one candidate key be designated the primary key. l l l Primary key values are used to define entity / instance relationships l l Surrogate for instance identity Must be universal, unique, and stable. Foreign keys are identity attributes with an entity’s primary keys as values. Single attribute, integer, primary keys are usually best. BCHB 697 - Edwards 17
Course Database Attributes l Person: l l Course: l l name, family name, date of birth, age, GU ID #, Net. ID, address, email, phone course number, department, meeting time(s), day(s), start date, end date, syllabus, credits, prerequisites, semester, year Course. Participant: l person, course, role BCHB 697 - Edwards 18
Blast Database Entities and Attributes l Protein: l l Alignment: l l accession, gi, species, description, length, (sequence) bit score, E-value, query protein, reference protein High. Scoring. Pair: l ordinal, bit score, E-value, identities, positives, gaps, start position, end position, (aligned sequences), query protein, reference protein, (alignment) BCHB 697 - Edwards 19
Logical data model BCHB 594 nje 5 rcf 57 sg 1386 kg 737 yk 625 yl 1009 bm 999 ls 1340 sls 358 ss 4218 zsw 6 BCHB 580 nje 5 ker 25 sg 1386 kg 737 bh 658 yk 625 yl 1009 bm 999 war 36 ls 1340 sls 358 ss 4218 zsw 6 mdw 83 yw 575 my 511 BCHB 697 nje 5 sg 1386 bh 658 xh 61 yk 625 yl 1009 bm 999 ls 1340 ss 4218 lmw 116 zsw 6 yw 575 Course course_id department number semester year l l Course. Participant 1 1 course_id person_id course person role Person person_id name dob GU ID # Net. ID Entities: Course, Person, Course. Participant Relationships: Course ← Course. Participant, Person ← Course. Participant BCHB 697 - Edwards 20
Exercise l Navigate to a bioinformatics knowledgebase l Uni. Prot, db. SNP, Clin. Var, PDB, … l Identify the entities and relationships Identify the attributes of each entity Choose a primary key for each entity l Reading: Chapters (4), 5, 6 (DME) l l BCHB 697 - Edwards 21
- Slides: 21