Relational Theory Relational Theory l l l History




































- Slides: 36
Relational Theory
Relational Theory l l l History of Relational Model Set Theory Relational Concepts and Terminology Keys Relational Integrity
History of Relational Model l E. F. Codd (Researcher at IBM) l l l Wanted to inject scientific rigor and solid principles into the field of database management Wanted to address problems of file-based systems Developed an abstract theory of data based on mathematics l l Conceptually straightforward Mathematical basis for “treating derivability, redundancy, and consistency of relations”
Codd (1970) l “… the problems treated here are those of data independence - the independence of application programs and terminal activities from growth in data types and changes in data representation; and certain kinds of data inconsistency which are expected to become troublesome…” (Codd, 1970, p 377) E. F. Codd (1970). A Relational Model of Data for Large Shared Data Banks. Communications of the ACM (13) 6, 377 -387.
Codd (1970) l provides a means of describing data with its natural structure only--that is, without superimposing any additional structure for machine representation purposes. (Codd, 1970, p 377)
Codd’s Rules Set of rules against which you can test a relational system 1. 2. 3. 4. All data represented as tables Logical representation independent from physical storage Single high-level language to build tables, insert/update/delete data Support relational operations (select, project, join) and set operations (union, intersect, difference, division)
Codd’s Rules 5. 6. 7. 8. Support views (alternate ways of looking at tables) Differentiate between unknown (null) and zero Mechanisms to support security and authorization Protect data integrity through transactions and recovery procedures
Date’s Short Explanation l CJ Date (2000) informally explains a relational DB as one that: l l Represents data as tables Supports the relational process (selection, projection, join)
Data Independence l l l Stored (physical) order of records does not need to be the same as the presentation (logical, temporary) order of records. Indexes should be able to be created/destroyed as performance needs dictate Important difference! File based systems and network/hierarchical models relied on physical “pointers” to link records. Relational model uses logical pointers.
The underlying math
Set Theory Set of all possible relations u u r(R) Í (dom(A 1) X dom (A 2) X. . . X dom (An)) All possible sets of values for a relation
Set Theory l l Sets are items that are grouped together based on common characteristics The set can be treated as a unit— mathematical operations can be performed on the set
Set Theory l l Sets: collections of objects that have similar properties Objects can be anything…people, allergies, cars, even other sets! l l Example: 4 could be in the set of “even numbers”, which is part of the set called “integers” Because objects in a set have shared properties, the set collectively can be mathematically manipulated as a unit (relational algebra).
Relational Terminology l l Relation: table with columns and rows, representing a conceptual entity (Vehicle) Attribute: named column that represents a characteristic of the relation (Color) Domain: set of allowable values for an attribute (red, blue, green) Tuple: a row, representing a concrete object (my car)
Word usage - caution l The same/similar words are used in the mathematical theory, and in conceptual phase of DB design (ERD model) but the words are used in different ways
Word usage l l “Relation” is mathematical concept analogous to a table. “Relationship” is both mathematical term and a conceptual design term. In this class we’ll primarily use the conceptual model definition: interaction between two entities.
Word Usage Mathematical l Degree: number of columns in a table l Cardinality: number of rows in a table Conceptual Model l Degree: number of tables that interact in a relationship l Cardinality: number of rows participating in a relationship (0, 1, many)
What things are called… Common (Perceived as) Relational (Math) Physical Conceptual Table Relation File Entity (Entity type) Row Tuple Record Instance Column Attribute Field Attribute
Cartesian Product l Suppose you have two sets D 1 = {2, 4} D 2 = {1, 3, 5} l D 1 x D 2 = Cartesian product ={(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)} l l l Every element from D 1 matched with every element from D 2 The product, and its subsets, are also relations. We usually choose a specific subset in SQL queries.
Predicate Algebra l l Based on a “truth value” (A form of boolean logic…A statement can be True/False) Objects have properties. We can make inferences based on properties Socrates is a man (T). All men are mortal (T). Therefore Socrates is mortal (T).
Tables are at the heart of all database processes. At the intersection of a row and column is only ONE value!
More about tables l Data are perceived as stored in tables l l l Each row represents one instance Each row is distinct and can be identified as unique via one or more attributes Data are actually stored in whatever manner the software desires l l l Ordering of rows is immaterial Ordering of columns is immaterial Data on disk don’t really look like a table
Properties of Relations l l Each relation (table name) is distinct Each tuple (row) is distinct; there are no duplicates l l Each attribute in a table has a distinct name within that table l l Remember: The order of rows has no significance You might have the same attribute name in different tables, though The order of attributes has no significance
Properties of Relations l Each value of an attribute comes from the same domain l l domain is the set of allowable values Each cell contains exactly one value (atomic) l A table that contains only single atomic values in each cell is said to be NORMALIZED, that is, it is in first normal form. More on this later…
Keys l l Attribute, or set of attributes, that uniquely identify a row There are multiple types of keys
Super Key l l l ANY combination of attributes that uniquely identify a row May contain attributes that are not necessary for unique identification Patient (SSN, MRN, Last Name, First Name, Address, DOB) l l l (SSN, MRN, Last Name) (SSN, Last Name, First Name) (SSN, Address, DOB) SSN MRN
Candidate Key l l l Uniquely identifies a row Can be single attribute or composite key (made of more than one attribute) Is irreducible (no redundant/unnecessary attributes) Possible primary key When identifying: look for single attribute candidates first, then reasonable and likely composite keys
Primary key (PK) l l The candidate key that has been chosen to identify unique rows in the table The “best” candidate key l l Definition of “best” may be up to your judgment Patient (SSN, MRN, Last Name, First Name, Address, DOB) l Do you choose SSN or MRN as the PK?
Surrogate Key l l Single attribute added to a table to act as a primary key. Vital. Sign. Record (MRN, VS_Date. Time, Vital. Sign, VS_value) Vitals. ID MRN VS_Date. Time Vital. Sign VS_Value 1 12345 9/8/02 8: 00 Temp_C 36 2 54321 9/8/02 8: 00 Temp_C 36. 2 3 12345 9/8/02 8: 00 Pulse 72 Add Vitals. ID as a surrogate key in place of the 3 attribute PK.
Candidate Keys Revisited l l l A table may have multiple candidate keys ONE “best” candidate is designated as the primary key. Sometimes the best candidate for primary key is a surrogate. Candidate keys that have not been chosen as the primary key are called “alternate keys”
Foreign Key l l Used to connect two tables Must have same domain as primary key (or alternate key) of the parent table l l Nurse (SSN, First_Name, Last_Name, Unit. Name) Unit (Unit. Name, Specialty, Phone_Number) Unit. Name is PK in the Unit table, and is FK in the Nurse table.
Key attributes l Key attributes are l Primary key l l Alternate key(s) if present l l The “best” candidate key Candidate keys not selected as PK Foreign key(s) if present l Match the PK (or a candidate key) in another table
Non-key attributes l l l NOT the primary key NOT a candidate (alternate) key NOT a foreign key
Format l l l The convention is to give the name of the table, followed by the attribute names in parentheses Primary Key (PK): underline, list first Foreign Key (FK): italics, sometimes (not always) listed last Table (PK, non-key…, fk) Note that alternate keys are not distinguished from non-key in this convention.
Relational Integrity l l Integrity is enforced by rules that ensure that the data are accurate and consistent Domain constraints are one way to be sure the data are accurate l l l Data type, field size (physical) Allowable values (conceptual) Enterprise constraints (business rules) may further define/restrict the allowable data
Relational Integrity l l Entity Integrity: No primary key (or part of the PK) can be null Referential Integrity: If a table has foreign key attributes, the value of the foreign key must match a candidate key in the FK’s “home” table, or the foreign key value must be entirely null.