INFO 340 Lecture 3 Relational Databases Relational Databases
INFO 340 Lecture 3 Relational Databases
Relational Databases • Based on the relational model, grounded in mathematic set theories. • Three basic elements: Relation, Tuple, and Attributes (in SQL called Tables, Rows, Columns)
Core Relation Model Attributes Relation Name Departme Phone nt DOB Sex John White Managem ent x 1209 1 -Oct-45 M Ann Beech Payroll x 4000 10 -Nov 60 F David Ford Finance X 3000 12 -Jul-72 M Tuple
Other Elements in the Relational Model • Domain – Specifies the constraints placed upon an attribute or attributes. • Degree – The number of attributes in a relation. • Cardinality – Number of tuples in a relation. And finally…. • Relational database – normalized collection of distinctly named relations.
Relational Keys Main goal is to ensure that each tuple can be uniquely identified. Assume there are 10, 000 records in this database. What would be a good way to uniquely identify each row?
Relational Keys How about now? Other ways to uniquely identify -Composite Key
Relational Keys • Superkey - Any attribute set that uniquely identifies • Candidate key – Uniquely identifies – Irreducibility - can’t make it any smaller & still work • Primary key - The candidate key that you choose to use • Foreign key - An attribute (or set) in a table that matches the candidate key of another table. (Can be the same table).
Relational Keys
Mathematical Definition of Relation • Consider two sets, D 1 & D 2, where D 1 = {2, 4} and D 2 = {1, 3, 5}. • Cartesian product, D 1 X D 2, is set of all ordered pairs, where first element is member of D 1 and second element is member of D 2. D 1 X D 2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)} • Alternative way is to find all combinations of elements with first from D 1 and second from D 2.
Mathematical Definition of Relation • Any subset of Cartesian product is a relation; e. g. • R = {(2, 1), (4, 1)} • May specify which pairs are in relation using some condition • for selection; e. g. –second element is 1: R = {(x, y) | x ∈ D 1, y ∈ D 2, and y = 1} –first element is always twice the second: S = {(x, y) | x ∈ D 1, y ∈ D 2, and x = 2 y}
Relational database model means relations between attributes A common misconception is that the “relational” in relational databases refers to the way one table relates to another table via foreign keys. This is not the case! Instead, it refers to the relation between attributes in a relation (table in DB speak).
Integrity Constraints • Domain constraints - every attribute has an associated domain • Integrity Rules • Entity Integrity • No attribute of primary key can be null • Referential Integrity • If a foreign key exists in a relation, it must ‘point’ to something -- ie there must be a candidate value in the home relation
Examples of Attribute Domains
NULL • Used to indicate value unknown. This means it doesn’t work in normal comparisons. • Null is not Zero or a blank string. • Debate rages over whether or not to include NULL. – Opens up a relational model. . No longer closed. • Was in E. F. Codd’s original specification
Null example • Consider the following: Name Hourly. Rate Bob 8. 50 John 12 Sue NULL Alice 8. 50 • What would the result of asking for rows where Hourly. Rate != 8. 50 return? • What about Hourly. Rate = 8. 50 or Hourly. Rate != 8. 50?
Donald Rumsfeld, Feb 2002 news briefing • “As we know, there are knowns. There are things we know. We also know there are known unknowns. That is to say we know there are some things we do not know. But there also unknowns, the ones we don’t know. ”
Examples Cyber-Crime Attribution Database Law enforcement work is being done to identify relationships & patterns of activity with global malware-writers. A database needs to be built that tracks known malware writers, their locations, the nearest educational facility/institute, known aliases, known affiliations, etc. It also must track known malware & its types, ie trojan, worm, rootkit, its size etc. It must track which computers have been attacked, what types they are, what their OS’s are, where they were located at the time of attack, the time of attack itself, the IP addresses and domain names of the boxes. Were the victims firewalled? What type of firewall – built in, external hardware, etc. What was the patch level at the time of attack? Were updates running? How many search engine references are there to this code name? Break this down by specific search engine.
Class Activity • Divide into 5 groups (approx 6 - 7 people) • Give yourself a group name • Take 15 minutes to come up with scenario requiring a database. Write it down legibly. – Name 3 canned queries that the database must be able to solve. Write these down as well. • At the end of 15 minutes, the first group will give their scenario to the next group. The receiving group will: – On the board, create an ERD with: • field names & data types • An identified primary key
Homework 1 Due Tue 1/14/08 • Page 87 -- Questions: – 3. 3, 3. 4, 3. 5, 3. 8 • Page 110 – Questions: – 4. 1, 4. 8, 4. 9
- Slides: 19