CSE 53307330 Database Introduction Fall 2009 Margaret H
CSE 5330/7330 Database Introduction Fall 2009 Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University POBox 750122 Dallas, Texas 75275 -0122 CSE 5330/7330 Fall 2009 1
Database Introduction NOTE: These slides provide an overview of the basic database concepts. During the semester we will return to them to provide an overview and summary of each section covered. CSE 5330/7330 Fall 2009 2
Database Outline n Introduction n File Organization & Indexing n Data Models n Relational Model n SQL/Query Processing n Transactions CSE 5330/7330 Fall 2009 3
Database History A Short Database History by John Vaughn http: //math. hws. edu/vaughn/cpsc/343/2003/hist ory. html n A Brief History of Database Systems http: //www. comphist. org/computing_history/new _page_9. htm n CSE 5330/7330 Fall 2009 4
DB History Snapshots n n n n n CFMS/DBOMP (Late 60 s) EF Codd Paper (1970) DBTG Report (1974) IMS/IDMS (Early 1970 s) System R (1970 s) Transactions – Jim Gray (1970 s) ER Model (1976) OODB (1985) XML/Internet (1990 s) CSE 5330/7330 Fall 2009 5
What is a Database? n Collection of Related Data – Hardware – Software (DBMS) – Users CSE 5330/7330 Fall 2009 6
File vs. Database n n n n Single user vs. multiple users Simple relationships vs. complex relationships Integrity support Concurrency control Recovery Query language Security Different Views of data CSE 5330/7330 Fall 2009 7
Some DB Terms n n n Data/Information/Knowledge Data. Base Management System (DBMS) Data Dictionary/Directory/Metadata Data Model Data Definition Languare (DDL) Data Manipulaiton Language (DML) Data. Base Administrator (DBA) Data Administrator (DA) Database designer Information Resource Manager (IRM) Chief Information Officer (CIO) … CSE 5330/7330 Fall 2009 8
DBMS Components n n n n n DDL Compiler DML Compiler Precompiler (embedded language support) Access methods Concurrency Control Recovery Security Data Dictionary (Metadata) Utility Services … CSE 5330/7330 Fall 2009 9
Views of Data n Levels of Abstraction – External view – Conceptual schema – Physical (Internal schema) n Data independence CSE 5330/7330 Fall 2009 10
Data Model n Way to “picture” and access data independent of how it is actually stored. – – n Data Description Data Relationships Operations Integrity/Consistency constraints Examples: – – – Entity-Relationship (ER/ERA) Relational Object Oriented Object/Relational Older – Network/Hierarchical CSE 5330/7330 Fall 2009 11
Relational Model* n n Based on tables, as: acct # Name 12345 Sally 34567 Sue … … Rows (tuples) Columns (attributes) Today used in most DBMS's. Balance 1000. 21 285. 48 … Most of the following slides were obtained from the home page for A First Course in Database Systems by Jeffrey D. Ullman and Jennifer Widom, Prentice Hall, 2002, http: //www-db. stanford. edu/~ullman/fcdb. html CSE 5330/7330 Fall 2009 12
Data Relationships n One-to-one (1: 1) Ex: Name to SSN n One-to-many (1: M) Ex: Name to Phone n Many-to many (M: n) Ex: Part to Supplier n What data structures can be used to store these relationships? CSE 5330/7330 Fall 2009 13
Database Outline n Introduction n File Organization & Indexing Data Models n Relational Model n SQL/Query Processing n Transactions n CSE 5330/7330 Fall 2009 14
Typical Data Structures used in DBMSs Sequential Files n Hash Table n – Extendible Hash Table n B-Tree (Multiway Search Tree) – B+-Tree Combinations of these n Indices n CSE 5330/7330 Fall 2009 15
Placement of Data on Disk n n n Record/Cylinder/Block/Sector Blocking Factor Allocation – – n n n Contiguous Linked Extents Indexed Clustering Partitioning RAID CSE 5330/7330 Fall 2009 16
Data Structure Pointers n Logical – Key – Relative Block n Physical – Memory - Physical Address (offset) – Disk – Physical Address » Device/Cylinder/Track/Sector/Block/Offset CSE 5330/7330 Fall 2009 17
Disk vs. Memory Data Structures n Objective – Disk – minimize I/O – Memory – minimize memory accesses or CPU time n Tree – Disk – large nodes, shallow – Memory – small nodes. Deep CSE 5330/7330 Fall 2009 18
Access n n n Sequential – Retrieve records in order (logical/physical) Random – Retrieve record based on key Direct – Retrieve record based on physical address Relative – Retrieve record based on relative position in file Binary Search – Randomly retrieve record doing binary search CSE 5330/7330 Fall 2009 19
Organization n Sequential – Records stored in logical order of key. Access: Sequential, relative, binary search n Heap – Records added to end or where space. Access: Direct n Btree – Multiway balanced search tree. Access: Sequential, random n Hash – Store and access record based on address determined when key is hashed. Access: Random CSE 5330/7330 Fall 2009 20
Indexing Speed up processing of data by providing alternative access path. n Both index and primary storage of data provide access method. n Ex: n BTree index on last name Employee Data Hash on ID BTree index on job type CSE 5330/7330 Fall 2009 21
Index Types n Number of entries – Dense – One index entry for each record in file – Sparse – One index entry for many records n Key – Primary – Same key as main file – Secondary – Different key from original file n Organizations: Hash, BTree, Sequential, BST CSE 5330/7330 Fall 2009 22
Index Search Times Organization Worst Expected Sequential O(n) O(n/2) Hash O(n) O(1. ? ? ) Tree (Balanced) B+-Tree O(n) O(lg n) CSE 5330/7330 Fall 2009 23
Hashing Bucket in one block (or cluster thereof) n Hash value may be precise address or relative block (bucket) number n Collisions handled by linked lists n Dynamic Hashing – Allows hash table size to grow n CSE 5330/7330 Fall 2009 24
Multiple Key Indexing Key composed of many subkeys n Access based on all or subset of these n Some indexing structures specifically targeted to n-dimensional accessing n CSE 5330/7330 Fall 2009 25
Database Outline Introduction n File Organization & Indexing n n Data Models Relational Model n SQL/Query Processing n Transactions n CSE 5330/7330 Fall 2009 26
Data Model Evolution 60’s Hierarchical Network 70's 80's Choice for most new applications Relational 90’s Object Bases Knowledge Bases now CSE 5330/7330 Fall 2009 27
Entity/Relationship Model Diagrams to represent designs. n Entity like object, = “thing. ” n Entity set like class = set of “similar” entities/objects. n Attribute = property of entities in an entity set. n In diagrams: – entity set rectangle – attribute oval. ID name Students CSE 5330/7330 Fall 2009 phone height 28
Relationships Connect two or more entity sets. n Represented by diamonds. n Students Taking CSE 5330/7330 Fall 2009 Courses 29
Relationship Set Think of the “value” of a relationship set as a table. n One column for each of the connected entity sets. n One row for each list of entities, one from each set, that are connected by the relationship. Students Sally Joe … Courses CS 180 CS 111 CS 180 … CSE 5330/7330 Fall 2009 30
Courses Students Enrolls TAs Students Ann Sue Bob … Courses CS 180 … CSE 5330/7330 Fall 2009 TAs Jan Pat Jan … 31
Beers-Bars-Drinkers Example name addr license Serves Bars Frequents Beers Likes Drinkers name manf name CSE 5330/7330 Fall 2009 addr 32
Multiplicity of Relationships Many-many Many-one One-one Representation of Many-One n E/R: arrow pointing to “one. ” – Rounded arrow = “exactly one. ” CSE 5330/7330 Fall 2009 33
Example: Drinkers Have Favorite Beers name Serves addr license Bars Frequents Likes Beers name manf Drinkers Favorite CSE 5330/7330 Fall 2009 name addr 34
One-One Relationships Put arrows in both directions. Manfs Bestseller Beers Design Issue: Is the rounded arrow justified? Design Issue: Here, manufacturer is an E. S. In earlier diagrams it is an attribute. Which is right? CSE 5330/7330 Fall 2009 35
Attributes onprice. Relationships Bars n Sells Beers Shorthand for 3 -way relationship: price Prices Bars Sells CSE 5330/7330 Fall 2009 Beers 36
Roles Sometimes an E. S. participates more than once in a relationship. n Label edges with roles to distinguish. Husband Wife Married d 1 d 2 d 3 d 4 husband wife … … Drinkers CSE 5330/7330 Fall 2009 37
Buddies 1 2 Drinkers n Buddy 1 d 1 d 2 … Buddy 2 d 3 d 1 d 4 … Notice Buddies is symmetric, Married not. – No way to say “symmetric” in E/R. Design Question Should we replace husband wife by one relationship spouse? CSE 5330/7330 Fall 2009 38
Multiple Inheritance Theoretically, an E. S. could be a subclass of several other entity sets. name manf Beers Wines isa Grape Beers CSE 5330/7330 Fall 2009 39
Keys A key is a set of attributes whose values can belong to at most one entity. n In E/R model, every E. S. must have a key. – It could have more than one key, but one set of attributes is the “designated” key. n In E/R diagrams, you should underline all attributes of the designated key. CSE 5330/7330 Fall 2009 40
Example n Suppose name is key for Beers. name Beers manf isa color n Ales Beer name is also key for ales. – In general, key at root is key for all. CSE 5330/7330 Fall 2009 41
Example: A Multiattribute Key number dept n hours Courses room Possibly, the combination of hours + room also forms a key, but we have not designated it as such. CSE 5330/7330 Fall 2009 42
Database Outline Introduction n File Organization & Indexing n Data Models n n Relational Model SQL/Query Processing n Transactions n CSE 5330/7330 Fall 2009 43
Relational Model n n n Table = relation. Column headers = attributes. Row = tuple name Winter. Brew Bud. Lite … Beers n manf Pete’s A. B. … Relation schema = name(attributes) + other structure info. , e. g. , keys, other constraints. Example: Beers(name, manf) – Order of attributes is arbitrary, but in practice we need to assume the order given in the relation schema. n n Relation instance is current set of rows for a relation schema. Database schema = collection of relation schemas. CSE 5330/7330 Fall 2009 44
Relation Instance Name Bob Address 123 Main St Telephone 555 -1234 Bob Pat 128 Main St 123 Main St 555 -1235 Harry Sally 456 Main St 555 -2221 Sally Pat 456 Main St 12 State St 555 -2223 555 -1235 CSE 5330/7330 Fall 2009 45
Why Relations? Very simple model. n Often a good match for the way we think about our data. n Abstract model that underlies SQL, the most important language in DBMS’s today. n CSE 5330/7330 Fall 2009 46
Relational Design Simplest approach (not always best): convert each E. S. to a relation and each relationship to a relation. Entity Set Relation E. S. attributes become relational attributes. name manf Beers Becomes: Beers(name, manf) CSE 5330/7330 Fall 2009 47
Keys in Relations An attribute or set of attributes K is a key for a relation R if we expect that in no instance of R will two different tuples agree on all the attributes of K. n Indicate a key by underlining the key attributes. n Example: If name is a key for Beers: Beers(name, manf) CSE 5330/7330 Fall 2009 48
E/R Relationships Relations Relation has attribute for key attributes of each E. S. that participates in the relationship. n Add any attributes that belong to the relationship itself. n Renaming attributes OK. – Essential if multiple roles for an E. S. CSE 5330/7330 Fall 2009 49
name addr Drinkers 1 Likes manf Beers 2 Buddies husband Favorite wife Married n name Likes(drinker, beer) Favorite(drinker, beer) Married(husband, wife) Buddies(name 1, name 2) For one-one relation Married, we can choose either husband or wife as key. CSE 5330/7330 Fall 2009 50
Combining Relations Sometimes it makes sense to combine relations. n Common case: Relation for an E. S. E plus the relation for some many-one relationship from E to another E. S. Example Combine Drinker(name, addr) with Favorite(drinker, beer) to get Drinker 1(name, addr, fav. Beer). n Danger in pushing this idea too far: redundancy. n e. g. , combining Drinker with Likes causes the drinker's address to be repeated, viz. : name Sally n addr 123 Maple beer Bud Miller Notice the difference: Favorite is many-one; Likes is many-many. CSE 5330/7330 Fall 2009 51
Keys of Relations K is a key for relation R if: 1. K all attributes of R. (Uniqueness) 2. For no proper subset of K is (1) true. (Minimality) n If K at least satisfies (1), then K is a superkey. Conventions n Pick one key; underline key attributes in the relation schema. CSE 5330/7330 Fall 2009 52
Example Drinkers(name, addr, beers. Liked, manf, favorite. Beer) n {name, beers. Liked} FD’s all attributes, as seen. – Shows {name, beers. Liked} is a superkey. n n name beers. Liked is false, so name not a superkey. beers. Liked name also false, so beers. Liked not a superkey. n Thus, {name, beers. Liked} is a key. n No other keys in this example. – Neither name nor beers. Liked is on the right of any observed FD, so they must be part of any superkey. CSE 5330/7330 Fall 2009 53
Example 2 Lastname Firstname Key Student ID Major Key (2 attributes) Superkey Note: There alternate keys n Keys are {Lastname, Firstname} and {Student. ID} CSE 5330/7330 Fall 2009 54
Normalization n Process of simplifying relational design: – Avoid redundancy n Functional Dependencies (FD) – Identify relationships between data values – SSN Name » In any tuple, the value for SSN determines a unique value for Name. » If the same SSN exists in two tuples, you’ll have the same Name duplicated. n FDs are used by algorithms to determine best relations to be used given a set of attributes. CSE 5330/7330 Fall 2009 55
Example of Problems Drinkers(name, addr, beers. Liked, manf, favorite. Beer) FD’s: 1. name addr 2. name favorite. Beer 3. beers. Liked manf n ? ? ? ’s are redundant, since we can figure them out from the FD’s. n Update anomalies: If Janeway gets transferred to the Intrepid, will we change addr in each of her tuples? n Deletion anomalies: If nobody likes Bud, we lose track of Bud’s manufacturer. CSE 5330/7330 Fall 2009 56
Database Outline Introduction n File Organization & Indexing n Data Models n Relational Model n n SQL/Query n Processing Transactions CSE 5330/7330 Fall 2009 57
“Core” Relational Algebra A small set of operators that allow us to manipulate relations in limited but useful ways. The operators are: 1. Union, intersection, and difference: the usual set operators. – But the relation schemas must be the same. 2. Selection: Picking certain rows from a relation. 3. Projection: Picking certain columns. 4. Products and joins: Composing relations in useful ways. 5. Renaming of relations and their attributes. CSE 5330/7330 Fall 2009 58
Selection R 1 = C (R 2 ) where C is a condition involving the attributes of relation R 2. Example Relation Sells: Joe. Menu = bar=Joe's(Sells) CSE 5330/7330 Fall 2009 59
Projection R 1 = L (R 2 ) where L is a list of attributes from the schema of R 2. Example beer, price(Sells) n Notice elimination of duplicate tuples. CSE 5330/7330 Fall 2009 60
Product R = R 1 R 2 pairs each tuple t 1 of R 1 with each tuple t 2 of R 2 and puts in R a tuple t 1 t 2. A B C D D E F A B C D D' E F CSE 5330/7330 Fall 2009 61
Join Sells Bar. Info = Sells. Bar=Bars. Name Bars CSE 5330/7330 Fall 2009 62
SQL SEQUEL in System R n Structured English QUEry Language n DDL and DML n Standard Relational query language n CSE 5330/7330 Fall 2009 63
SQL Operations SELECT … FROM … WHERE … n UPDATE … SET … WHERE … n INSERT INTO … VALUES (…) n DELETE … WHERE … n CSE 5330/7330 Fall 2009 64
SQL Employee Name Department Dept Manager SQL SELECT Manager FROM Employee, Department WHERE Employee. name = "Clark Kent” AND Employee. Dept = Department. Dept CSE 5330/7330 Fall 2009 65
Host Languages C, C++, Fortran, Lisp, COBOL Application prog. Calls to DB DBMS Local Vars (Memory) n n n Different DBMSs support different host language interfaces Precompiler ODBC/JDBC CSE 5330/7330 Fall 2009 (Storage) 66
Embedded SQL Add to a conventional programming language (C in our examples) certain statements that represent SQL operations. n Each embedded SQL statement introduced with EXEC SQL. n Preprocessor converts C + SQL to pure C. – SQL statements become procedure calls. CSE 5330/7330 Fall 2009 67
Example Find the price for a given beer at a given bar. Sells(bar, beer, price) EXEC SQL BEGIN DECLARE SECTION; char the. Bar[21], the. Beer[21]; float the. Price; EXEC SQL END DECLARE SECTION; . . . /* assign to the. Bar and the. Beer */. . . EXEC SQL SELECT price INTO : the. Price FROM Sells WHERE beer = : the. Beer AND bar = : the. Bar; . . . CSE 5330/7330 Fall 2009 68
Call-Level Interfaces A more modern approach to the hostlanguage/SQL connection is a call-level interface, in which the C (or other language) program creates SQL statements as character strings and passes them to functions that are part of a library. n Similar to what really happens in embedded SQL implementations. n Two major approaches: SQL/CLI (standard of ODBC = open database connectivity) and JDBC (Java database connectivity). CSE 5330/7330 Fall 2009 69
n n n JDBC Start with a Connection object, obtained from the DBMS (see text). Method create. Statement() returns an object of class Statement (if there is no argument) or Prepared. Statement if there is an SQL statement as argument. Example Statement stat 1 = my. Con. create. Statement(); Prepared. Statement stat 2 = my. Con. create. Statement( "SELECT beer, price " + "FROM Sells" + "WHERE bar = 'Joe''s Bar'" ); n my. Con is a connection, stat 1 is an “empty” statement object, and stat 2 is a (prepared) statement object that has an SQL statement associated. CSE 5330/7330 Fall 2009 70
Executing Statements n n n JDBC distinguishes queries from updates Methods execute. Query() and execute. Update() are used to execute these two kinds of SQL statements. When a query is executed, it returns an object of class Result. Set. Example stat 1. execute. Update( "INSERT INTO Sells" + "VALUES('Brass Rail', 'Bud', 3. 00)" ); Result. Set Menu = stat 2. execute. Query(); CSE 5330/7330 Fall 2009 71
Getting the Tuples of a Result. Set n Method Next() applies to a Result. Set and moves a “cursor” to the next tuple in that set. – Apply Next() once to get to the first tuple. – Next() returns FALSE if there are no more tuples. n While a given tuple is the current of the cursor, you can get its ith component by applying to a Result. Set a method of the form get X(i), where X is the name for the type of that component. Example while(Menu. Next()) { the. Beer = Menu. get. String(1); the. Price = Menu. get. Float(2); . . . } CSE 5330/7330 Fall 2009 72
Database Outline Introduction n File Organization & Indexing n Data Models n Relational Model n SQL/Query Processing n n Transactions CSE 5330/7330 Fall 2009 73
Transactions = units of work that must be: 1. Atomic = either all work is done, or none of it. 2. Consistent = relationships among values maintained. 3. Isolated = appear to have been executed when no other DB operations were being performed. – Often called serializable behavior. 4. Durable = effects are permanent even if system crashes. CSE 5330/7330 Fall 2009 74
Commit/Abort Decision Each transaction ends with either: 1. Commit = the work of the transaction is installed in the database; previously its changes may be invisible to other transactions. 2. Abort = no changes by the transaction appear in the database; it is as if the transaction never occurred. – ROLLBACK is the term used in SQL and the Oracle system CSE 5330/7330 Fall 2009 75
Example n n Sells(bar, beer, price) Joe's Bar sells Bud for $2. 50 and Miller for $3. 00. Sally is querying the database for the highest and lowest price Joe charges: (1) SELECT MAX(price) FROM Sells WHERE bar = 'Joe''s Bar'; (2) SELECT MIN(price) FROM Sells n WHERE bar = 'Joe''s Bar'; At the same time, Joe has decided to replace Miller and Bud by Heineken at $3. 50: (3) DELETE FROM Sells WHERE bar = 'Joe''s Bar' AND (beer = 'Miller' OR beer = 'Bud'); (4) INSERT INTO Sells VALUES('Joe''s bar', 'Heineken', 3. 50); CSE 5330/7330 Fall 2009 76
Example: Problem With Rollback n n n Suppose Joe executes statement 4 (insert Heineken), but then, during the transaction thinks better of it and issues a ROLLBACK statement. If Sally is allowed to execute her statement 1 (find max) just before the rollback, she gets the answer $3. 50, even though Joe doesn't sell any beer for $3. 50. Fix by making statement 4 a transaction, or part of a transaction, so its effects cannot be seen by Sally unless there is a COMMIT action. CSE 5330/7330 Fall 2009 77
Deadlock AND 1. Wait and hold some locks while you wait for others 2. Circular chain of waiters wait-for graph T 4 T 1 T 3 T 2 3. No pre-emption We can avoid deadlock by doing at least ONE of: 1. Get all your locks at once 2. Apply an ordering to acquiring locks 3. Allow preemption (for example, use timeout on waits) CSE 5330/7330 Fall 2009 78
Serializability of schedules T 1 T 2 Read (A) A: = A-50 temp: = A * 0. 1 Write (A) A: = A + temp Read (B) Write (A) B: = B+50 Read (B) Write (B) B: = B - temp Write (B) A disk 100 B 200 T 1 T 2 Schedule is serializable if effect is the same as a serial schedule A T 1 –> T 2 A= B= T 2 –> T 1 A= B= CSE 5330/7330 Fall 2009 T 1 T 2 B 79
T 1 T 2 T 3 T 4 T 5 T 6 C T 2 D T 3 A T 4 B A D T 5 C T 6 If no progress is possible, then there is a cycle CSE 5330/7330 Fall 2009 80
Cascading Abort T 1 T 2 LOCK A Read A change A Write A UNLOCK A LOCK B Read B Discover problem ABORT CSE 5330/7330 Fall 2009 81
Two-Phase Locking (2 PL) n Phase I: All requesting of locks precedes II: Any releasing of locks n Theorem: Any schedule for 2 -phase locked transaction is serializable Locks Time CSE 5330/7330 Fall 2009 82
- Slides: 82