CS 405 G Introduction to Database Systems Instructor
CS 405 G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009
Review l Database l l Relation schemas, relation instances and relational constraints. What’s next? l 10/20/2021 Relational query language. Jinze Liu @ University of Kentucky 2
Relational Query Languages l l Query languages: Allow manipulation and retrieval of data from a database. Relational model supports simple, powerful QLs: l l l Strong formal foundation based on logic. Allows for much optimization. Query Languages != programming languages! l l QLs not intended to be used for complex calculations and inference (e. g. logical reasoning) QLs support easy, efficient access to large data sets. 10/20/2021 Jinze Liu @ University of Kentucky 3
Formal Relational Query Languages Two mathematical Query Languages form the basis for “real” languages (e. g. SQL), and for implementation: Relational Algebra: More operational, very useful for representing execution plans. Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-procedural, declarative. ) * Understanding Algebra & Calculus is key to understanding SQL, query processing! 10/20/2021 Jinze Liu @ University of Kentucky 4
Relational algebra A language for querying relational databases based on operators: Rel. Op l Core set of operators: l l Additional, derived operators: l l Selection, projection, cross product, union, difference, and renaming Join, natural join, intersection, etc. Compose operators to make complex queries 10/20/2021 Jinze Liu @ University of Kentucky 5
Selection l l Input: a table R Notation: p R l l l p is called a selection condition/predicate Purpose: filter rows according to some criteria Output: same columns as R, but only rows of R that satisfy p 10/20/2021 Jinze Liu @ University of Kentucky 6
Selection example Students with GPA higher than 3. 0 GPA > 3. 0 Student l sid name age gpa 1234 John Smith 21 3. 5 1123 Mary Carter 22 3. 8 1011 Bob Lee 22 2. 6 1204 Susan Wong 22 3. 4 1306 Kevin Kim 21 2. 9 10/20/2021 GPA > 3. 0 Jinze Liu @ University of Kentucky 7
More on selection l Selection predicate in general can include any column of R, constants, comparisons (=, ·, etc. ), and Boolean connectives ( : and, : or, and : : not) l l Example: straight A students under 18 or over 21 GPA ¸ 4. 0 Æ (age < 18 Ç age > 21) Student But you must be able to evaluate the predicate over a single row of the input table l Example: student with the highest GPA ¸ all GPA in Student table Student 10/20/2021 Jinze Liu @ University of Kentucky 8
Projection l l Input: a table R Notation: πL R l l l L is a list of columns in R Purpose: select columns to output Output: same rows, but only the columns in L l l Order of the rows is preserved Number of rows may be less (depends on where we have duplicates or not) 10/20/2021 Jinze Liu @ University of Kentucky 9
Projection example l ID’s and names of all students πSID, name Student sid name age gpa sid name 1234 John Smith 21 3. 5 1234 John Smith 1123 Mary Carter 22 3. 8 1123 Mary Carter 1011 Bob Lee 22 2. 6 1011 Bob Lee 1204 Susan Wong 22 3. 4 1204 Susan Wong 1306 Kevin Kim 21 2. 9 1306 Kevin Kim 10/20/2021 πSID, name Jinze Liu @ University of Kentucky 10
More on projection l Duplicate output rows are removed (by definition) l Example: student ages π age Student sid name age gpa age 1234 John Smith 21 3. 5 21 1123 Mary Carter 22 3. 8 22 1011 Bob Lee 22 2. 6 1204 Susan Wong 22 3. 4 1306 Kevin Kim 21 2. 9 10/20/2021 π age Jinze Liu @ University of Kentucky 22 22 21 11
Cross product l l Input: two tables R and S Notation: R X S Purpose: pairs rows from two tables Output: for each row r in R and each row s in S, output a row rs (concatenation of r and s) 10/20/2021 Jinze Liu @ University of Kentucky 12
Cross product example l Student X Enroll sid name age gpa sid cid grade 1234 John Smith 21 3. 5 1234 647 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 10/20/2021 × sid name age gpa sid cid grade 1234 John Smith 21 3. 5 1234 647 A 1123 Mary Carter 22 3. 8 1234 647 A 1011 Bob Lee 22 2. 6 1234 647 A 1234 John Smith 21 3. 5 1123 108 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 1123 108 A Jinze Liu @ University of Kentucky 13
A note on column ordering l l The ordering of columns in a table is considered unimportant (as is the ordering of rows) sid name age gpa sid name gpa age 1234 John Smith 21 3. 5 1234 John Smith 3. 5 21 1123 Mary Carter 22 3. 8 1123 Mary Carter 3. 8 22 1011 Bob Lee 22 2. 6 1011 Bob Lee 2. 6 22 = That means cross product is commutative, i. e. , R X S = S X R for any R and S 10/20/2021 Jinze Liu @ University of Kentucky 14
Derived operator: join l l Input: two tables R and S Notation: R p S l l p is called a join condition/predicate Purpose: relate rows from two tables according to some criteria Output: for each row r in R and each row s in S, output a row rs if r and s satisfy p Shorthand for σp ( R X S ) 10/20/2021 Jinze Liu @ University of Kentucky 15
Join example l Info about students, plus CID’s of their courses Student. SID = Enroll. SID Enroll sid name age gpa sid cid grade 1234 John Smith 21 3. 5 1234 647 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 Student. SID = Enroll. SID Use table_name. column_name syntax to disambiguate sid name identically named 1234 John Smith columns from 1123 Mary Carter different input 1011 Bob Lee tables 1234 John Smith 10/20/2021 age gpa sid cid grade 21 3. 5 1234 647 A 22 3. 8 1234 647 A 22 2. 6 1234 647 A 21 3. 5 1123 108 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 1123 108 A Jinze Liu @ University of Kentucky 16
Derived operator: natural join l l l Input: two tables R and S Notation: R S Purpose: relate rows from two tables, and l l l Enforce equality on all common attributes Eliminate one copy of common attributes Shorthand for πL ( R l l p S ), where p equates all attributes common to R and S L is the union of all attributes from R and S, with duplicate attributes removed 10/20/2021 Jinze Liu @ University of Kentucky 17
Natural join example l Student Enroll = πL ( Student p Enroll ) = πSID, name, age, GPA, CID ( Student !Student. SID = Enroll. SID Enroll ) sid name age gpa sid cid grade 1234 John Smith 21 3. 5 1234 647 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 10/20/2021 sid name age gpa sid cid grade 1234 John Smith 21 3. 5 1234 647 A 1123 Mary Carter 22 3. 8 1234 647 A 1011 Bob Lee 22 2. 6 1234 647 A 1234 John Smith 21 3. 5 1123 108 A 1123 Mary Carter 22 3. 8 1123 108 A 1011 Bob Lee 22 2. 6 1123 108 A Jinze Liu @ University of Kentucky 18
Union l l Input: two tables R and S Notation: R S l l R and S must have identical schema Output: l l Has the same schema as R and S Contains all rows in R and all rows in S, with duplicate rows eliminated 10/20/2021 Jinze Liu @ University of Kentucky 19
Difference l l Input: two tables R and S Notation: R - S l l R and S must have identical schema Output: l l Has the same schema as R and S Contains all rows in R that are not found in S 10/20/2021 Jinze Liu @ University of Kentucky 20
Derived operator: intersection l l Input: two tables R and S Notation: R S l l Output: l l l R and S must have identical schema Has the same schema as R and S Contains all rows that are in both R and S Shorthand for R - ( R - S ) Also equivalent to S - ( S - R ) And to R S 10/20/2021 Jinze Liu @ University of Kentucky 21
Renaming l l l Input: a table R Notation: ρS R, ρ(A 1, A 2, …) R or ρS(A 1, A 2, …) R Purpose: rename a table and/or its columns Output: a renamed table with the same rows as R Used to l l Avoid confusion caused by identical column names Create identical columns names for natural joins 10/20/2021 Jinze Liu @ University of Kentucky 22
Renaming Example l Enroll 1(SID 1, CID 1, Grade 1) Enroll sid cid grade 1234 647 A 1123 108 A 10/20/2021 Enroll 1(SID 1, CID 1, Grade 1) Jinze Liu @ University of Kentucky sid 1 cid 1 grade 1 1234 647 A 1123 108 A 23
Review: Summary of core operators l l l Selection: Projection: Cross product: Union: Difference: Renaming: l Does not really add “processing” power 10/20/2021 σp R πL R RXS R S R-S ρ S(A 1, A 2, …) R Jinze Liu @ University of Kentucky 24
Review Summary of derived operators l Join: Natural join: Intersection: l Many more l l R p. S R S Outer join, Division, Semijoin, anti-semijoin, … 10/20/2021 Jinze Liu @ University of Kentucky 25
Using Join l Which classes is Lisa taking? l l l Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character) l An Answer: l Student_Lisa σname = “Lisa”Student l Lisa_Enrolled Student_Lisa Enrolled l Lisa’s classes πCID Lisa_Enrolled l Or: l l l Student_Enrolled Student Enrolled Lisa_Enrolled σname = “Lisa” Student_Enrolled Lisa’s classes πCID Lisa_Enrolled 10/20/2021 Jinze Liu @ University of Kentucky 26
Join Example sid name age gpa 1234 John 21 3. 5 1123 Mary 22 3. 8 1012 Lisa 22 2. 6 sid cid grade 1123 108 A 1012 647 A 1012 108 B cid 647 108 10/20/2021 σname = “Lisa” πcid sid name age gpa 1012 Lisa 22 2. 6 sid name age gpa cid grade 1012 Lisa 22 2. 6 647 A 1012 Lisa 22 2. 6 108 B Jinze Liu @ University of Kentucky 27
Lisa’s Class l π CID( (σname = “Lisa”Student) Enrolled) Lisa’s classes πCID Who’s Lisa? σname = “Lisa” Enroll Student 10/20/2021 Jinze Liu @ University of Kentucky 28
Students in Lisa’s Classes l SID of Students in Lisa’s classes l Student_Lisa σname = “Lisa”Student l Lisa_Enrolled Student_Lisa Enrolled l Lisa’s classes πCID Lisa_Enrolled l Enrollment in Lisa’s classes Enrolled l Students in Lisa’s class πSID Enrollment in Lisa’s classes Students in Lisa’s classes πSID Lisa’s classes πCID Who’s Lisa? σname = “Lisa” 10/20/2021 Student Enroll Jinze Liu @ University of Kentucky 29
Tips in Relational Algebra l l Use temporary variables Use foreign keys to join tables 10/20/2021 Jinze Liu @ University of Kentucky 30
An exercise l Names of students in Lisa’s classes Their names Students in Lisa’s classes πname πSID Student Lisa’s classes πCID Who’s Lisa? σname = “Lisa” 10/20/2021 Enroll Student Enroll Jinze Liu @ University of Kentucky 31
Set Minus Operation l CID’s of the courses that Lisa is NOT taking All CID’s - CID’s of the courses that Lisa IS taking πCID Course Enroll 10/20/2021 Jinze Liu @ University of Kentucky σname = “Lisa” Student 32
Renaming Operation l Enrolled 1(SID 1, CID 1, Grade 1) Enrolled sid cid grade 1234 647 A 1123 108 A 10/20/2021 Enroll 1(SID 1, CID 1, Grade 1) Jinze Liu @ University of Kentucky sid 1 cid 1 grade 1 1234 647 A 1123 108 A 33
Example l We have the following relational schemas l l Student(sid: string, name: string, gpa: float) Course(cid: string, department: string) Enrolled(sid: string, cid: string, grade: character) SID’s of students who take at least two courses Enrolled πSID (Enrolled. SID = Enrolled. SID & Enrolled. CID ¹ Enrolled. CID Enrolled) 10/20/2021 Jinze Liu @ University of Kentucky 34
Example (cont. ) Enroll 1(SID 1, CID 1, Grade 1) Enrolled Enroll 2(SID 2, CID 2, Grade 2) Enrolled πSID (Enroll 1 SID 1 = SID 2 & CID 1 ¹ CID 2 Enroll 2) π SID 1 Expression tree syntax: SID 1 = SID 2 & CID 1 ¹ CID 2 ρEnroll 1(SID 1, CID 1, Grade 1) ρEnroll 2(SID 2, CID 2, Grade 2) Enroll 10/20/2021 Enroll Jinze Liu @ University of Kentucky 35
A trickier exercise l Who has the highest GPA? l l Who has a GPA? Who does NOT have the highest GPA? l Whose GPA is lower than somebody else’s? - πSID Student πStudent 1. SID Student 1. GPA < Student 2. GPA ρStudent 1 A deeper question: When (and why) is “-” needed? 10/20/2021 Student Jinze Liu @ University of Kentucky ρStudent 2 Student 36
Tips in Relational Algebra l A comparison is to identify a relationship 10/20/2021 Jinze Liu @ University of Kentucky 37
Review: Summary of core operators l l l Selection: Projection: Cross product: Union: Difference: Renaming: l Does not really add “processing” power 10/20/2021 σp R πL R RXS R S R-S ρ S(A 1, A 2, …) R Jinze Liu @ University of Kentucky
Review: Summary of derived operators l l l Join: Natural join: Intersection: 10/20/2021 R p. S R S Jinze Liu @ University of Kentucky
Review l Relational algebra l l l Use temporary variable Use foreign key to join relations A comparison is to identify a relationship 10/20/2021 Jinze Liu @ University of Kentucky
Exercises of R. A. Reserves Boats Sailors Jinze Liu @ University of Kentucky
Problem 1 Find names of sailors who’ve reserved boat #103 l Solution: Who reserved boat #103? Boat #103 σbid πsname Sailors = “ 103” Reserves Jinze Liu @ University of Kentucky
Problem 2: Find names of sailors who’ve reserved a red boat l Information about boat color only available in Boats; so need an extra join: Names of sailors who reserved red boat πsname Who reserved red boats? Red boats σcolor πSID Sailors = “red” Reserve Boat Jinze Liu @ University of Kentucky
Problem 3: Find names of sailors who’ve reserved a red boat or a green boat l Can identify all red or green boats, then find sailors who’ve reserved one of these boats: Names of sailors who reserved red boat πsname Who reserved red boats? Red boats σcolor = “red” color = “green” πSID Sailors Reserve Boat Jinze Liu @ University of Kentucky
Problem 4: Find names of sailors who’ve reserved only one boat Jinze Liu @ University of Kentucky
Monotone operators Rel. Op Add more rows to the input. . . l If some old output rows may need to be removed l l Then the operator is non-monotone Otherwise the operator is monotone l l What happens to the output? That is, old output rows always remain “correct” when more rows are added to the input Formally, for a monotone operator op: R µ R’ implies op( R ) µ op( R’ ) 10/20/2021 Jinze Liu @ University of Kentucky 46
Classification of relational operators l l l l Selection: σp R Projection: πL R Cross product: R X S Join: R p S Natural join: R S Union: R U S Difference: R - S Intersection: R ∩ S 10/20/2021 Monotone Monotone w. r. t. R; non-monotone w. r. t S Monotone Jinze Liu @ University of Kentucky 47
Why is “-” needed for highest GPA? Composition of monotone operators produces a monotone query l l Old output rows remain “correct” when more rows are added to the input Highest-GPA query is non-monotone l l F Current highest GPA is 4. 1 Add another GPA 4. 2 Old answer is invalidated So it must use difference! 10/20/2021 Jinze Liu @ University of Kentucky 48
Why do we need core operator X? l Cross product l l Difference l l The only non-monotone operator Union l l The only operator that adds columns The only operator that allows you to add rows? Selection? Projection? 10/20/2021 Jinze Liu @ University of Kentucky 49
Additional Operators l l Outer join Division 10/20/2021 Jinze Liu @ University of Kentucky 50
(Left) Outer Join l l Input: two tables R and S Notation: R PS Purpose: pairs rows from two tables Output: for each row r in R and each row s in S, l l l if p satisfies, output a row rs (concatenation of r and s) Otherwise, output a row r with NULLs Right outer join and full outer join are defined similarly 10/20/2021 Jinze Liu @ University of Kentucky 51
Left Outer Join Example l Employee Eid = Mid Department Eid Name Did Mid Dname 1234 John Smith 4 1234 Research 1123 Mary Carter 5 1123 Finance 1011 Bob Lee Eid = Mid 10/20/2021 Eid Name Did Mid Dname 1234 John Smith 4 1234 Research 1123 Mary Carter 5 1123 Finance 1011 Bob Lee NULL Jinze Liu @ University of Kentucky 52
Division Operator l l l Input: two tables R and S Notation: R S Purpose: Find the subset of items in one set R that are related to all items in another set 10/20/2021 Jinze Liu @ University of Kentucky 53
Division Operator l Find professors who have taught courses in all departments l Why does this involve division? Contains row <p, d> if professor p has taught a course in department d 10/20/2021 Prof. Id Dept. Id Jinze Liu @ University of Kentucky Dept. Id All department Ids 54
Why is r. a. a good query language? l Simple l l Declarative? l l l A small set of core operators who semantics are easy to grasp Yes, compared with older languages like CODASYL Though operators do look somewhat “procedural” Complete? l With respect to what? 10/20/2021 Jinze Liu @ University of Kentucky 55
Review l l Expression tree Tips in writing R. A. l l Use temporary variables Use foreign keys to join tables A comparison is to identify a relationship Use set minus in non-monotonic results 10/20/2021 Jinze Liu @ University of Kentucky 57
Next Time l Continue on relational algebra 10/20/2021 Jinze Liu @ University of Kentucky 58
- Slides: 57