CS 405 G Introduction to Database Systems SQL
CS 405 G: Introduction to Database Systems SQL III
Topic next l Set and Bag operation l l l UNION, INTERSECTION, EXCEPT 11/10/2020 Jinze Liu @ University of Kentucky 2
Set versus bag semantics l Set l l l No duplicates Relational model and algebra use set semantics Bag l l l Duplicates allowed Number of duplicates is significant SQL uses bag semantics by default 11/10/2020 Luke Huan Univ. of Kansas 3
Set versus bag example πSID Enroll sid Enroll 1234 sid cid grade 1124 1234 647 A 1123 1234 108 B 1124 647 A 1123 108 A SELECT SID FROM Enroll; sid 1234 1124 1123 11/10/2020 Luke Huan Univ. of Kansas 4
A case for bag semantics l Efficiency l l Which one is more useful? l l l Saves time of eliminating duplicates πGPA Student SELECT GPA FROM Student; The first query just returns all possible GPA’s The second query returns the actual GPA distribution Besides, SQL provides the option of set semantics with DISTINCT keyword 11/10/2020 Luke Huan Univ. of Kansas 5
Forcing set semantics l SID’s of all pairs of classmates l SELECT e 1. SID AS SID 1, e 2. SID AS SID 2 FROM Enroll AS e 1, Enroll AS e 2 WHERE e 1. CID = e 2. CID AND e 1. SID > e 2. SID; l l Say Bart and Lisa both take CPS 116 and CPS 114 SELECT DISTINCT e 1. SID AS SID 1, e 2. SID AS SID 2. . . l 11/10/2020 With DISTINCT, all duplicate (SID 1, SID 2) pairs are removed from the output Luke Huan Univ. of Kansas 6
SQL set and bag operations l UNION, EXCEPT, INTERSECT l Set semantics l l l Duplicates in input tables, if any, are first eliminated Exactly like set union, - and intersect in relational algebra UNION ALL, EXCEPT ALL, INTERSECT ALL l l l Bag semantics Think of each row as having an implicit count (the number of times it appears in the table) Bag union: sum up the counts from two tables Bag difference: subtract the two counts (a row with negative count vanishes) Bag intersection: take the minimum of the two counts 11/10/2020 Jinze Liu @ University of Kentucky 7
Examples of bag operations Bag 1 Bag 2 fruit Apple Orange Bag 1 UNION ALL Bag 2 Bag 1 INTERSECT ALL Bag 2 fruit Apple Bag 1 EXCEPT ALL Bag 2 Apple fruit Apple Orange 11/10/2020 Jinze Liu @ University of Kentucky 8
Exercise l Enroll(SID, CID), Club. Member(club, SID) l (SELECT SID FROM Club. Member) EXCEPT (SELECT SID FROM Enroll); l l SID’s of students who are in clubs but not taking any classes (SELECT SID FROM Club. Member) EXCEPT ALL (SELECT SID FROM Enroll); l 11/10/2020 SID’s of students who are in more clubs than classes Jinze Liu @ University of Kentucky 9
Operational Semantics of SFW l SELECT [DISTINCT] E 1, E 2, …, En FROM R 1, R 2, …, Rm WHERE condition; l For each t 1 in R 1: For each t 2 in R 2: … … For each tm in Rm: If condition is true over t 1, t 2, …, tm: Compute and output E 1, E 2, …, En as a row If DISTINCT is present Eliminate duplicate rows in output t 1, t 2, …, tm are often called tuple variables l 11/10/2020 Jinze Liu @ University of Kentucky 10
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l l Renaming operation Set and bag operations Aggregation and grouping F Next: Table expressions, subqueries l F F Table Scalar In Exist 11/10/2020 Jinze Liu @ University of Kentucky 11
Table expression l l Use query result as a table l In set and bag operations, FROM clauses, etc. l A way to “nest” queries Example: names of students who are in more clubs than classes SELECT DISTINCT name FROM Student, (SELECT SID FROM Club. Member) ( EXCEPT ALL (SELECT SID FROM Enroll) ) AS S WHERE Student. SID = S. SID; 11/10/2020 Jinze Liu @ University of Kentucky 12
Scalar subqueries l l A query that returns a single row can be used as a value in WHERE, SELECT, etc. Example: students at the same age as Bart SELECT * What’s Bart’s age? FROM Student WHERE age = ( SELECT age FROM Student WHERE name = ’Bart’ ); l Runtime error if subquery returns more than one row l Under what condition will this runtime error never occur? l l name is a key of Student What if subquery returns no rows? l The value returned is a special NULL value, and the comparison fails 11/10/2020 Jinze Liu @ University of Kentucky 13
IN subqueries l l x IN (subquery) checks if x is in the result of subquery Example: students at the same age as (some) Bart SELECT * What’s Bart’s age? FROM Student WHERE age IN (SELECT age FROM Student WHERE name = ’Bart’ ); 11/10/2020 Jinze Liu @ University of Kentucky 14
EXISTS subqueries l l EXISTS (subquery) checks if the result of subquery is non-empty Example: students at the same age as (some) Bart l SELECT * FROM Student AS s WHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s. age); l This happens to be a correlated subquery—a subquery that references tuple variables in surrounding queries 11/10/2020 Jinze Liu @ University of Kentucky 15
Operational semantics of subqueries l SELECT * FROM Student AS s WHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s. age); l For each row s in Student l l l Evaluate the subquery with the appropriate value of s. age If the result of the subquery is not empty, output s. * The DBMS query optimizer may choose to process the query in an equivalent, but more efficient way (example? ) 11/10/2020 Jinze Liu @ University of Kentucky 16
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l Ordering Set and bag operations Aggregation and grouping Table expressions, subqueries l l l F Next: NULL’s, outerjoins, data modification, constraints, … 11/10/2020 Jinze Liu @ University of Kentucky 17
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l Ordering Set and bag operations Aggregation and grouping Nested queries l l l F Next: NULL’s, outerjoins, data modification, constraints, … 11/10/2020 Jinze Liu @ University of Kentucky 18
Incomplete information l l Example: Student (SID, name, age, GPA) Value unknown l l We do not know Nelson’s age Value not applicable l Nelson has not taken any classes yet; what is his GPA? 11/10/2020 Jinze Liu @ University of Kentucky 19
Solution 1 l A dedicated special value for each domain (type) l l GPA cannot be – 1, so use – 1 as a special value to indicate a missing or invalid GPA Leads to incorrect answers if not careful l l Complicates applications l l SELECT AVG(GPA) FROM Student; SELECT AVG(GPA) FROM Student WHERE GPA <> -1; Remember the Y 2 K bug? l 11/10/2020 “ 00” was used as a missing or invalid year value Jinze Liu @ University of Kentucky 20
Solution 2 l A valid-bit for every column l l Student (SID, name_is_valid, age_is_valid, GPA_is_valid) Complicates schema and queries l 11/10/2020 SELECT AVG(GPA) FROM Student WHERE GPA_is_valid; Jinze Liu @ University of Kentucky 21
Solution 3? l Decompose the table; missing row = missing value l l Student. Name (SID, name) Student. Age (SID, age) Student. GPA (SID, GPA) Student. ID (SID) Conceptually the cleanest solution Still complicates schema and queries l l 11/10/2020 How to get all information about a student in a table? Would natural join work? Jinze Liu @ University of Kentucky 22
SQL’s solution l A special value NULL l l l For every domain Special rules for dealing with NULL’s Example: Student (SID, name, age, GPA) l h 789, “Nelson”, NULL i 11/10/2020 Jinze Liu @ University of Kentucky 23
Three-valued logic l l TRUE = 1, FALSE = 0, UNKNOWN = 0. 5 x AND y = min(x, y) x OR y = max(x, y) NOT x = 1 – x AND True False NULL True False UNK False NULL UNK False UNK OR True False NULL TRUE AND l False UNK NULL TRUE UNK OR WHERE and HAVING clauses only select rows for output if the condition evaluates to TRUE l UNKNOWN is not enough 11/10/2020 Jinze Liu @ University of Kentucky 24
Computing with NULL’s l l l (Arithmetic operation) when we operate on a NULL and another value (including another NULL) using +, –, etc. , the result is NULL Aggregate functions ignore NULL, except COUNT(*) (since it counts rows) When we compare a NULL with another value (including another NULL) using =, >, etc. , the result is UNKNOWN 11/10/2020 Jinze Liu @ University of Kentucky 25
Unfortunate consequences l SELECT AVG(GPA) FROM Student l l SELECT SUM(GPA)/COUNT(*) FROM Student; l l l 3. 4 2. 72 SELECT * FROM Student; SELECT * FROM Student WHERE GPA = GPA l Not equivalent 11/10/2020 sid name age gpa 1234 John Smith 21 3. 5 1123 Mary Carter 19 3. 8 1011 Bob Lee 22 NULL 1204 Susan Wong 22 3. 4 1306 Kevin Kim 18 2. 9 Jinze Liu @ University of Kentucky 26
Another problem l Example: Who has NULL GPA values? l SELECT * FROM Student WHERE GPA = NULL; l l (SELECT * FROM Student) EXCEPT ALL (SELECT * FROM Student WHERE GPA = GPA) l l Does not work; never returns anything Works, but ugly Introduced built-in predicates IS NULL and IS NOT NULL l 11/10/2020 SELECT * FROM Student WHERE GPA IS NULL; Jinze Liu @ University of Kentucky 27
Outerjoin motivation l Example: a master class list l SELECT c. CID, s. SID FROM Enroll e, Student s, Course c WHERE e. SID = s. SID and c. CID = e. CID; l What if a student take no classes l l For these students, CID column should be NULL What if a course with no student enrolled yet? l 11/10/2020 For these courses, SID should be NULL Jinze Liu @ University of Kentucky 28
Outerjoin l SELECT * FROM R FULL OUTER JOIN S ON p; l A full outer join between R and S (denoted R ! S) includes all rows in the result of R !p. S, plus l “Dangling” R rows (those that do not join with any S rows) padded with NULL’s for S’s columns l “Dangling” S rows (those that do not join with any R rows) padded with NULL’s for R’s columns 11/10/2020 Jinze Liu @ University of Kentucky 29
Outerjoin (II) l SELECT * FROM R LEFT l SELECT * FROM R RIGHT OUTER JOIN S ON p; l A left outer join (R ! S) includes rows in R ! S plus dangling R rows padded with NULL’s l A right outer join (R ! S) includes rows in R ! S plus dangling S rows padded with NULL’s 11/10/2020 OUTER JOIN S ON p; ; Jinze Liu @ University of Kentucky 30
Outerjoin examples SELECT * FROM Employee LEFT OUTER JOIN Department ON Eid = Mid Employee Eid Name Did Mid Dname 1123 John Smith 4 1123 Research 1234 Mary Carter 5 1234 Finance 1311 Bob Lee NULL SELECT * FROM Employee FULL RIGHTOUTER JOIN Department ON Eid = Mid Department Did Mid Dname Eid Name Did Mid Dname 4 1123 Research 1123 John Smith 4 1123 Research 5 1234 Finance 1234 Mary Carter 5 1234 Finance 6 1312 HR NULL 1311 Bob NULL Lee NULL 6 NULL 1312 NULL HR NULL 6 1312 HR 11/10/2020 Jinze Liu @ University of Kentucky 31
Summary l Query l SELECT-FROM-WHERE statements l Ordering Set and bag operations Aggregation and grouping Table expressions, subqueries NULL Outerjoins l l l 11/10/2020 Jinze Liu @ University of Kentucky 32
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l l Set and bag operations Ordering Aggregation and grouping Table expressions, subqueries NULL’s and outerjoins F Next: data modification statements, constraints l l l 11/10/2020 Jinze Liu @ University of Kentucky 33
- Slides: 33