CS 405 G Introduction to Database Systems Instructor
CS 405 G: Introduction to Database Systems Instructor: Jinze Liu Fall 2017
Review l SELECT a list of attributes FROM a list of relations WHERE condition; l Condition may have logical operators AND, OR, NOT Condition may have comparison operators: <. <=, <>, >=, > String comparison may use “=” (exactly match) or “LIKE” (matching with regular expressions) l l %, _, (Arithmetic) expressions of attributes are allowed 9/16/2021 Jinze Liu @ University of Kentucky 2
Exercise l SELECT sid, 2007 – age, FROM STUDENT WHERE name LIKE ‘%John%’ OR GPA > 3. 6; sid name age gpa 1234 John Smith 21 3. 5 1234 John. Smith 1986 21 3. 5 1123 Mary Carter 19 3. 8 1123 Mary. Carter 1988 19 3. 8 1011 Bob Lee 22 2. 6 1011 Bob. Lee 22 22 2. 6 1204 Susan Wong 22 3. 4 1204 Susan. Wong 22 22 3. 4 1306 Kevin Kim 18 2. 9 1306 Kevin. Kim 18 18 2. 9 asdfsadf 9/16/2021 Jinze Liu @ University of Kentucky 3
Relational Database Student (SID integer, name varchar(30), age integer, GPA float); Course (CID char(10), title varchar(100)); Enroll (SID integer, CID char(10)); Create table Student (SID integer, name varchar(30), age integer, GPA float); Create table Course (CID char(10), title varchar(100)); Create table Enroll (SID integer, CID char(10)); 9/16/2021 Jinze Liu @ University of Kentucky 4
Today’s Topic l Set and Bag operation l l l Aggregation l l UNION, INTERSECTION, SELECT age, AVG(GPA) EXCEPT FROM Student S ENROLL DISTINCT WHERE S. SID = E. SID HAVING Nested queries 9/16/2021 E AND E. CID = ‘EECS 108’ GROUP BY age HAVING age > 20; --Compute the average GPA for Students who are at least 20 years old and are enrolled in 108 with the same age Jinze Liu @ University of Kentucky 5
SQL set and bag operations l UNION, EXCEPT, INTERSECT l Set semantics l l l Duplicates in input tables, if any, are first eliminated Exactly like set union, - and intersect in relational algebra UNION ALL, EXCEPT ALL, INTERSECT ALL l l l Bag semantics Think of each row as having an implicit count (the number of times it appears in the table) Bag union: sum up the counts from two tables Bag difference: subtract the two counts (a row with negative count vanishes) Bag intersection: take the minimum of the two counts 9/16/2021 Jinze Liu @ University of Kentucky 6
Examples of bag operations Bag 1 Bag 2 fruit Apple Orange Bag 1 UNION ALL Bag 2 Bag 1 INTERSECT ALL Bag 2 fruit Apple Bag 1 EXCEPT ALL Bag 2 Apple fruit Apple Orange 9/16/2021 Jinze Liu @ University of Kentucky 7
Exercise l Enroll(SID, CID), Club. Member(club, SID) l (SELECT SID FROM Club. Member) EXCEPT (SELECT SID FROM Enroll); l l SID’s of students who are in clubs but not taking any classes (SELECT SID FROM Club. Member) EXCEPT ALL (SELECT SID FROM Enroll); l 9/16/2021 SID’s of students who are in more clubs than classes Jinze Liu @ University of Kentucky 8
Forcing set semantics l SQL provides the option of set semantics with DISTINCT keyword l SID’s of all enrolled students SELECT SID, FROM Enroll, l Say Bart takes EECS 700 and EECS 647 SELECT DISTINCT SID, FROM Enroll, l With DISTINCT, all duplicate SIDs are removed from the output 9/16/2021 Jinze Liu @ University of Kentucky 9
Operational Semantics of SFW l SELECT [DISTINCT] E 1, E 2, …, En FROM R 1, R 2, …, Rm WHERE condition; l For each t 1 in R 1: For each t 2 in R 2: … … For each tm in Rm: If condition is true over t 1, t 2, …, tm: Compute and output E 1, E 2, …, En as a row If DISTINCT is present Eliminate duplicate rows in output t 1, t 2, …, tm are often called tuple variables l 9/16/2021 Jinze Liu @ University of Kentucky 10
Summary of SQL features SELECT-FROM-WHERE statements (select-project-join queries) Renaming operation Set and bag operations l l F UNION, DIFFERENCE, INTERSECTION Next: aggregation and grouping 9/16/2021 Jinze Liu @ University of Kentucky 11
Aggregates l Standard SQL aggregate functions: COUNT, SUM, AVG, MIN, MAX l Example: number of students under 18, and their average GPA l l SELECT COUNT(*), AVG(GPA) FROM Student WHERE age < 18; COUNT(*) counts the number of rows 9/16/2021 Jinze Liu @ University of Kentucky 12
Aggregates with DISTINCT l Example: How many students are taking classes? l l SELECT COUNT (SID) FROM Enroll; SELECT COUNT(DISTINCT SID) FROM Enroll; 9/16/2021 Jinze Liu @ University of Kentucky 13
GROUP BY l SELECT … FROM … WHERE … GROUP BY list_of_columns; l Example: find the average GPA for each age group l SELECT age, AVG(GPA) FROM Student GROUP BY age; 9/16/2021 Jinze Liu @ University of Kentucky 14
Operational semantics of GROUP BY SELECT … FROM … WHERE … GROUP BY …; l Compute FROM l Compute WHERE l Compute GROUP BY: group rows according to the values of GROUP BY columns l Compute SELECT for each group l F For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group Number of groups = number of rows in the final output 9/16/2021 Jinze Liu @ University of Kentucky 15
Example of computing GROUP BY SELECT age, AVG(GPA) FROM Student GROUP BY age; name sid age gpa 1234 John Smith 21 3. 5 1123 Mary Carter 19 3. 8 1011 Bob Lee 22 2. 6 1204 Susan Wong 22 3. 4 1306 Kevin Kim 19 2. 9 Compute SELECT for each group age gpa 9/16/2021 Compute GROUP BY: group rows according to the values of GROUP BY columns sid name age gpa 1234 John Smith 21 3. 5 1123 Mary Carter 19 3. 8 21 3. 5 1306 Kevin Kim 19 2. 9 19 3. 35 1011 Bob Lee 22 2. 6 22 3. 0 1204 Susan Wong 22 3. 4 Jinze Liu @ University of Kentucky 16
Aggregates with no GROUP BY l An aggregate query with no GROUP BY clause represent a special case where all rows go into one group SELECT AVG(GPA) FROM Student; Compute aggregate over the group sid name age gpa 1234 John Smith 21 3. 5 1123 Mary Carter 19 3. 8 1011 Bob Lee 22 2. 6 1204 Susan Wong 22 3. 4 1306 Kevin Kim 19 2. 9 gpa 3. 24 9/16/2021 Jinze Liu @ University of Kentucky Group all rows into one group 17
Restriction on SELECT If a query uses aggregation/group by, then every column referenced in SELECT must be either l l l F Aggregated, or A GROUP BY column This restriction ensures that any SELECT expression produces only one value for each group 9/16/2021 Jinze Liu @ University of Kentucky 18
Examples of invalid queries l SELECT SID, age FROM Student GROUP BY age; l l l Recall there is one output row per group There can be multiple SID values per group SELECT SID, MAX(GPA) FROM Student; l Recall there is only one group for an aggregate query with no GROUP BY clause l There can be multiple SID values Wishful thinking (that the output SID value is the one associated with the highest GPA) does NOT work l 9/16/2021 Jinze Liu @ University of Kentucky 19
HAVING l l Used to filter groups based on the group properties (e. g. , aggregate values, GROUP BY column values) SELECT … FROM … WHERE … GROUP BY … HAVING condition; l l l Compute FROM Compute WHERE Compute GROUP BY: group rows according to the values of GROUP BY columns Compute HAVING (another selection over the groups) Compute SELECT for each group that passes HAVING 9/16/2021 Jinze Liu @ University of Kentucky 20
HAVING examples l Find the average GPA for each age group over 10 l l l SELECT age, AVG(GPA) FROM Student GROUP BY age HAVING age > 10; Can be written using WHERE without table expressions List the average GPA for each age group with more than a hundred students l l SELECT age, AVG(GPA) FROM Student GROUP BY age HAVING COUNT(*) > 100; Can be written using WHERE and table expressions 9/16/2021 Jinze Liu @ University of Kentucky 21
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l l Renaming operation Set and bag operations Aggregation and grouping F Next: Table expressions, subqueries l F F Table Scalar In Exist 9/16/2021 Jinze Liu @ University of Kentucky 22
Table expression l l Use query result as a table l In set and bag operations, FROM clauses, etc. l A way to “nest” queries Example: names of students who are in more clubs than classes SELECT DISTINCT name FROM Student, (SELECT SID FROM Club. Member) ( EXCEPT ALL (SELECT SID FROM Enroll) ) AS S WHERE Student. SID = S. SID; 9/16/2021 Jinze Liu @ University of Kentucky 23
Scalar subqueries l l A query that returns a single row can be used as a value in WHERE, SELECT, etc. Example: students at the same age as Bart SELECT * What’s Bart’s age? FROM Student WHERE age = ( SELECT age FROM Student WHERE name = ’Bart’ ); l Runtime error if subquery returns more than one row l Under what condition will this runtime error never occur? l l name is a key of Student What if subquery returns no rows? l The value returned is a special NULL value, and the comparison fails 9/16/2021 Jinze Liu @ University of Kentucky 24
IN subqueries l l x IN (subquery) checks if x is in the result of subquery Example: students at the same age as (some) Bart SELECT * What’s Bart’s age? FROM Student WHERE age IN (SELECT age FROM Student WHERE name = ’Bart’ ); 9/16/2021 Jinze Liu @ University of Kentucky 25
EXISTS subqueries l l EXISTS (subquery) checks if the result of subquery is non-empty Example: students at the same age as (some) Bart l SELECT * FROM Student AS s WHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s. age); l This happens to be a correlated subquery—a subquery that references tuple variables in surrounding queries 9/16/2021 Jinze Liu @ University of Kentucky 26
Operational semantics of subqueries l SELECT * FROM Student AS s WHERE EXISTS (SELECT * FROM Student WHERE name = ’Bart’ AND age = s. age); l For each row s in Student l l l Evaluate the subquery with the appropriate value of s. age If the result of the subquery is not empty, output s. * The DBMS query optimizer may choose to process the query in an equivalent, but more efficient way (example? ) 9/16/2021 Jinze Liu @ University of Kentucky 27
Summary of SQL features covered so far l SELECT-FROM-WHERE statements l Ordering Set and bag operations Aggregation and grouping Table expressions, subqueries l l l F Next: NULL’s, outerjoins, data modification, constraints, … 9/16/2021 Jinze Liu @ University of Kentucky 28
- Slides: 28