CS 5423 Principles of Database Systems DB Fundamentals
CS 5423 Principles of Database Systems DB Fundamentals
What are Database Management Systems DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFE MULTIUSER storage of and access to MASSIVE amounts of PERSISTENT data 2
Capabilities of DBMS ■ Persistent Storage – Data Structures for efficient access to very large amounts of data ■ Programming Interface – More than reading and writing of files. Access/search and modify data through a query language ■ Transaction Management- ACID – a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc ■ ■ Atomicity : each transaction is treated as a single "unit Consistency: any data written to the database must be valid according to all defined rules, Isolation: controls how and when changes are made and if they become visible to each other, users, and systems Durability: completed transactions will survive permanently 3
Example: Banking System ■ Data – Information on accounts, customers, balances, current interest rates, transaction histories, etc. ■ MASSIVE – Many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big to fit in main memory ■ PERSISTENT – Data outlives programs that operate on it 4
Example: Banking System ■ SAFE: – from system failures – from malicious users ■ CONVENIENT: – simple commands to debit account, get balance, write statement, transfer funds, etc. – also unpredicted queries should be easy ■ EFFICIENT: – don't search all files in order to get balance of one account, get all accounts with low balances, get large transactions, etc. – massive data! -> DBMS's carefully tuned for performance 5
Multi-user Access ■ Many people/programs accessing same database, or even same data, simultaneously -> Need careful controls – Alex @ ATM 1: withdraw $100 from account #007 get balance from database; if balance >= 100 then balance : = balance - 100; dispense cash; put new balance into database; – Bob @ ATM 2: withdraw $50 from account #007 get balance from database; if balance >= 50 then balance : = balance - 50; dispense cash; put new balance into database; – Initial balance = 120. Final balance = ? ? 6
Why File Systems Won’t Work ■ Storing data: file system is limited – – – ■ Query/update: – – ■ need to write a new C++/Java program for every new query need to worry about performance Concurrency: limited protection – – ■ size limit by disk or address space when system crashes we may lose data Password/file-based authorization insufficient need to worry about interfering with other users need to offer different views to different users (e. g. registrar, students, professors) – Professor: All students’ grade – Student-Own grade Schema change: – – entails changing file formats need to rewrite virtually all applications That’s why the notion of DBMS was motivated! 7
DBMS Architecture User/Web Forms/Applications/DBA query Query Parser transaction DDL commands Transaction Manager DDL Processor Concurrency Control Logging & Recovery Query Rewriter Query Optimizer Query Executor Records Indexes Buffer Manager Storage CS 411 Buffer: data, indexes, log, etc Lock Tables Main Memory data, metadata, indexes, log, etc 8
Data Structuring: Model, Schema, Data ■ Data model – conceptual structuring of data stored in database ■ – ex: data is set of records, for a university system, student table ■ – E. g. arrays, objects in C or Java each with student-ID, name, address, courses, photo ex: airplane system: ■ data is graph where nodes represent cities, edges represent airline routes ■ Schema versus data – – schema: describes how data is to be structured, defined at set-up time, rarely changes (also called "metadata") ■ A set of attributed with a name for a relation ■ Student(Id, Name, Address, Curse, . . ) data is actual "instance" of database, rows of relations(tables) changes rapidly ■ – Each row has a component(value) for each attribute of relation defined in schema vs. types and variables in programming languages 9
Schema vs. Data ■ Schema: name, name of each field, the type of each field – Students (Sid: string, Name: string, Age: integer, GPA: real) – A template for describing a student ■ Data: an example instance of the relation Sid Name Age GPA 0001 Alex 19 3. 55 0002 Bob 22 3. 10 0003 Chris 20 3. 80 0004 David 20 3. 95 0005 Eugene 21 3. 30 10
Data Structuring: Model, Schema, ■ Data definition language (DDL) Data – commands for setting up schema of database ■ CREATE, DROP, ALTER ■ Data Manipulation Language (DML) – Commands to manipulate data in database: ■ Select, INSERT, DELETE, MODIFY – Also called "query language“ – Does not affect schema 11
■ Create another column e. g. address – DDL ■ Insert another student information – DML Sid Name Age GPA 0001 Alex 19 3. 55 0002 Bob 22 3. 10 0003 Chris 20 3. 80 0004 David 20 3. 95 0005 Eugene 21 3. 30 12
People ■ DBMS user: queries/modifies data ■ DBMS application designer – set up schema, loads data, … ■ DBMS administrator – user management, performance tuning, … ■ DBMS implementer: builds systems 13
Key Steps in Building DB Applications ■ Step 0: pick an application domain ■ Step 1: conceptual design – Discuss what to model in the application domain – Need a modeling language to express what you want ■ – ER model is the most popular such language Output: an ER diagram of the application domain ■ Step 2: pick a type of DBMS’s – Relational DBMS is most popular and is our focus 14
Key Steps in Building DB Applications ■ Step 3: translate ER design to a relational schema – Use a set of rules to translate from ER to relational schema – Use a set of schema refinement rules to transform the above relational schema into a good relational schema ■ 1 NF, 2 NF, 3 NF, BCNF, 4 NF, …… – At this point ■ You have a good relational schema on paper 15
Key Steps in Building DB Applications ■ Step 4: Implement your relational DBMS using a "database programming language" called SQL – ■ SELECT-FROM-WHERE-GROUPBY-HAVING Step 5: Ordinary users cannot interact with the database directly and the database also cannot do everything you want, hence write your application program in C++, Java, PHP, etc. to handle the interaction and take care of things that the database cannot do 16
Questions so far?
ER Model ■ ER ? ■ A language to specify – what information a database must hold – what are the relationships among components of that information ■ What we will cover – basic constructs, and design principles ■ Proposed by Peter Chen in 1976 – "The Entity-Relationship Model --- Toward a Unified View of Data". in ACM transactions on database systems (TODS) ■ One of the most cited CS papers 18
Entities and Attributes ■ Entities – Real-world objects distinguishable from other objects – Described using a set of attributes ■ Attributes – each has an atomic domain: string, integers, reals, etc. ■ Entity set: a collection of similar entities 19
(Binary) Relationship ■ A mathematical definition: – if A, B are sets, then a relation R is a subset of A x B (Cartesian product) ■ A={1, 2, 3}, B={a, b, c, d}, R = {(1, a), (1, c), (3, b)} A= 1 a 2 b c 3 B= d ■ makes is a subset of Product x Company: Product makes Company 20
Multiplicity of E/R Relationships • one-one: • many-one • many-many Product 1 2 3 makes a b c d Company 21
Example ■ An employee can work in many departments; a department can have many employees ■ In contrast, each department has at most one manager. ■ So, how can we create ER diagram for this ? name ssn dname lot Employees did Manages budget Departments Works_In 22
ER Model vs. Relational Model ■ Both are used to model data ■ ER model has many concepts – Entities, relationships, attributes, etc. – Well-suited for capturing the app. requirements – Not well-suited for computer implementation ■ Relational model – Has just a single concept: relation (table) – World is represented with a collection of tables – Well-suited for efficient manipulations on computers 23
Relations ■ Schema vs. instance = columns vs. rows ■ Schema of a relation 1. Relation name 2. Attribute names 3. Attribute types (domains) ■ Schema of a database – A set of relation schemas ■ Questions – When do you determine a schema (instance)? – How often do you change your mind? 24
Relation: An Example Name of Table (Relation) Column (Field, Attribute) Products Name Price Category Manufacturer Gizmo 19. 99 Gadgets Gizmo works Power gizmo 29. 99 Gadgets Gizmo works Single touch 149. 99 Photography Canon Multi touch 203. 99 househould Hitachi Row (Record, Tuple) Domain (Atomic type) 25
Relations ■ The database maintains a current database state ■ Updates to the data happen very frequently – add a tuple – delete a tuple – modify an attribute in a tuple ■ Updates to the schema are relatively rare, and rather painful. Why? 26
Translating ER Diagram to Rel. Design ■ Basic cases – entity set E = relation with attributes of E – relationship R = relation with attributes being keys of related entity sets + attributes of R ■ Special cases – combining two relations – translating weak entity sets – translating is-a relationships and subclasses 27
Translating ER Diagrams since name ssn dname lot Employees did Manages budget Departments Translation to relational model? 1 -to-1 1 -to-Many-to-1 Many-to-Many 28
ER Diagram to Relations Relational schema E/R diagram e. g. account=(bname, acct_no, bal) E = ( a 1, …, an ) E a 1 …. . E 1 R 1 a 1 …. an c 1 …. ck an E 2 b 1 …. bm R 1= ( a 1, b 1, c 1, …, ck ) 29
ER Diagram to Relations E 1 a 1 …. an R 1 c 1 …. ck E 2 b 1 …. bm ■ Could have : R 1= ( a 1, b 1, c 1, …, ck ) – Put b 1 as the key for R 1, it is also the key for E 2=(b 1, …. , bn) ■ Usual strategy (combination) – ignore R 1 – Add a 1, c 1, …. , ck to E 2 instead, i. e. – E 2=(b 1, …. , bn, a 1, c 1, …, ck) 30
ER Diagram to Relations E 1 a 1 …. an R 1 ? ? R 1 c 1 …. ck E 2 b 1 …. bm E 1 = ( a 1, …, a. E 2 n ) = ( b 1, …, bm ) R 1 = ( a 1, b 1, c 1 …, ck ) R 1 E 1 = ( a 1, …, an ) E 2 = ( b 1, …, bm , a 1, c 1, …, ck) R 1 E 1 = ( a 1, …, an , b 1, c 1, …, ck) E 2 = ( b 1, …, bm ) R 1 Treat as n: 1 or 1: m 31
Relational Algebra ■ Querying the database: specify what we want from our database – Find all the people who earn more than $1, 000 and pay taxes in Oklahoma ■ Could write in C++/Java, but a bad idea ■ Instead use high-level query languages: – Theoretical: Relational Algebra, – Practical: SQL – Relational algebra: a basic set of operations on relations that provide the basic principles 32
What is an “Algebra”? ■ ■ Mathematical system consisting of: – Operands --- variables or values from which new values can be constructed – Operators --- symbols denoting procedures that construct new values from given values Examples – Arithmetic algebra, linear algebra, Boolean algebra …… ■ What are operands? – ■ Arithmetic Algebra: variables like x and constants like 15 What are operators? – Arithmetic Algebra: addition, subtraction… 33
What is Relational Algebra? ■ An algebra – Whose operands are relations or variables that represent relations – Whose operators are designed to do common things that we need to do with relations in a database ■ relations as input, new relation as output – Can be used as a query language for relations 34
Relational Operators at a Glance ■ Five basic RA operations: – Basic Set Operations ■ union, difference (no intersection, no complement) – Selection: (s) eliminates some rows of relations – Projection: (p) eliminates some columns of relations – Cartesian Product: (X) combine the tuples of two relations ■ When our relations have attribute names: – Renaming: (r) change the relation schema ■ Derived operations: – Intersection, complement – Joins (natural join, equi-join, theta join, semi-join, ……) 35
Set Operations ■ Union: all tuples in R 1 or R 2, denoted as R 1 U R 2 – R 1, R 2 must have the same schema – R 1 U R 2 has the same schema as R 1, R 2 – Example: – ■ ■ If any, is duplicate elimination required? ■ yes Intersection: all tuples in both R 1 and R 2 as R 1 ∩ R 2 – R 1, R 2 must have the same schema – R 1 U R 2 has the same schema as R 1, R 2 – Example: ■ ■ Active-Employees U Retired-Employees Difference: all tuples in R 1 but not in R 2, denoted as R 1 – R 2 – R 1, R 2 must have the same schema – R 1 - R 2 has the same schema as R 1, R 2 – Example ■ All-Employees - Retired-Employees 36
Selection ■ Applied to a relation R, produce a new relation with a subset of R’s tuples ■ Returns all tuples (rows) which satisfy a condition that involves the attributes of R, denoted as sc(R) – c is a condition: =, <, >, ≤, ≥, ≠, AND, OR, NOT – Output schema: same as input schema – Find all employees with salary more than $40, 000: ■ s. Salary > 40000 (Employee) SSN Name Dept-ID Salary 111060000 Alex 1 30 K 754320032 Bob 1 32 K 983210129 Chris 2 45 K SSN 98321012 9 Name Dept-ID Salary Chris 2 45 K 37
Projection ■ Unary operation: returns certain columns, denoted as P A 1, …, An (R) – Eliminates duplicate tuples ! – Input schema R(B 1, …, Bm) – Condition: {A 1, …, An} {B 1, …, Bm} – Output schema S(A 1, …, An) ■ Example: project social-security number and names: – P SSN, Name (Employee) SSN Name Dept-ID Salary SSN Name 11106000 0 Alex 1 30 K 11106000 0 Alex 75432003 2 Bob 1 32 K 75432003 2 Bob 98321012 9 Chris 2 45 K 98321012 9 Chris 38
Selection vs. Projection ■ Think of relation as a table – How are they similar? ■ Selection of subsets from table – How are they different? ■ Horizontal vs. vertical? – ■ selection-projection Duplicate elimination for both? – No duplicate in the result of selection 39
Cartesian Product ■ Each tuple in R 1 with each tuple in R 2, denoted as R 1 x R 2 – Input schemas R 1(A 1, …, An), R 2(B 1, …, Bm) – Output schema is S(A 1, …, An, B 1, …, Bm) – Very rare in practice; but joins are very common – Example: Employee x Dependent 40
Example Employee Dependent SSN Name 111060000 Alex 754320032 Brandy Employee-SSN Dependent. Name 111060000 Chris 754320032 David Employee x Dependent SSN Name Employee-SSN Dependent. Name 111060000 Alex 111060000 Chris 111060000 Alex 754320032 David 754320032 Brandy 111060000 Chris 754320032 Brandy 754320032 David 41
Renaming ■ Does not change the relational instance, denoted as Notation: r S(B 1, …, Bn) (R) ■ Changes the relational schema only – Input schema: R(A 1, …, An) – Output schema: S(B 1, …, Bn) ■ Example: r Soc-sec-num, firstname(Employee) SSN Name 11106000 0 Alex 75432003 2 Bob 98321012 9 Chris Soc-sec-num firstname 111060000 Alex 754320032 Bob 983210129 Chris 42
Set Operations: Intersection ■ ■ Intersection: all tuples both in R 1 and in R 2, denoted as R 1 R 2 – R 1, R 2 must have the same schema – R 1 R 2 has the same schema as R 1, R 2 – Example: Unionized. Employees Retired. Employees Intersection is derived: – R 1 R 2 = R 1 – (R 1 – R 2) R 1 SSN Name 111060000 why ? R 2 SSN Name Alex 111060000 Alex 754320032 Bob 983210129 Chris 346565223 Alice R 1 ∩ R 2 SSN Name 111060000 Alex 754320032 Bob 43
Theta Join ■ A join that involves a predicate q, denoted as R 1 q R 2 – Input schemas: R 1(A 1, …, An), R 2(B 1, …, Bm) – Output schema: S(A 1, …, An, B 1, …, Bm) – Derived operator: R 1 q R 2 = s q (R 1 x R 2) ■ Take the product R 1 x R 2 ■ Then apply SELECTC to the result ■ As for SELECT, C can be any Boolean-valued condition 44
Theta Join: Example Product Component Pname Price PName Cost Laptop 1500 Laptop CPU 500 Car 20000 Laptop hdd 300 Laptop case 700 Car wheels 1000 Product. Info : = Product. pname=Component. PName Component Pname Price Cname Cost Laptop 1500 CPU 500 Laptop 1500 hdd 300 Laptop 1500 case 700 Car 20000 wheels 1000 45
Natural Join ■ Notation: R 1 R 2 ■ Input Schema: R 1(A 1, …, An), R 2(B 1, …, Bm) ■ Output Schema: S(C 1, …, Cp) – Where {C 1, …, Cp} = {A 1, …, An} U{B 1, …, Bm} ■ Meaning: combine all pairs of tuples in R 1 and R 2 that agree on the attributes: – {A 1, …, An} {B 1, …, Bm} (called the join attributes) 46
Natural Join: Examples Employee Dependent SSN Name 111060000 Alex 754320032 Brandy SSN Dependent. Name 111060000 Chris 754320032 David Employee Dependent = P SSN, Name, Dependent-Name(s. Employee. SSN=Dependent. SSN(Employee x Dependent) SSN Name Dependent. Name 111060000 Alex Chris 754320032 Brandy David 47
Natural Join: Examples R S A B B C X Y Z U X Z V W Y Z Z V R S A B C X Z U X Z V Y Z U Y Z V W 48
Natural Join ■ Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ■ Given R(A, B, C), S(D, E), what is R ■ Given R(A, B), S(A, B), what is R S? S? 49
Equi-join ■ Special case of theta join: condition c contains only conjunction of equalities – Result schema is the same as that of Cartesian product – May have fewer tuples than Cartesian product – Most frequently used in practice: R 1 A=B R 2 – Natural join is a particular case of equi-join – A lot of research on how to do it efficiently 50
Building Complex Expressions ■ Algebras allow us to express sequences of operations in a natural way – Example ■ In arithmetic algebra: (x + 4)*(y - 3) – Relational algebra allows the same ■ Three notations, just as in arithmetic: 1. Sequences of assignment statements 2. Expressions with several operators 3. Expression trees 51
Sequences of Assignments ■ Create temporary relation names ■ Renaming can be implied by giving relations a list of attributes ■ Example: R 3 : = R 1 JOINC R 2 can be written: R 4 : = R 1 x R 2 R 3 : = SELECTC (R 4) 52
Expressions with Several Operators ■ Example: theta-join R 3 : = R 1 JOINC R 2 can be written: R 3 : = SELECTC (R 1 x R 2) ■ Precedence of relational operators: w 1. Unary operators --- select, project, rename --- have highest precedence, bind first 2. Then come products and joins 3. Then intersection 4. Finally, union and set difference bind last But you can always insert parentheses to force the order you desire 53
Expression Trees ■ Leaves are operands – either variables standing for relations or particular constant relations ■ Interior nodes are operators, applied to their child or children 54
Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3 UNION RENAMER(name) PROJECTname SELECTaddr = “Tennessee St. ” Bars PROJECTbar SELECT price<3 AND beer=“Bud” Sells 55
Glimpse Ahead: Efficient Implementations of Operators ■ s (Employees) (age >= 30 AND age <= 35) – Method 1: scan the file, test each employee – Method 2: use an index on age – Which one is better ? Depends a lot… ■ Employees Relatives – Iterate over Employees, then over Relatives – Iterate over Relatives, then over Employees – Sort Employees, Relatives, do “merge-join” – “hash-join” – Etc. 56
Glimpse Ahead: Optimizations Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Person(ssn, name, phone number, city) ■ Which is better: sprice>100(Product) (Purchase) scity=sea. Person ■ Depends ! This is the optimizer’s job… 57
Finally: RA has Limitations ! ■ Cannot compute “transitive closure” Name 1 Name 2 Relationship Fred Mary Father Mary Joe Cousin Mary Bill Spouse Nancy Lou Sister ■ Find all direct and indirect relatives of Fred ■ Cannot express in RA !!! Need to write C program
Complex Queries Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Note: • in Purchase: buyer-ssn, seller-ssn are foreign keys in Person, pid is foreign key in Product; • in Product maker-cid is a foreign key in Company Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought
Exercises Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products Ex #3: Find names of people who bought American products and did not buy French products Ex #4: Find names of people who bought American products and they live in Seattle. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $50.
SQL ■ Standard language for querying and manipulating data – – ■ SQL stands for Structured Query Language Initially developed at IBM by Donald Chamberlin and Raymond Boyce in the early 1970 s, and called SEQUEL (Structured English Query Language) Many standards out there: SQL 92, SQL 3, SQL 99 Vendors support various subsets of these standards Why SQL? – A very-high-level language, in which the programmer is able to avoid specifying a lot of data-manipulation details that would be necessary in languages like C++ – Its queries are “optimized” quite well, yielding efficient query executions 61
Introduction ■ Two sublanguages – DDL – Data Definition Language ■ define and modify schema CREATE TABLE table_name ( { column_name data_type [ DEFAULT default_expr ] [ column_constraint [, . . . ] ] | table_constraint } [, . . . ] ) – DML – Data Manipulation Language ■ Queries can be written intuitively Select-From-Where 62
Select-From-Where Statements ■ The principal form of a SQL query is: SELECT desired attributes FROM one or more tables WHERE condition about tuples of the tables 63
Our Running Example ■ Most of our SQL queries will be based on the following database schema – Underline indicates key attributes ■ ■ ■ Movie( title, year, length, studio. Name, prod. C#)) Stars. In(title, year, star. Id) Movie. Star(id, name, address, gender, birthdate) Movie. Exec(name, address, cert#, net. Worth) Studio(name, address, pres. C#) 64
Select-From-Where Example ■ All movies produced by Disney studios in 1990 select title from Movie where studio. Name=‘Disney’ and year= ‘ 1990’ The answer is a relation with title attribute of Movie, and tuples with the studio. Name=‘Disney’ and year= ‘ 1990’ 65
How to process Single-Relation Query ■ Operation 1. Begin with the relation in the FROM clause 2. Apply the selection indicated by the WHERE clause 3. Apply the extended projection indicated by the SELECT clause ■ Semantics 1. To implement this algorithm think of a tuple variable ranging over each tuple of the relation mentioned in FROM 2. Check if the “current” tuple satisfies the WHERE clause 3. If so, compute the attributes or expressions of the SELECT clause using the components of this tuple 66
* In SELECT clauses ■ When there is one relation in the FROM clause, * in the SELECT clause stands for “all attributes of this relation. ” Example select * from Movie where studio. Name=‘Disney’ and year= ‘ 1990’ Now, the result has each of the attributes of Movie 67
Projections and Selection ■ Projection –select ■ Selection- WHERE clause – Find Movie title and length of all movies produced by Disney studios in 1990 select title, length from Movie where studio. Name=‘Disney’ and year= ‘ 1990’ Now, the result has title, length attributes of Movie with studio. Name=‘Disney’ and year= ‘ 1990 68
Renaming Attributes ■ If you want the result to have different attribute names, use “AS <new name>” to rename an attribute Example select title as name, length as duration from Movie where studio. Name=‘Disney’ and year= ‘ 1990 Result will have name and duration attribute for movie with studio. Name=‘Disney’ and year= ‘ 1990 69
Expressions in SELECT Clauses ■ Any expression that makes sense can appear as an element of a SELECT clause ■ Example: select title, length *60 AS duration. Insecs from Movie where studio. Name=‘Disney’ and year= ‘ 1990’ 70
In where Clauses ■ What you can use in WHERE: – attribute names of the relation(s) used in the FROM – comparison operators: =, <>, <, >, <=, >= – apply arithmetic operations: stockprice*2 – operations on strings (e. g. , “||” for concatenation) – Lexicographic order on strings – Pattern matching: s LIKE p – Special stuff for comparing dates and times. 71
NULL Values ■ Tuples in SQL relations can have NULL as a value for one or more components ■ Meaning depends on context. Two common cases: – Missing value : e. g. , we know Joe’s Bar has some address, but we don’t know what it is – Inapplicable : e. g. , the value of attribute spouse for an unmarried person ■ The logic of conditions in SQL is really 3 -valued logic: TRUE, FALSE, UNKNOWN – When any value is compared with NULL, the truth value is UNKNOWN – A query only produces a tuple in the answer if its value for the WHERE clause is TRUE (not FALSE or UNKNOWN) 72
Three-Valued Logic ■ To understand how AND, OR, and NOT work in 3 -valued logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½, AND = MIN; OR = MAX, NOT(x) = 1 -x. ■ Example: TRUE AND (FALSE OR NOT(UNKNOWN)) = MIN(1, MAX(0, (1 - ½ ))) = MIN(1, MAX(0, ½ ) = MIN(1, ½ ) =½ 73
Surprising Example ■ From the following Sells relation: bar beer Price Joe’s Bud NULL SELECT bar FROM Sells WHERE price < 2. 00 OR price >= 2. 00; UNKNOWN 74
Multi-relation Queries ■ Interesting queries often combine data from more than one relation, we can address several relations in one query by listing them all in the FROM clause. – Distinguish attributes of the same name by “<relation>. <attribute>” ■ Movie(title, year, length, in. Color, studio. Name, prod. C#) ■ Movie. Exec(name, address, cert#, net. Worth) select name from Movie, Movie. Exec where title = ‘Star Wars’ AND prod. C#= cert# 75
Semantics ■ Almost the same as for single-relation queries: 1. Start with the (Cartesian) product of all the relations in the FROM clause 2. Apply the selection condition from the WHERE clause 3. Project onto the list of attributes and expressions in the SELECT clause SELECT a 1, a 2, …, ak FROM R 1 AS x 1, R 2 AS x 2, …, Rn AS xn WHERE Conditions Translation to Relational algebra: Πa 1, …, ak (s Conditions (R 1 x R 2 x … x Rn)) Select-From-Where queries are precisely Select-Project-Join 76
Semantics SELECT a 1, a 2, …, ak FROM R 1 AS x 1, R 2 AS x 2, …, Rn AS xn WHERE Conditions Answer = {} for x 1 in R 1 do for x 2 in R 2 do …. . for xn in Rn do if Conditions then Answer = Answer U {(a 1, …, ak) return Answer 77
Sub. Queries ■ Query that is part of another ■ Can have many levels ■ A parenthesized SELECT-FROM-WHERE statement (subquery) can be used as a value in a number of places, including FROM and WHERE clauses – Example: in place of a relation in the FROM clause, we can place another query, and then query its result ■ Better use a tuple-variable to name tuples of the result ■ Return – Single constant –Subqueries that return Scalar, this constant can be compared with another value in a where clause ■ – If a subquery is guaranteed to produce one tuple with one component, then the subquery can be used as a value – “Single” tuple often guaranteed by key constraint – A run-time error occurs if there is no tuple or more than one tuple Relations – can be used in many ways in a where clause 78
Subqueries – produce scalar values ■ Subquery produces a constant that is used by the main query ■ Movie(title, year, length, in. Color, studio. Name, prod. C#) ■ Movie. Exec(name, address, cert#, net. Worth) select name from Movie. Exec where cert# = (select prod. C# from Movie where title = ‘Star Wars’ ) subquery 79
Conditions involving relations ■ exists, all, in, any – exists R – T iff R is not empty – s in R – T iff s is equal to one of the values in R – s > all R – T iff s is greater than every value in R (can be < etc. ) – s > any R – T iff s is greater than at least one value in R (can be < etc. ) 80
Conditions involving relations ■ exists, all, in, any can be negated by putting not in front of the expression ■ not exists R – T iff R is empty ■ s not in R – T iff s is equal to no value of R ■ not s > all R – T iff s is not greater than the maximum value in R (can be < etc. ) ■ not s > any R – T iff s is the minimum value in R (can be < etc. ) 81
The IN Operator ■ <tuple> IN <relation> is true if and only if the tuple is a member of the relation – <tuple> NOT IN <relation> means the opposite – IN-expressions can appear in WHERE clauses – The <relation> is often a subquery 82
Conditions involving tuples select name from Movie. Exec where cert# in (select prod. C# from Movie where (title, year) in (select movie. Title, movie. Year from Stars. In where star. Name=‘Harrison Ford’ ) ) 83
The Exists Operator ■ EXISTS( <relation> ) is true if and only if the <relation> is not empty – Being a Boolean-valued operator, EXISTS can appear in WHERE clauses ■ How can we write previous example with exists ? select name from Movie. Exec ME where not Exists (select * from Movie where prod. C#=ME. cer. T# and (title, year) in (select movie. Title, movie. Year from Stars. In where star. Name=‘Harrison Ford’ ) ) 84
The Operator ANY ■ x = ANY( <relation> ) is a Boolean condition meaning that x equals at least one tuple in the relation ■ Similarly, = can be replaced by any of the comparison operators – Example: x >= ANY( <relation> ) means x is not smaller than some tuples in the relation – Note tuples must have one component only 85
The Operator ALL ■ x <> ALL( <relation> ) is true if and only if for every tuple t in the relation, x is not equal to t – That is, x is not a member of the relation. ■ The <> can be replaced by any comparison operator – Example: x >= ALL( <relation> ) means there is no tuple larger than x in the relation 86
Aggregations ■ SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column – e. g. COUNT(*) counts the number of tuples – count(distinct x) – counts number of distinct values in column x ■ Query: From Sells(bar, beer, price), find the average price of Bud SELECT AVG(price) FROM Sells WHERE beer = ‘Bud’ ■ number of tuples in Stars. In relation select count(*) from Stars. In 87
Eliminating Duplicates in an Aggregation ■ DISTINCT inside an aggregation causes duplicates to be eliminated before the aggregation ■ Query: find the number of different prices charged for Bud: SELECT COUNT(DISTINCT price) FROM Sells WHERE beer = ‘Bud’; 88
Group By ■ We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes – ■ 1. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group Query: Movie(title, year, length, in. Color, studio. Name, prod. C#) – find sum of the lengths of all movies for each studio select studio. Name, sum(length) from movie group by studio. Name 2. From Sells(bar, beer, price), find the average price for each beer: SELECT beer, AVG(price) FROM Sells GROUP BY beer 89
Group By ■ group by for several relations ■ steps – Relation R is the cartesian product of the relations mentioned in the from clause – selection of the where clause is applied to R – group tuples of R according to attributes in the group by clause – execute select clause 90
Group By ■ Movie(title, year, length, in. Color, studio. Name, prod. C#) ■ Movie. Exec(name, address, cert#, net. Worth) List each producer’s total length of film produced select name, sum(length) from Movie, Movie. Exec where prod. C#=cert# group by name 91
Having Clause ■ HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated – These conditions may refer to any relation or tuple-variable in the FROM clause – They may refer to attributes of those relations, as long as the attribute makes sense within a group; i. e. , it is either: 1. A grouping attribute, or 2. Aggregated 92
Full-relation operations ■ Print each producer’s total length of film produced – only for those producers who made at least one film prior to 1930 select name, sum(length) from Movie, Movie. Exec where prod. C#=cert# group by name having min(year) < 1930 93
General form of Grouping and Aggregation SELECT S FROM R 1, …, Rn WHERE C 1 GROUP BY a 1, …, ak HAVING C 2 S = may contain attributes a 1, …, ak and/or any aggregates but NO OTHER ATTRIBUTES C 1 = is any condition on the attributes in R 1, …, Rn C 2 = is any condition on aggregate expressions or grouping attributes 94
General form of Grouping and Aggregation SELECT S FROM R 1, …, Rn WHERE C 1 GROUP BY a 1, …, ak HAVING C 2 Evaluation steps: 1. 2. 3. 4. Compute the FROM-WHERE part, obtain a table with all attributes in R 1, …, Rn Group by the attributes a 1, …, ak Compute the aggregates in C 2 and keep only groups satisfying C 2 Compute aggregates in S and return the result 95
Defining a Database Schema ■ A database schema comprises declarations for the relations (“tables”) of the database ■ Simplest form of creation is: CREATE TABLE <name> ( <list of elements> ); ■ And you may remove a relation from the database schema by: DROP TABLE <name>; 96
Elements of Table Declarations ■ The principal element is a pair consisting of an attribute and a type ■ The most common types are: – INT or INTEGER (synonyms) – REAL or FLOAT (synonyms) – CHAR(n ) = fixed-length string of n characters – VARCHAR(n ) = variable-length string of up to n characters 97
Example: Create Table CREATE TABLE Sells ( name CHAR(20), type VARCHAR(20), price REAL ); 98
Declaring Keys ■ An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE – Each says the attribute(s) so declared functionally determines all the attributes of the relation schema – Single attribute keys CREATE TABLE sells ( name CHAR(20) UNIQUE, type price REAL ); 99 VARCHAR(20),
Multi-attribute Keys CREATE TABLE Sells ( name CHAR(20), type VARCHAR(20), price REAL PRIMARY KEY (name, type) ); 100
Foreign Keys ■ A Foreign Key is a field whose values are keys in another relation – Must correspond to primary key of the second relation – Like a `logical pointer’ Enrolled 101 Students CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES Students, FOREIGN KEY (cid) REFERENCES Courses )
Modifications ■ A modification command does not return a result as a query does, but it changes the database in some way ■ There are three kinds of modifications: 1. Insert a tuple or tuples 2. Delete a tuple or tuples 3. Update the value(s) of an existing tuple or tuples 102
Insertion ■ To insert a single tuple: INSERT INTO <relation> VALUES ( <list of values> ); ■ insert into R (A 1, A 2, …AN) values (A 1, A 2, …AN) ■ insert into Stars. In (movie. Title, movie. Year, star. Name) values (‘Batman’, 2002, ‘John Wayne’) 103
Specifying Attributes in INSERT ■ We may add to the relation name a list of attributes ■ There are two reasons to do so: 1. We forget the standard order of attributes for the relation 2. We don’t have values for all attributes, and we want the system to fill in missing components with NULL or a default value ■ Another way to add the fact that Sally likes Bud to Likes(drinker, beer): INSERT INTO Likes(beer, drinker) VALUES(‘Bud’, ‘Sally’); 104
Inserting Many Tuples ■ the VALUES clause includes a set of values that are enclosed in parentheses and separated by commas. But you don’t always have to explicitly specify the values. You can instead retrieve the values through a subquery with SELECT statement or through a stored procedure. ■ We may insert the entire result of a query into a relation, using the form: INSERT INTO <relation> ( <subquery> ); 105
Example: Insert a Subquery ■ Want to add to Studio(name, address, pres. C#) all movie studios that are in the relation Movie(title, year, length, in. Color, studio. Name, prod. C#) 1 insert into Studio(name) 2 select distinct studio. Name 3 from Movie 4 where studio. Name not in 5 (select name 6 from Studio) Lines 5 -6 generate all the studio names in the relation Studio Line 4 tests that a studio name from the Movie relation is none of these studios
Deletion ■ To delete tuples satisfying a condition from some relation: DELETE FROM <relation> WHERE <condition>; ■ delete from R where <condition> ■ Example: Stars. In(movie. Title, movie. Year, star. Name) ■ Delete John Wayne is star of Batman delete from Stars. In where movie. Title=‘Batman’ AND movie. Year=2002 AND star. Name=“John Wayne” 107
Delete all Tuples ■ Make the relation Stars. In empty: DELETE FROM Stars. In; ■ Note no WHERE clause needed 108
Delete Many Tuples ■ Delete from Milk(name, manf) all Milk for which there is another milk by the same manufacturer. Milk with the same manufacturer DELETE FROM milk b WHERE EXISTS ( SELECT name FROM Milks a WHERE a. manf = b. manf AND a. name <> b. name ); and a different name from the name of the milk represented by tuple b 109
Updates ■ To change certain attributes in certain tuples of a relation: UPDATE <relation> SET <list of attribute assignments> WHERE <condition on tuples>; ■ update R set <new-assignments> where <condition> ■ Example: Attach title Pres in front of name of every movie executive who is president of a studio – Movie. Exec(name, address, cert#, net. Worth) 1 2 3 Line 3 – is certificate number Movie. Exec a one of those set name = ‘Pres. ‘||name that appear as a president’s certificate number in Studio where cert# in (select pres. C# from Studio) update Movie. Exec 110
Update Several Tuples ■ Increase price that is cheap: UPDATE Sells SET price = price * 1. 07 WHERE price < 3. 0; 111
Views ■ A view is a “virtual table”, a relation that is defined by a query in terms of the contents of other tables and views – Declare by: CREATE VIEW <name> AS <query>; ■ In contrast, a relation whose value is really stored in the database is called a base table ■ Views – Do not exist physically – Defined by an expression – Can be queried as is it were a stored table – Can be modified 112
Example: View Definition ■ Movie(title, year, length, in. Color, studio. Name, prod. C#) ■ view of titles and years of movies made by paramount studios create view Paramount. Movie as select title, year from Movie where studio. Name=‘paramount’ When we query Paramount. Movie, tuples are obtained from base table Movie 113
Example: Accessing a View ■ You may query a view as if it were a base table – There is a limited ability to modify views if the modification makes sense as a modification of the underlying base table ■ Queries involving both views and base tables ■ The names of all stars of movies made by paramount select distinct star. Name from Paramount. Movie, stars. In where title=movie. Title and year=movie. Year 114
Example: Accessing a View ■ You may query a view as if it were a base table – There is a limited ability to modify views if the modification makes sense as a modification of the underlying base table ■ Example: SELECT beer FROM Can. Drink WHERE drinker = ‘Sally’; 115
What Happens When a View Is Used? ■ The DBMS starts by interpreting the query as if the view were a base table – Typical DBMS turns the query into something like relational algebra ■ The queries defining any views used by the query are also replaced by their algebraic equivalents, and “spliced into” the expression tree for the query 116
READY TO ADVANCED TOPICS
- Slides: 117