Lecture 07 Relational Algebra 1 Outline Relational Algebra

  • Slides: 31
Download presentation
Lecture 07: Relational Algebra 1

Lecture 07: Relational Algebra 1

Outline • Relational Algebra (Section 6. 1) 2

Outline • Relational Algebra (Section 6. 1) 2

Relational Algebra • Formalism for creating new relations from existing ones • Its place

Relational Algebra • Formalism for creating new relations from existing ones • Its place in the big picture: Declarative query language SQL, relational calculus Algebra Implementation Relational algebra 3

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P Cartesian Product: • Derived or auxiliary operators: – Intersection, complement – Joins (natural, equi-join, theta join, semi-join) – Renaming: r 4

1. Union and 2. Difference • R 1 R 2 Example: – Active. Employees

1. Union and 2. Difference • R 1 R 2 Example: – Active. Employees Retired. Employees • R 1 – R 2 Example: – All. Employees − Retired. Employees 5

What about Intersection ? • It is a derived operator R 1 R 2

What about Intersection ? • It is a derived operator R 1 R 2 = R 1 – (R 1 – R 2) • Also expressed as a join (will see later) Example – Unionized. Employees Retired. Employees 6

3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) •

3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • Examples – s. Salary > 40000 (Employee) – sname = “Smith” (Employee) • The condition c can be =, <, , >, , <> [in SQL: SELECT * FROM Employee WHERE Salary > 40000] 7

Find all employees with salary more than $40, 000. s Salary > 40000 (Employee)

Find all employees with salary more than $40, 000. s Salary > 40000 (Employee) 8

4. Projection • Eliminates columns, then removes duplicates • Notation: P A 1, …,

4. Projection • Eliminates columns, then removes duplicates • Notation: P A 1, …, An (R) • Example: project to social-security number and names: – P SSN, Name (Employee) – Output schema: Answer(SSN, Name) [In SQL: SELECT DISTINCT SSN, Name FROM Employee] 9

P SSN, Name (Employee) 10

P SSN, Name (Employee) 10

5. Cartesian Product • Combine each tuple in R 1 with each tuple in

5. Cartesian Product • Combine each tuple in R 1 with each tuple in R 2 • Notation: R 1 R 2 • Example: – Employee Dependents • Very rare in practice; mainly used to express joins [In SQL: SELECT * FROM R 1, R 2] 11

12

12

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P Cartesian Product: • Derived or auxiliary operators: – Intersection, complement – Joins (natural, equi-join, theta join, semi-join) – Renaming: r 13

Renaming • • Changes the schema, not the instance Schema: R(A 1, …, An

Renaming • • Changes the schema, not the instance Schema: R(A 1, …, An ) Notation: r B 1, …, Bn (R) Example: – r. Last. Name, Soc. No (Employee) – Output schema: Answer(Last. Name, Soc. No) [in SQL: SELECT Name AS Last. Name, SSN AS Soc. No 14 FROM Employee]

Renaming Example Employee Name John Tony r. Last. Name, Soc. No Last. Name John

Renaming Example Employee Name John Tony r. Last. Name, Soc. No Last. Name John Tony SSN 99999 77777 (Employee) Soc. No 99999 77777 15

Natural Join • Notation: R 1 ⋈ R 2 • Meaning: R 1 ⋈

Natural Join • Notation: R 1 ⋈ R 2 • Meaning: R 1 ⋈ R 2 = PA(s. C(R 1 R 2)) • Where: – The selection s. C checks equality of all common attributes – The projection eliminates the duplicate common attributes [in SQL: R 2 SELECT DISTINCT R 1. A, R 1. B, R 2. C FROM R 1, WHERE R 1. B = R 2. B Schema: R 1(A, B), R 2(B, C)] 16

Natural Join Example Employee Name John Tony SSN 99999 77777 Dependents SSN 99999 77777

Natural Join Example Employee Name John Tony SSN 99999 77777 Dependents SSN 99999 77777 Dname Emily Joe Employee Dependents = PName, SSN, Dname(s SSN=SSN 2(Employee x r. SSN 2, Dname(Dependents)) Name John Tony SSN 99999 77777 Dname Emily Joe 17

Natural Join • R= • R ⋈ S= A B X S= B C

Natural Join • R= • R ⋈ S= A B X S= B C Y Z U X Z V W Y Z Z V A B C X Z U X Z V Y Z U Y Z V W 18

Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what

Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given R(A, B, C), S(D, E), what is R ⋈ S ? • Given R(A, B), S(A, B), what is R ⋈ S ? 19

Theta Join • A join that involves a predicate • R 1 ⋈ q

Theta Join • A join that involves a predicate • R 1 ⋈ q R 2 = s q (R 1 R 2) • Here q can be any condition 20

Eq-join • A theta join where q is an equality R 1 ⋈A=B R

Eq-join • A theta join where q is an equality R 1 ⋈A=B R 2 = s A=B (R 1 R 2) • Example: – Employee ⋈SSN=SSN Dependents • Most useful join in practice (difference to natural join? ) 21

Semijoin • R ⋉ S = P A 1, …, An (R ⋈ S)

Semijoin • R ⋉ S = P A 1, …, An (R ⋈ S) • Where A 1, …, An are the attributes in R • Example: – Employee ⋉ Dependents 22

Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee SSN

Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee SSN Name . . . SSN network . . . Dname Age. . . Employee ⋈ssn=ssn (s age>71 (Dependents)) R = Employee ⋉ T T = P SSN s age>71 (Dependents) 23 Answer = R ⋈ Dependents

Complex RA Expressions P name buyer-ssn=ssn pid=pid seller-ssn=ssn Person Purchase P pid sname=fred sname=gizmo

Complex RA Expressions P name buyer-ssn=ssn pid=pid seller-ssn=ssn Person Purchase P pid sname=fred sname=gizmo Person Product 24

Application: Query Rewriting for Optimization sname bid=100 rating > 5 sid=sid (Scan; write to

Application: Query Rewriting for Optimization sname bid=100 rating > 5 sid=sid (Scan; write to bid=100 temp T 1) sid=sid Reserves Sailors Reserves rating > 5 (Scan; write to temp T 2) Sailors The earlier we process selections, less tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages? 25

Algebraic Laws (Examples) • Commutative and Associative Laws – R ∩ S = S

Algebraic Laws (Examples) • Commutative and Associative Laws – R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T – R S = S R, R (S T) = (R S) T • Laws involving selection – s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R) – s C (R S) = s C (R) S • When C involves only attributes of R • Laws involving projections – PM(PN(R)) = PM, N(R) 26

Operations on Bags A bag = a set with repeated elements All operations need

Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a, b, b, c} {a, b, b, b, e, f, f}={a, a, b, b, b, c, e, f, f} • {a, b, b, b, c, c} – {b, c, c, c, d} = {a, b, b, d} • s. C(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets ! 27

Finally: RA has Limitations ! • Cannot compute “transitive closure” Name 1 Name 2

Finally: RA has Limitations ! • Cannot compute “transitive closure” Name 1 Name 2 Relationship Fred Mary Father Mary Joe Cousin Mary Bill Spouse Nancy Lou Sister • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write C program 28

Formulating queries in RA • Consider a database for student enrollment for courses, and

Formulating queries in RA • Consider a database for student enrollment for courses, and books used in the courses – – – STUDENT (SSN, Name, Major, Bdate) COURSE (Course#, Cname, Dept) ENROLL (SSN, Course#, Quarter, Grade) BOOK_ADOPTION (Course#, Quarter, Book_ISBN) TEXT (Book_ISBN, Book_Title, Publisher, Author) 29

Formulating queries in RA • Specify the following queries in relational algebra – List

Formulating queries in RA • Specify the following queries in relational algebra – List the number of courses (Course#) taken by all students named ‘John Smith’ in Winter 1999 (i. e. , Quarter = W 99) – List any department which has all its adopted books published by ‘BC Publishing’ 30

Formulating Queries in RA § PCourse# (s Quarter=W 99 ((s Name= ‘John Smith’ (STUDENT)

Formulating Queries in RA § PCourse# (s Quarter=W 99 ((s Name= ‘John Smith’ (STUDENT) ⋈ ENROLL)) § Other. Dept = PDept ((s Publisher <> ‘PS Publishers’ (BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE) All. Dept = PDept (BOOK_ADOPTION ⋈ COURSE) WHY? Answer = All. Dept - Other. Dept § And how will you express it in SQL? 31