# Lecture 07 Relational Algebra 1 Outline Relational Algebra

- Slides: 31

Lecture 07: Relational Algebra 1

Outline • Relational Algebra (Section 6. 1) 2

Relational Algebra • Formalism for creating new relations from existing ones • Its place in the big picture: Declarative query language SQL, relational calculus Algebra Implementation Relational algebra 3

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P Cartesian Product: • Derived or auxiliary operators: – Intersection, complement – Joins (natural, equi-join, theta join, semi-join) – Renaming: r 4

1. Union and 2. Difference • R 1 R 2 Example: – Active. Employees Retired. Employees • R 1 – R 2 Example: – All. Employees − Retired. Employees 5

What about Intersection ? • It is a derived operator R 1 R 2 = R 1 – (R 1 – R 2) • Also expressed as a join (will see later) Example – Unionized. Employees Retired. Employees 6

3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • Examples – s. Salary > 40000 (Employee) – sname = “Smith” (Employee) • The condition c can be =, <, , >, , <> [in SQL: SELECT * FROM Employee WHERE Salary > 40000] 7

Find all employees with salary more than $40, 000. s Salary > 40000 (Employee) 8

4. Projection • Eliminates columns, then removes duplicates • Notation: P A 1, …, An (R) • Example: project to social-security number and names: – P SSN, Name (Employee) – Output schema: Answer(SSN, Name) [In SQL: SELECT DISTINCT SSN, Name FROM Employee] 9

P SSN, Name (Employee) 10

5. Cartesian Product • Combine each tuple in R 1 with each tuple in R 2 • Notation: R 1 R 2 • Example: – Employee Dependents • Very rare in practice; mainly used to express joins [In SQL: SELECT * FROM R 1, R 2] 11

12

Relational Algebra • Five operators: – – – Union: Difference: Selection: s Projection: P Cartesian Product: • Derived or auxiliary operators: – Intersection, complement – Joins (natural, equi-join, theta join, semi-join) – Renaming: r 13

Renaming • • Changes the schema, not the instance Schema: R(A 1, …, An ) Notation: r B 1, …, Bn (R) Example: – r. Last. Name, Soc. No (Employee) – Output schema: Answer(Last. Name, Soc. No) [in SQL: SELECT Name AS Last. Name, SSN AS Soc. No 14 FROM Employee]

Renaming Example Employee Name John Tony r. Last. Name, Soc. No Last. Name John Tony SSN 99999 77777 (Employee) Soc. No 99999 77777 15

Natural Join • Notation: R 1 ⋈ R 2 • Meaning: R 1 ⋈ R 2 = PA(s. C(R 1 R 2)) • Where: – The selection s. C checks equality of all common attributes – The projection eliminates the duplicate common attributes [in SQL: R 2 SELECT DISTINCT R 1. A, R 1. B, R 2. C FROM R 1, WHERE R 1. B = R 2. B Schema: R 1(A, B), R 2(B, C)] 16

Natural Join Example Employee Name John Tony SSN 99999 77777 Dependents SSN 99999 77777 Dname Emily Joe Employee Dependents = PName, SSN, Dname(s SSN=SSN 2(Employee x r. SSN 2, Dname(Dependents)) Name John Tony SSN 99999 77777 Dname Emily Joe 17

Natural Join • R= • R ⋈ S= A B X S= B C Y Z U X Z V W Y Z Z V A B C X Z U X Z V Y Z U Y Z V W 18

Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given R(A, B, C), S(D, E), what is R ⋈ S ? • Given R(A, B), S(A, B), what is R ⋈ S ? 19

Theta Join • A join that involves a predicate • R 1 ⋈ q R 2 = s q (R 1 R 2) • Here q can be any condition 20

Eq-join • A theta join where q is an equality R 1 ⋈A=B R 2 = s A=B (R 1 R 2) • Example: – Employee ⋈SSN=SSN Dependents • Most useful join in practice (difference to natural join? ) 21

Semijoin • R ⋉ S = P A 1, …, An (R ⋈ S) • Where A 1, …, An are the attributes in R • Example: – Employee ⋉ Dependents 22

Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee SSN Name . . . SSN network . . . Dname Age. . . Employee ⋈ssn=ssn (s age>71 (Dependents)) R = Employee ⋉ T T = P SSN s age>71 (Dependents) 23 Answer = R ⋈ Dependents

Complex RA Expressions P name buyer-ssn=ssn pid=pid seller-ssn=ssn Person Purchase P pid sname=fred sname=gizmo Person Product 24

Application: Query Rewriting for Optimization sname bid=100 rating > 5 sid=sid (Scan; write to bid=100 temp T 1) sid=sid Reserves Sailors Reserves rating > 5 (Scan; write to temp T 2) Sailors The earlier we process selections, less tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages? 25

Algebraic Laws (Examples) • Commutative and Associative Laws – R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T – R S = S R, R (S T) = (R S) T • Laws involving selection – s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R) – s C (R S) = s C (R) S • When C involves only attributes of R • Laws involving projections – PM(PN(R)) = PM, N(R) 26

Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a, b, b, c} {a, b, b, b, e, f, f}={a, a, b, b, b, c, e, f, f} • {a, b, b, b, c, c} – {b, c, c, c, d} = {a, b, b, d} • s. C(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets ! 27

Finally: RA has Limitations ! • Cannot compute “transitive closure” Name 1 Name 2 Relationship Fred Mary Father Mary Joe Cousin Mary Bill Spouse Nancy Lou Sister • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write C program 28

Formulating queries in RA • Consider a database for student enrollment for courses, and books used in the courses – – – STUDENT (SSN, Name, Major, Bdate) COURSE (Course#, Cname, Dept) ENROLL (SSN, Course#, Quarter, Grade) BOOK_ADOPTION (Course#, Quarter, Book_ISBN) TEXT (Book_ISBN, Book_Title, Publisher, Author) 29

Formulating queries in RA • Specify the following queries in relational algebra – List the number of courses (Course#) taken by all students named ‘John Smith’ in Winter 1999 (i. e. , Quarter = W 99) – List any department which has all its adopted books published by ‘BC Publishing’ 30

Formulating Queries in RA § PCourse# (s Quarter=W 99 ((s Name= ‘John Smith’ (STUDENT) ⋈ ENROLL)) § Other. Dept = PDept ((s Publisher <> ‘PS Publishers’ (BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE) All. Dept = PDept (BOOK_ADOPTION ⋈ COURSE) WHY? Answer = All. Dept - Other. Dept § And how will you express it in SQL? 31

- Relational Algebra Lecture 4 Relational Algebra Relational Algebra
- Relational Algebra Relational Algebra Relational algebra was defined
- Relational Algebra Relational Algebra Relational algebra was defined
- Relational Algebra Relational Calculus Relational Algebra Operators n
- Relational Algebra Chapter 07 Relational Algebra Calculus Relational