All about Joins Tony Hasler UKOUG TECH 14

All about Joins Tony Hasler UKOUG TECH 14 December 2014

L Win a copy of the book - SQL QUIZ http: //tonyhasler. wordpress. com

Agenda �What is a join? �ANSI join syntax and outer joins �Left lateral joins �Join methods �Hash join input swapping

Warning and disclaimer! Some statements made in the forthcoming slides are not true when hash join input swapping is considered! We will consider hash join input swapping later in the presentation

What is a join?

What is a join in SQL? �A join is an operation on an ordered pair of row sources �I will refer to the first element of the pair as the driving row source and the second element the probed row source �Row sources can be elements in the FROM clause of an SQL query block − Tables − Data dictionary views (unmerged) − Inline views (unmerged or created by CBO transformations) − Factored subqueries − Results from the TABLE and XMLTABLE “operators” �Or unnested subqueries ( EXISTS, NOT EXISTS, IN, NOT IN, etc. ) �Or intermediate results generated from other joins

Tables to work with

The simplest possible join �SELECT * FROM T 1, T 2; �If there are M rows in T 1 and N rows in T 2 there are M x N rows in the result set. �In this case the result set will have 5 x 5 = 25 rows

A more complex case SELECT * FROM T 1, T 2, T 3, T 4 WHERE T 1. C 1 > 1 AND T 1. C 1=T 2. C 2 AND T 2. C 2 = T 3. C 3 AND T 3. C 3 > T 4. C 4; �The number of joins is always one less than the number of row sources (in this case 3 joins for 4 row sources) �One possible join tree can be depicted as: (((T 1 T 2) T 3) T 4)

Possible join trees �There are 120 possible join trees for four tables �The CBO will only consider the 24 join trees where the driving row source of the second and subsequent joins is the intermediate result from previous joins. − The CBO may consider (((T 3 T 4) T 1) T 2) − The CBO will not consider ((T 3 T 4) (T 1 T 2)) �Using the restricted set of join trees considered by the CBO we can consider the join order as fully specifying the join tree.

The original vision for joins in the SQL language �SQL is a declarative language. You specify what you want to do not how to do it. �Join order is not a programmers concern �All predicates are equal under Codd! − Logically, all predicates in the WHERE clause of a query block can be evaluated after all the joins have completed. �The comma-separated syntax of the FROM clause and the separation of the FROM and WHERE clauses reflects that vision.

ANSI join syntax and outer joins

ANSI join syntax SELECT * FROM T 1 JOIN T 2 ON T 1. C 1 = T 2. C 2 CROSS JOIN T 3 JOIN T 4 ON T 3. C 3 > T 4. C 4 WHERE T 1. C 1 > T 4. C 3 �ANSI syntax includes join predicates in the FROM clause and selection predicates in the WHERE clause. �A join order is explicitly specified �In the above SQL statement, the CBO will ignore the distinction between predicates and ignore the specified join order! �Heresy?

Outer Joins �An outer join has a preserved row source and an optional row source �Any rows in the preserved rows source that do not match rows in the optional row source are included in the join results �These extra rows have no value (NULL) for columns from the optional row source �Implications − The operands of an outer join are semantically different − The preserved row source must precede the optional row source in the join order. − There is now a distinction between join predicates and selection predicates

SELECT * FROM t 1 LEFT JOIN t 2 ON t 1. c 1 = t 2. c 2 LEFT JOIN t 3 ON t 1. c 1 = t 3. c 3 WHERE t 1. c 1 != 3 ORDER BY t 1. c 1; C 1 C 2 C 3 1 2 2 4 4 4 5 5 5

SELECT * FROM t 1 LEFT JOIN t 2 ON t 1. c 1 = t 2. c 2 LEFT JOIN t 3 ON t 1. c 1 = t 3. c 3 and t 1. c 1 != 3 ORDER BY t 1. c 1; C 1 C 2 C 3 1 2 2 3 3 4 4 4 5 5 5 � Only legal join orders are ((T 1 T 2) T 3) and ((T 1 T 3) T 2)

SELECT * FROM t 1 LEFT JOIN (t 2 LEFT JOIN t 3 ON t 2. c 2 = t 3. c 3) ON t 1. c 1 = t 2. c 2 ORDER BY t 1. c 1; C 1 C 2 C 3 1 2 2 3 3 3 4 4 4 5 5 5

SELECT c 1, c 2, c 3 FROM t 2 LEFT JOIN t 3 ON t 2. c 2 = t 3. c 3 RIGHT JOIN t 1 ON t 1. c 1 = t 2. c 2 ORDER BY t 1. c 1; C 1 C 2 C 3 1 2 2 3 3 3 4 4 4 5 5 5 � Theoretically no legal join order! The CBO has to transform this query.

WITH q 1 AS (SELECT * FROM t 2 LEFT JOIN t 3 ON t 2. c 2 = t 3. c 3) SELECT * FROM t 1 LEFT JOIN q 1 ON t 1. c 1 = q 1. c 2 ORDER BY t 1. c 1; -------------------| Id | Operation | Name | -------------------| 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | |* 2 | HASH JOIN OUTER | | | 3 | TABLE ACCESS FULL | T 1 | | 4 | VIEW | | |* 5 | HASH JOIN OUTER | | | 6 | TABLE ACCESS FULL| T 2 | | 7 | TABLE ACCESS FULL| T 3 | -------------------

Outer joins with extensions to traditional syntax SELECT * FROM t 1, t 2, t 3 WHERE t 1. c 1 = t 2. c 2(+) AND t 3. c 3 = t 2. c 2(+) ORDER BY t 1. c 1; SELECT * FROM t 1 LEFT JOIN t 2 ON t 1. c 1 = t 2. c 2 RIGHT JOIN t 3 ON t 3. c 3 = t 2. c 2; SELECT * FROM t 1 CROSS JOIN t 3 LEFT JOIN t 2 ON t 1. c 1 = t 2. c 2 AND t 2. c 2 = t 3. c 3;

Lateral Joins

Lateral joins with the TABLE operator SELECT p. * FROM v$session s, TABLE (DBMS_XPLAN. display_cursor (s. sql_id, s. sql_child_number)) p WHERE sid in (123, 456); LEGAL ENTITY, department or author (Click Insert | Header & Footer) Month Day, Year

Lateral joins in 12 c SELECT * FROM t 1 , LATERAL ( SELECT t 2. c 2, MEDIAN (t 2. c 2) OVER () med FROM t 2 WHERE t 2. c 2 BETWEEN t 1. c 1 - 3 AND t 1. c 1) v WHERE t 1. c 1 = v. c 2 ORDER BY t 1. c 1; �In ANSI syntax use the keywords CROSS APPLY (for inner lateral joins) and OUTER APPLY (for outer lateral joins)

Other types of join �Full outer joins (two preserved row sources) �Partitioned outer joins (potentially multiple rows in the result set from one preserved row). �Anti joins (standard and null aware in 11 g onwards) �Semi joins (standard and null accepting in 12 c onwards) �NOTE: A Partial Join is a 12 c optimizer transformation not a special type of join

Join methods

�There are 3½ join methods �Nested loops join �Hash join �Merge join Cartesian (a variation on a merge join) Variations on a theme: �Full and partial partition-wise joins �Buffered joins for parallel queries �Bloom filtering can be applied, primarily in parallel queries �Nested loops can be pre-fetched or batched

Nested loops join For every row in the driving row source, find corresponding rows in the probed row source. SELECT /*+ leading(t 1) use_nl(t 2) */ * FROM t 1, t 2 -----------------WHERE t 1. c 1 = t 2. c 2; | Id | Operation | Name | -----------------| 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T 1 | |* 3 | TABLE ACCESS FULL| T 2 | ------------------

Nested loops join with a hash cluster (1) CREATE CLUSTER cluster_hash ( ck ) HASHKEYS 3 HASH IS ck; CREATE TABLE tch 1 ( ck INTEGER , c 1 INTEGER ) CLUSTER cluster_hash ( ck ); INTEGER

Nested loops join with a hash cluster (2) SELECT /*+ leading(t 1) use_nl(tch 1) */ * FROM t 1, tch 1 WHERE t 1. c 1 = tch 1. ck; -----------------| Id | Operation | Name | -----------------| 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T 1 | |* 3 | TABLE ACCESS HASH| TCH 1 | ------------------

Nested loops join with an index CREATE INDEX t 2_i 1 ON t 2 (c 2); SELECT /*+ leading(t 1) use_nl_with_index(t 2 (c 2)) */ * FROM t 1, t 2 WHERE t 1. c 1 = t 2. c 2; ------------------| Id | Operation | Name | ------------------| 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T 1 | |* 3 | INDEX RANGE SCAN | T 2_I 1 | ------------------

Pros and cons of nested loops joins �Advantages − Assuming a suitable index is available, nested loops joins scale linearly − Only blocks containing rows to be selected are accessed − The only join method that support lateral joins − Supports theta joins (e. g. t 1. c 1 > t 2. c 2) − No in-memory workarea �Disadvantages − Usually works poorly without an index and indexes are expensive! − Indexes access a table with single block reads − Index range scans may access the same block multiple times �Interview sound bite: Used with two small tables

Hash join Create an in-memory hash cluster from the contents of the driving row source. Use an upside-down nested loops join from the probe row source into the in-memory hash cluster. SELECT /*+ leading(t 1) use_hash(t 2) no_swap_join_inputs(t 2) */ * FROM t 1, t 2 ------------------WHERE t 1. c 1 = t 2. c 2; | Id | Operation | Name | ------------------| 0 | SELECT STATEMENT | | |* 1 | HASH JOIN | | | 2 | TABLE ACCESS FULL| T 1 | | 3 | INDEX FULL SCAN | T 2_I 1 | ------------------

Pros and cons of hash joins �Advantages − No index on the join column is needed! − When full table scans are used blocks are only visited once (excluding visits to the workarea) − When full table scans are used multi-block reads are possible − More tolerant of cardinality errors that nested loops joins − Supports hash join input swapping �Disadvantages − Requires a workarea that limits scalability − Only equijoins are supported (e. g. t 1. c 1 = t 2. c 2) − May visit blocks in the probed row source that contain no rows in the result set. �Interview sound bite: used when joining a small table with a large table

Hash join input swapping

Hash join input swapping Exchange the two operands of the hash join. According to the hints the join order is not affected by the swap! SELECT /*+ leading(t 2 t 1) use_hash(t 1) swap_join_inputs(t 1) */ * FROM t 1, t 2 WHERE t 1. c 1 = t 2. c 2; ------------------| Id | Operation | Name | ------------------| 0 | SELECT STATEMENT | | |* 1 | HASH JOIN | | | 2 | TABLE ACCESS FULL| T 1 | | 3 | INDEX FULL SCAN | T 2_I 1 | ------------------

The three benefits of hash join input swapping �The intermediate result of a join operation can be the probed row source of a hash join. �The optional row source in an outer join can be the driving row source of a hash join. �The subquery in a semi-join or an anti-join can be the driving row source of a hash join. �However, hash join input swapping does not allow the intermediate result set from one join to be the optional row source in a later outer join. �Furthermore, hash join input swapping doesn’t facilitate bushy joins like the one shown earlier: ((T 3 T 4) (T 1 T 2))

SELECT /*+ leading(t 3 t 4 t 1 t 2) use_hash(t 4) use_hash(t 1) use_hash(t 2) swap_join_inputs(t 4) swap_join_inputs(t 1) swap_join_inputs(t 2) */ * FROM t 1 LEFT JOIN t 2 ON t 1. c 1 = t 2. c 2 JOIN t 3 ON t 1. c 1 = t 3. c 3 JOIN t 4 ON t 3. c 3 = t 4. c 4; -------------------| Id | Operation | Name | -------------------| 0 | SELECT STATEMENT | | |* 1 | HASH JOIN RIGHT OUTER| | | 2 | INDEX FULL SCAN | T 2_I 1 | ((T 2 |* 3 | HASH JOIN | | | 4 | TABLE ACCESS FULL | T 1 | T 3))) |* 5 | HASH JOIN | | | 6 | TABLE ACCESS FULL | T 4 | | 7 | TABLE ACCESS FULL | T 3 | -------------------- (T 1 (T 4

Questions? http: //tonyhasler. wordpress. com