1 Tampa Bay Relational Users Group Query diagnosis
1 Tampa Bay Relational Users Group Query diagnosis IBM Silicon Valley Lab, U. S. A. DB 2 for z/OS Query Optimization Ó IBM Corporation 2003 Ó IBM Corporation 2010
2 Query analysis and tuning • Format the SQL statement – Prepare the statement for human tuning • Separate sections for: – – SELECT list FROM clause WHERE clause … • Tools support – Data Studio fixpack 2. 2. 0. 1 includes SQL formatting • Show transformed SQL text DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
3 Sample unformatted query EXPLAIN PLAN SET QUERYNO = 1 FOR SELECT DISTINCT ITEM_NBR AS ITEM_NBR, ITEM. PRDT_ID, STOREITEM. WK_STRT_DT AS WK_STRT_DT , STOREITEM. DC_ID AS DC_ID FROM PROD. TIPA 004_STITM_PROJ AS STOREITEM , PROD. TITM 001_ITEM AS ITEM WHERE ITEM. BUS_UNIT_ID = ‘GS‘ AND ITEM. BUS_UNIT_ID = STOREITEM. BUS_UNIT_ID AND ITEM. MJR_CATG_ID = '00754‘ AND ITEM. INTMD_CATG_ID = '00043‘ AND ITEM_NBR = STOREITEM_NBR AND ITEM. MJR_CATG_ID = STOREITEM. MJR_CATG_ID AND ITEM. INTMD_CATG_ID = STOREITEM. INTMD_CATG_ID AND STOREITEM. RTL_DEPT_NBR = 1 AND AD_ITEM_FLG = 'Y‘ AND WK_STRT_DT = '2002 -02 -08'; Unformatted SQL, where to start? DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
4 Formatted EXPLAIN PLAN SET QUERYNO = 1 FOR SELECT DISTINCT ITEM_NBR AS ITEM_NBR, ITEM. PRDT_ID, STOREITEM. WK_STRT_DT AS WK_STRT_DT , STOREITEM. DC_ID AS DC_ID FROM PROD. TIPA 004_STITM_PROJ AS STOREITEM , PROD. TITM 001_ITEM AS ITEM WHERE ITEM. BUS_UNIT_ID = STOREITEM. BUS_UNIT_ID AND ITEM. MJR_CATG_ID = STOREITEM. MJR_CATG_ID AND ITEM. INTMD_CATG_ID = STOREITEM. INTMD_CATG_ID AND ITEM_NBR = STOREITEM_NBR AND ITEM. BUS_UNIT_ID = ‘GS‘ AND ITEM. MJR_CATG_ID = '00754‘ AND ITEM. INTMD_CATG_ID = '00043‘ AND STOREITEM. AD_ITEM_FLG = 'Y‘ AND STOREITEM. RTL_DEPT_NBR = 1 AND STOREITEM. WK_STRT_DT = '2002 -02 -08'; DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
5 Analyzing query • Observe “interesting predicates” – Optimizer may produce inaccurate filter factor estimate – Range predicates with parameter markers – Predicates using interesting literals • Probable defaults – Complex predicates • • Complex OR expressions Negation predicates Column expressions Non-column expressions DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
6 Sample query Pat’s diagnosis DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
7 Query breakdown SELECT … FROM SETL_TRANS S , BRANCH CUST , BRANCH_ADDR A WHERE S. ADV_ABA_R = ? AND S. PROCESS_DT < '9999 -12 -31‘ AND S. TYPE_CD IN ('A', ‘C’, ‘X’) AND S. CLR_CYCLE_CD IN ('EOD', 'IMD‘, ‘OPN’) AND S. STLMT_DT = ? AND S. ACCT_NUM = CUST. ACCT_NUM AND CUST_EFCT_DT <= ? AND CUST_INACTV_DT > ? AND A. ACCT_NUM = CUST. ACCT_NUM AND A. CUST_EFCT_DT <= ? AND A. CUST_INACTV_DT > ? AND A. ADDR_TYP_CD = ' ' DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
8 Identify peculiar predicates SELECT … FROM SETL_TRANS S , BRANCH CUST , BRANCH_ADDR A WHERE S. ADV_ABA_R = ? AND S. PROCESS_DT < ‘ 9999 -12 -31’ AND S. TYPE_CD IN ('A', 'C', ‘X‘, ‘Z’) AND S. CLR_CYCLE_CD IN ('EOD', 'IMD‘, ‘OPN’) AND S. STLMT_DT = ? AND S. ACCT_NUM = CUST. ACCT_NUM AND CUST_EFCT_DT <= ? AND CUST_INACTV_DT > ? AND A. ACCT_NUM = CUST. ACCT_NUM AND A. CUST_EFCT_DT <= ? AND A. CUST_INACTV_DT > ? AND A. ADDR_TYP_CD = ' ‘ DB 2 for z/OS Query Optimization MAX DATE Range with marker COL = blank Ó IBM Corporation 2010
9 Why are they peculiar? Predicates with typical default often skewed. AND S. PROCESS_DT < ‘ 9999 -12 -31’ A. ADDR_TYP_CD = ' ‘ MAX DATE COL = blank Range predicates with parameter markers - Impossible to estimate without literal AND CUST_EFCT_DT <= ? CUST_INACTV_DT > ? Range with marker AND A. CUST_EFCT_DT <= ? A. CUST_INACTV_DT > ? Range with marker DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
10 Range predicate interpolation Table 104. Default filter factors for interpolation COLCARDF Filter factor for OP Filter Factor for LIKE / BETWEEN >= 100, 000 1 / 10, 000 3 / 100, 000 >= 10, 000 1 / 3, 000 1 / 10, 000 >= 1, 000 1 / 1, 000 3 / 10, 000 >= 100, 000 1 / 300 1 / 1, 000 >= 10, 000 1 / 100 3 / 1, 000 >= 1, 000 1 / 30 1 / 100 >= 100 1 / 10 3 / 100 >= 2 1/3 1 / 10 1 1 1/3 1 / 10 =1 >= 0 Note: Op is one of these operators: <, <=, >, >=. COMMENT: This is DB 2’s documented guess for an impossible to estimate Filter factor. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
11 Analyzing query • Embed information within statement – Table information • CARDF • NPAGES – Column information for predicates • Local predicates • Join predicates – Observe where the filtering is • Selectivity of a predicate is relative to table cardinality • Investigate “suspicious” predicates – Determine actual versus estimated filtering – If there is a problem, identify options DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
12 Embed statistics SELECT … FROM SETL_TRANS S CARDF 1, 600, 254 NPAGES 21, 627 , BRANCH CUST CARDF 31, 696 NPAGES 1132 , BRANCH_ADDR A CARDF 58, 627 NPAGES 2791 WHERE S. ADV_ABA_R = ? COLCARDF 19, 712 AND S. PROCESS_DT < ‘ 9999 -12 -31’ COLCARDF 11 LOW 2 KEY 2004 -03 -24 HIGH 2 KEY 2004 -04 -05 AND S. TYPE_CD IN ('A', 'C', ‘X‘, ‘Z’) COLCARDF 4 AND S. CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 AND S. STLMT_DT = ? COLCARDF 13 AND S. ACCT_NUM = CUST. ACCT_NUM COLCARDF 15360 / 26, 527 AND CUST_EFCT_DT <= ? COLCARDF 2, 496 LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 AND CUST_INACTV_DT > ? COLCARDF 279 LOW 2 KEY 2004 -03 -04 HIGH 2 KEY 2004 -04 -07 AND A. ACCT_NUM = CUST. ACCT_NUM COLCARDF 26, 527 / 26, 527 AND A. CUST_EFCT_DT <= ? COLCARDF 2, 496 LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 AND A. CUST_INACTV_DT > ? COLCARDF 274 LOW 2 KEY ‘ 2004 -03 -04’ HIGH 2 KEY ‘ 2004 -04 -07’ AND A. ADDR_TYP_CD = ‘ ‘ COLCARDF 5 DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Suspicious predicate analysis 13 1) The first range predicate, we’re looking for all values less than ‘ 9999 -12 -31. So the predicate searches for all values less than a number significantly greater Than the HIGH 2 KEY – so basically, all of the rows qualify here. (since the optimizer has the literal value, it KNOWS that all rows qualify). 2) For the column = blank predicate, I don’t believe a skew search was ever done. You could look to see how many values are blank. Is it > 20%? 1/5 = 20%. 1) AND S. PROCESS_DT < '9999 -12 -31‘ COLCARDF 11 LOW 2 KEY 2004 -03 -24 HIGH 2 KEY 2004 -04 -05 2) AND A. ADDR_TYP_CD = ' ‘ COLCARDF 5 Conclusion: First predicate is should not be causing this SQL statement any Problems. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Suspicious predicate analysis 14 The literal value used for each of the parameter markers in this case happened To be the same, and the value was 2004 -04 -06. Comparing the literal value to the HIGH 2 KEY and what range would qualify Is how I determined the ESTIMATED FF WITH LITERAL. The ESTIMATED FF WITH MARKER is from the chart in the Admin guide. The “error” is how different the optimizers DEFAULT estimate is from ACTUAL filtering. 3) AND CUST_EFCT_DT <= ? LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 ESTIMATED FF WITH LITERAL: = 100% ESTIMATE WITH MARKER: 1/30 = 3% 4) AND CUST_INACTV_DT > ? LOW 2 KEY 2004 -03 -04 HIGH 2 KEY 2004 -04 -07 ESTIMATED FF WITH LITERAL: = 99% ESTIMATE WITH MARKER: 1/10 = 10% 5) AND A. CUST_EFCT_DT <= ? LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 ESTIMATED FF WITH LITERAL: = 100% ESTIMATE WITH MARKER: 1/30 = 3% 6) AND A. CUST_INACTV_DT > ? LOW 2 KEY ‘ 2004 -03 -04’ HIGH 2 KEY ‘ 2004 -04 -07’ ESTIMATED FF WITH LITERAL: = 99% ESTIMATE WITH MARKER: 1/10 = 10% DB 2 for z/OS Query Optimization COLCARDF 2, 496 ( 97% error ) COLCARDF 279 ( 89% error ) COLCARDF 2, 496 ( 97% error ) COLCARDF 274 ( 89% error ) Ó IBM Corporation 2010
15 Suspicious predicate analysis • Conclusion – The range predicates with parameter markers introduce significant filter factor error. So we should recognize that this filter factor error can cause significant cost estimation problems for the optimizer – possibly resulting in poor access path choice. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
16 Where’s the filtering? WHERE S. ADV_ABA_R = ? COLCARDF 19, 712 (Very selective predicate) AND S. PROCESS_DT < ‘ 9999 -12 -31’ COLCARDF 11 (This predicate doesn’t filter anything, known from suspicious predicate analysis) AND S. TYPE_CD IN ('A', 'C', ‘X', ‘Z') COLCARDF 4 (In-list looking for 4 values, COLCARDF 4 – not filtering) AND S. CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 (In-list looking for 3 values, COLCARDF 3 – not filtering) AND S. STLMT_DT = ? COLCARDF 13 (COL = LIT, COLCARDF 13 – somewhat filtering, but not great selectivity) AND S. ACCT_NUM = CUST. ACCT_NUM COLCARDF 15360 / 26, 527 (For the range predicates, we know that optimizer PERCIEVES them to be selective but In reality, they are not. This was determined during suspicious predicate analysis) AND CUST_EFCT_DT <= ? COLCARDF 2, 496 AND CUST_INACTV_DT > ? COLCARDF 279 AND A. ACCT_NUM = CUST. ACCT_NUM COLCARDF 26, 527 / 26, 527 AND A. CUST_EFCT_DT <= ? COLCARDF 2, 496 AND A. CUST_INACTV_DT > ? COLCARDF 274 AND A. ADDR_TYP_CD = ‘ ‘ COLCARDF 5 (COL = blank. Probably this column is skewed on blank. COLCARDF 5, not typically Very filtering) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
17 Where’s the filtering? eb tiv ec el ts os M SELECT … FROM SETL_TRANS S CARDF 1, 600, 254 NPAGES 21, 627 , BRANCH CUST CARDF 31, 696 NPAGES 1132 , BRANCH_ADDR A CARDF 58, 627 NPAGES 2791 WHERE S. ADV_ABA_R = ? COLCARDF 19, 712 AND S. PROCESS_DT < ‘ 9999 -12 -31’ COLCARDF 11 LOW 2 KEY 2004 -03 -24 HIGH 2 KEY 2004 -04 -05 AND S. TYPE_CD IN ('A', 'C', ‘X', ‘Z') COLCARDF 4 AND S. CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 AND S. STLMT_DT = ? COLCARDF 13 AND S. ACCT_NUM = CUST. ACCT_NUM COLCARDF 15360 / 26, 527 AND CUST_EFCT_DT <= ? COLCARDF 2, 496 LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 AND CUST_INACTV_DT > ? COLCARDF 279 LOW 2 KEY 2004 -03 -04 HIGH 2 KEY 2004 -04 -07 AND A. ACCT_NUM = CUST. ACCT_NUM COLCARDF 26, 527 / 26, 527 AND A. CUST_EFCT_DT <= ? COLCARDF 2, 496 LOW 2 KEY 1994 -09 -02 HIGH 2 KEY 2004 -04 -06 AND A. CUST_INACTV_DT > ? COLCARDF 274 LOW 2 KEY ‘ 2004 -03 -04’ HIGH 2 KEY ‘ 2004 -04 -07’ AND A. ADDR_TYP_CD = ‘ ‘ COLCARDF 5 y r fa DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
18 Index analysis • One significant input to the optimizer is… – Available indexes – What join sequence they encourage • Some index performance considerations – Provide efficient access for local predicates • Encourages table to be outer table – Provide efficient access for join predicates • Encourage access to table as INNER table of join – Provide ordering to avoid sort • Analysis: – Are there appropriate indexes to support this query? DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
19 Identify indexes Table: SETL_TRANS INDEX IXSTRN 01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) TABLE: BRANCH INDEX: IXBRNC 01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRNC 02 (ACCT_NUM, CUST_EFCT_DT) TABLE: BRANCH_ADDR INDEX: IXBRAD 01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD 02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
20 Index candidate usage Table: AJT_SETL_TRANS INDEX IXSTRN 01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) TABLE: BRANCH INDEX: IXBRNC 01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX IXBRNC 02 (ACCT_NUM, CUST_EFCT_DT) TABLE: BRANCH_ADDR INDEX: IXBRAD 01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD 02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT) Key: RED = Range predicate, stops matching BLUE: Join predicate GREEN: Local equals predicate / in-list DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
21 Index design analysis (by table) • BRANCH table (Index design OK!) – Index IXBRNC 02 supports local access • CONCERN: Predicate on this column has filter factor grossly overestimated, so optimizer will perceive the access to be more efficient to this table than what really occurs! – Index IXBRNC 01 supports join access • BRANCH_ADDR table (Index design OK!) – Index IXBRAD 01 leading column on local filtering • Predicate on this column has filter factor grossly over estimated • Allows table to be considered as inner table efficiently – Index IXBRAD 02 leading column supports join • Allows table to be an efficient inner table DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Index design analysis (by table) 22 • SETL_TRANS table (Not OK!) – INDEX IXSTRN 01 has one index. • No efficient for join – join predicate needs to be leading col) • No efficient index for outer access – Leading column of index qualifies ALL rows DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
23 Overlay table size Table: SETL_TRANS CARDF 1, 600, 254 NPAGES 21, 627 INDEX IXSTRN 01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) CARDF 58, 627 x de in st n or sca , w st s! le u ow ab M r t t s. on es on lli gg pti mi O 1. 6 TABLE: BRANCH_ADDR Bi TABLE: BRANCH CARDF 31, 696 NPAGES 1132 INDEX: IXBRNC 02 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRNC 01 (ACCT_NUM, CUST_EFCT_DT) NPAGES 2791 INDEX: IXBRAD 01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD 02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT) Key: RED = Range predicate, stops matching BLUE: Join predicate GREEN: Local equals predicate / in-list DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
24 Possible new indexes • Existing index IXSTRN 01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) • Efficient outer table access INDEX opt_1 (ADV_ABA_R, STLMT_DT, ACCT_NUM) • Efficient inner table access: INDEX opt_2 (ACCT_NUM) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
25 Summary of this SQL • Indexes on BRANCH, BRANCH_ADDR look better than they are – Range predicate with parameter marker estimates 3% of rows qualify – In reality, 99% qualify • Inefficient index available on SETL_TRANS table – No efficient outer table index available – No efficient inner table index available – This is the biggest table, with the best filter!!! • Optimizer bad join method due to combination of above factors – Performed full scan of transaction index 26, 000 times • Resolution: – Providing new index on SETL_TRANS should provide more stable, faster access than ever before – REOPT, or providing literal values avoids the disaster without new index DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
26 SQL 2 SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? AND A. PART_TYPE IN ( 'F', 'I', 'P' ) AND B. PART_NUM = ? AND B. SUB_CONTRACTOR = 'Y' CARDF=17, 598 QUALIFIED_ROWS=67. 1 CARDF=34, 728 QUALIFIED_ROWS=77. 8 CARDF=2, 093, 750 QUALIFIED_ROWS=38, 382 CARDF=7, 058, 356 QUALIFIED_ROWS=7, 058, 356 CARDF=21, 366, 326 QUALIFIED_ROWS=21, 366, 320 COLCARDF=208 MAX_FREQ=36. 408% FF=0. 005 COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 DB 2 for z/OS Query Optimization NPAGESF=1, 467 NPAGESF=724 NPAGESF=52, 189 NPAGESF=68, 644 NPAGESF=1, 236, 490 Ó IBM Corporation 2010
Local predicate analysis SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? CARDF=17, 598 QUALIFIED_ROWS=67. 1 CARDF=34, 728 QUALIFIED_ROWS=77. 8 CARDF=2, 093, 750 QUALIFIED_ROWS=38, 382 CARDF=7, 058, 356 QUALIFIED_ROWS=7, 058, 356 CARDF=21, 366, 326 QUALIFIED_ROWS=21, 366, 320 COLCARDF=208 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 27 NPAGESF=1, 467 NPAGESF=724 NPAGESF=52, 189 NPAGESF=68, 644 NPAGESF=1, 236, 490 ? ? ? Skewed, not selective skewed, selective • Both ‘A’ and ‘B’ tables have selective predicates. • COUNTRY_CD and PART_CD predicates – there is skew, optimizer assumes uniform distribution • B. PART_NUM – Slightly skewed. 3% one value. Uniform estimate is 0. 4%. • PREFERRED – skewed, query searches for an infrequently occurring value. • Without looking at indexes, seems ‘A’ and ‘B’ will compete to be outer table • Qualified rows of 67. 1 and 77. 8 pretty close DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
28 Local index analysis – ‘A’ SELECT COLS FROM PART A WHERE A. COUNTRY_CD = ? CARDF=17, 598 COLCARDF=208 AND A. PART_CD = ? COLCARDF=5 AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 MAX_FREQ=36. 408% FF=0. 005 ? ? ? ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% MAX_FREQ=47. 199% FF=0. 2 ? ? ? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% MAX_FREQ=79. 867% FF=0. 958 Skewed, not selective INDEX IXPRT 01 CLU Y UR NLEAF U 151 NLEVEL 3 CR 0. 999 IXPRT 02 N D 128 2 0. 794 IXPRT 03 N D 26 2 0. 998 IXPRT 04 N D 99 2 0. 782 DB 2 for z/OS Query Optimization KEYCOLNAME PART_CD COUNTRY_CD FILE DR SECTOR PDV PART_CD PART_TYPE PDV FILE PART_TYPE PART_CD COUNTRY_CD PART_TYPE PART_NUM COLCARDF 5 208 2496 46 178 16830 5 8 16830 2496 8 5 208 8 17598 MCARDF 5 251 3054 3176 3548 17598 5 28 16850 16905 8 28 579 8 17598 Ó IBM Corporation 2010
29 Local index analysis – ‘A’ SELECT COLS FROM PART A WHERE A. COUNTRY_CD = ? CARDF=17, 598 COLCARDF=208 AND A. PART_CD = ? COLCARDF=5 AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 MAX_FREQ=36. 408% FF=0. 005 ? ? ? ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% MAX_FREQ=47. 199% FF=0. 2 ? ? ? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% MAX_FREQ=79. 867% FF=0. 958 Skewed, not selective INDEX IXPRT 01 CLU Y UR NLEAF U 151 NLEVEL 3 CR 0. 999 KEYCOLNAME PART_CD COUNTRY_CD FILE DR SECTOR PDV COLCARDF 5 208 2496 46 178 16830 MCARDF 5 251 3054 3176 3548 17598 IXPRT 02 N D 128 2 0. 794 PART_CD PART_TYPE PDV FILE 5 8 16830 2496 5 28 16850 16905 IXPRT 03 N D 26 2 0. 998 PART_TYPE PART_CD COUNTRY_CD 8 5 208 8 28 579 IXPRT 04 N D 99 2 0. 782 PART_TYPE PART_NUM 8 17598 DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
30 Local index analysis B SELECT COLS FROM CONTRACTOR B WHERE B. PART_NUM = ? AND B. SUB_CONTRACTOR = 'Y' CARDF=34, 728 COLCARDF=260 COLCARDF=2 LOW 2 KEY=N AND B. SUSPENDED = 'N' QUALIFIED_ROWS=77. 8 MAX_FREQ=3. 032% FF=0. 004 MAX_FREQ=87. 402% FF=0. 126 HIGH 2 KEY=Y MAX_FREQ=99. 833% FF=0. 998 HIGH 2 KEY=Y NPAGESF=724 skewed, selective skewed, not selective INDEX IXCTR 01 CLU Y UR NLEAF P 210 NLEVEL 3 CR 0. 962 KEYCOLNAME PART_NUM CONTRACTOR_ID CONT_TYPE COLCARDF 260 1047 7 MCARDF 278 34722 34728 IXCTR 02 N D 50 2 0. 624 PART_NUM 260 278 IXCTR 03 N D 56 2 0. 348 BEGIN_DT CONTRACTOR_ID 1015 1047 1015 2555 IXCTR 04 N D 316 3 0. 927 CONTRACTOR_ID PART_NUM BEGIN_DT END_DT CONT_TYPE 1047 260 1015 2656 7 1047 34722 34728 IXCTR 05 N D 250 3 0. 896 CONTRACTOR_ID BEGIN_DT PART_NUM 1047 1015 260 1047 2555 34722 DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
31 Local index analysis B SELECT COLS FROM CONTRACTOR B WHERE B. PART_NUM = ? AND B. SUB_CONTRACTOR = 'Y' AND B. SUSPENDED = 'N' CARDF=34, 728 COLCARDF=260 COLCARDF=2 LOW 2 KEY=N QUALIFIED_ROWS=77. 8 MAX_FREQ=3. 032% FF=0. 004 MAX_FREQ=87. 402% FF=0. 126 HIGH 2 KEY=Y MAX_FREQ=99. 833% FF=0. 998 HIGH 2 KEY=Y NPAGESF=724 skewed, selective skewed, not selective INDEX IXCTR 01 CLU Y UR NLEAF P 210 NLEVEL 3 CR 0. 962 KEYCOLNAME PART_NUM CONTRACTOR_ID CONT_TYPE COLCARDF 260 1047 7 MCARDF 278 34722 34728 IXCTR 02 N D 50 2 0. 624 PART_NUM 260 278 IXCTR 04 N D 316 3 0. 927 CONTRACTOR_ID PART_NUM BEGIN_DT END_DT CONT_TYPE 1047 260 1015 2656 7 1047 34722 34728 IXCTR 05 N D 250 3 0. 896 CONTRACTOR_ID BEGIN_DT PART_NUM 1047 1015 260 1047 2555 34722 • • • Note: SUB_CONTRACTOR is selective due to search for least frequent value. Is not in any candidate index. Otherwise, local index support looks good. May be able to drop IXCTR 02 with reverse index scan support. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
32 Local index analysis C SELECT COLS FROM CONT_PARTS C WHERE C. PREFERRED = 'Y' INDEX IXCPR 04 CLU N CARDF=2, 093, 750 COLCARDF=3 UR NLEAF D 15352 QUALIFIED_ROWS=38, 382 MAX_FREQ=76. 832% FF=0. 018 NLEVEL 3 CR 0. 998 KEYCOLNAME PREFERRED CONTRACTOR_ID PRODUCT_ID NPAGESF=52, 189 skewed, selective COLCARDF 3 316 1391650 MCARDF 3 552 1808887 • Table C – There is index support for local filtering. – Trailing join column (good) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Indexes for local summary • 33 Each table with local filtering had efficient indexes to support local filtering – Positives: • Efficient access paths exist. – Negatives: • Each table will compete for the outer • More “apparently efficient” choices, more stress on optimizer, opportunity for incorrect choice DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Join graph 34 B C E D A • Two most selective tables ‘A’ and ‘B’ not joined directly • C – D – E each join on same column (PRODUCT_ID) • Shaping up like ‘A’ with 67 outer rows as outer vs ‘B’ with 77 rows as outer DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
35 Join considerations • Index support for certain join sequences – Indexes available to support matching index access for different desirable join sequences? • Join reduction / fan-out considerations – Consider expansion / contraction of result size through different join sequences DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Join indexes A SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? 36 CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 INDEX IXPRT 01 IXPRT 02 CLU N N UR NLEAF P 83 D 112 NLEVEL 2 2 CR 0. 782 IXPRT 04 N D 99 2 0. 782 IXPRTxx N D 122 2 0. 782 DB 2 for z/OS Query Optimization KEYCOLNAME PART_NUM PART_TYPE PART_CD PART_TYPE PART_NUM PART_TYPE PART_CD COUNTRY_CD COLCARDF 17598 8 5 8 17598 8 5 208 MCARDF 17598 8 17598 17603 17598 -1 17603 Ó IBM Corporation 2010
37 Join indexes A • Join access available through join the ‘D’ table only – Via PART_NUM if ‘D’ is the outer • There are multiple indexes to support ‘A’ as inner – IXPRT 02 and IXPRTxx appear redundant – IXPRTxx is superset of IXPRT 02, same column sequence INDEX IXPRT 01 IXPRT 02 CLU N N UR NLEAF P 83 D 112 NLEVEL 2 2 CR 0. 782 IXPRT 04 N D 99 2 0. 782 IXPRTxx N D 122 2 0. 782 DB 2 for z/OS Query Optimization KEYCOLNAME PART_NUM PART_TYPE PART_CD PART_TYPE PART_NUM PART_TYPE PART_CD COUNTRY_CD COLCARDF 17598 8 5 8 17598 8 5 208 MCARDF 17598 8 17598 17603 17598 -1 17603 Ó IBM Corporation 2010
Join indexes B SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? 38 CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 INDEX IXCTR 01 CLU Y UR NLEAF P 210 NLEVEL 3 CR 0. 962 IXCTR 04 N D 316 3 0. 927 IXCTR 05 N D 250 3 0. 896 DB 2 for z/OS Query Optimization KEYCOLNAME PART_NUM CONTRACTOR_ID CONT_TYPE CONTRACTOR_ID PART_NUM BEGIN_DT END_DT CONT_TYPE CONTRACTOR_ID BEGIN_DT PART_NUM COLCARDF 260 1047 7 1047 260 1015 2656 7 1047 1015 260 MCARDF 278 34722 34728 1047 34722 34728 1047 2555 34722 Ó IBM Corporation 2010
39 Join indexes B • Join access available through join the ‘C’ table only – • Via CONTRACTOR_ID if ‘C’ is the outer There are multiple indexes to support ‘B’ as inner – IXCTR 01 has PART_NUM as leading local • • • – Join from outer will hit far fewer leaf pages due to leading local predicate Smaller “swath” of leaf pages: NLEAF * 1/PART_NUM COLCARDF 210 * (1/260) ~= 1 leaf page Makes this index “outstanding” from inner index access perspective Also an effective “outer” index since it provides good local filtering and join order for a join to ‘C’ table as inner IXCTR 04, IXCTR 05 lead with join predicate • • Support the join effectively Join scattered over all leaf pages INDEX IXCTR 01 CLU Y UR NLEAF P 210 NLEVEL 3 CR 0. 962 IXCTR 04 N D 316 3 0. 927 IXCTR 05 N D 250 3 0. 896 DB 2 for z/OS Query Optimization KEYCOLNAME PART_NUM CONTRACTOR_ID CE_TYPE CONTRACTOR_ID PART_NUM CE_DTDIFFREEL CE_DTLANCREEL CE_TYPE CONTRACTOR_ID CE_DTDIFFREEL PART_NUM COLCARDF 260 1047 7 1047 260 1015 2656 7 1047 1015 260 MCARDF 278 34722 34728 1047 34722 34728 1047 2555 34722 Ó IBM Corporation 2010
Join indexes C SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? 40 CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 INDEX IXCPR 01 CLU Y UR NLEAF U 21367 NLEVEL 3 CR 1. 0 IXCPR 02 N D 14771 3 0. 999 IXCPR 03 N D 16188 3 0. 998 IXCPR 04 N D 15352 3 0. 998 DB 2 for z/OS Query Optimization KEYCOLNAME PRODUCT_ID CONTRACTOR_ID CO_DTHRCONTACT CONTRACTOR_ID PRODUCT_ID CONTRACTOR_ID CO_PHASECONTACT PRODUCT_ID PREFERRED CONTRACTOR_ID PRODUCT_ID COLCARDF 1391650 316 1645213 316 1391650 316 4 1391650 3 316 1391650 MCARDF 1391650 1794093 2093750 316 1794093 316 783 1931232 3 552 1808887 Ó IBM Corporation 2010
41 Join indexes C • Join access available through join the ‘B’, ‘D’, and ‘E’ tables – Via CONTRACTOR_ID if ‘B’ is the outer composite – Via PRODUCT_ID if ‘D’ or ‘E’ are in the outer composite • There is support for either join sequence. – CPNQCC 02 has PRODUCT_ID as leading column to support ‘D’ or ‘E’ in outer composite – CPNQXC 02 and IXCPR 03 have CONTRACTOR_ID as leading join column if ‘B’ is the outer composite • IXCPR 03 would also be a candidate if B were cartesianed with D or E. Not that I think that’s likely. – CPMQXCOH would likely be preferred index if ‘B’ were in outer composite • Selective leading local on PREFERRED bounds the leaf pages that would be hit to < 2% of all leaf pages • Makes ‘C’ a possible efficient outer – good local filtering, provides join ordering for join to ‘B’ table INDEX IXCPR 01 CLU Y UR NLEAF U 21367 NLEVEL 3 CR 1. 0 IXCPR 02 N D 14771 3 0. 999 IXCPR 03 N D 16188 3 0. 998 IXCPR 04 N D 15352 3 0. 998 DB 2 for z/OS Query Optimization KEYCOLNAME PRODUCT_ID CONTRACTOR_ID CO_DTHRCONTACT CONTRACTOR_ID PRODUCT_ID CONTRACTOR_ID CO_PHASECONTACT PRODUCT_ID PREFERRED CONTRACTOR_ID PRODUCT_ID COLCARDF 1391650 316 1645213 316 1391650 316 4 1391650 3 316 1391650 MCARDF 1391650 1794093 2093750 316 1794093 316 783 1931232 3 552 1808887 Ó IBM Corporation 2010
42 Join indexes D SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 INDEX IXPDA 05 IXPDA 02 CLU Y N UR NLEAF P 44900 D 70586 NLEVEL 4 4 CR 0. 975 0. 868 IXPDA 06 N D 4 0. 975 DB 2 for z/OS Query Optimization 66590 KEYCOLNAME PRODUCT_ID PART_NUM COLCARDF 7058356 6132 MCARDF 7058356 6132 7058356 Ó IBM Corporation 2010
43 Join indexes D • ‘D’ is accessed in multiple directions – Via PART_NUM if ‘A’ is the outer – Via PRODUCT_ID if accessed through ‘C’ or ‘E’ • Both join direction supported by matching index access. – RT_ENTID leading column of IXPDA 02 – PRODUCT_ID leading column of IXPDA 05, IXPDA 06 • The non-primary key indexes are defined as allowing duplicates – but they cannot. – PRODUCT_ID is the primary key and is included in a unique index. – Any index which contains PRODUCT_ID therefore is unique. Defining as unique would save some space in the index. Duplicate indexes have slightly larger control structures to allow for duplicate RIDS. – DB 2 must allow for duplicates if the index is not explicitly defined as unique since you could drop the unique index. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? Join indexes E 44 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 INDEX IXPPA 01 • CLU N CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 UR NLEAF U 141499 NLEVEL 4 CR 0. 609 KEYCOLNAME PRODUCT_ID COLCARDF 21366326 MCARDF 21366326 Join access available through C and E tables – Both tables join on PRODUCT_ID column • Join is supported via IXPDA 01 index – PRODUCT_ID only column – Unique index (no fan-out when joining to this table) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
45 Join fan-out SELECT COLS FROM PART A , CONTRACTOR B , CONT_PARTS C , PARTS_PROD_ASMBLY D , PARTS_PROD_ASM_DTL E WHERE A. COUNTRY_CD = ? CARDF=17, 598 CARDF=34, 728 CARDF=2, 093, 750 CARDF=7, 058, 356 CARDF=21, 366, 326 COLCARDF=208 QUALIFIED_ROWS=67. 1 NPAGESF=1, 467 QUALIFIED_ROWS=77. 8 NPAGESF=724 QUALIFIED_ROWS=38, 382 NPAGESF=52, 189 QUALIFIED_ROWS=7, 058, 356 NPAGESF=68, 644 QUALIFIED_ROWS=21, 366, 320 NPAGESF=1, 236, 490 MAX_FREQ=36. 408% FF=0. 005 ‘FR’ = 36. 4% ‘GB’ = 17% ‘DE’=10% AND A. PART_CD = ? COLCARDF=5 MAX_FREQ=47. 199% FF=0. 2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A. PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79. 867% FF=0. 958 AND B. PART_NUM = ? COLCARDF=260 MAX_FREQ=3. 032% FF=0. 004 AND B. SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87. 402% FF=0. 126 LOW 2 KEY=N HIGH 2 KEY=Y AND B. SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99. 833% FF=0. 998 LOW 2 KEY=N HIGH 2 KEY=Y AND C. PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76. 832% FF=0. 018 AND B. CONTRACTOR_ID = C. CONTRACTOR_ID COLCARDF=1, 047/316 FF=9. 551 E-4 AND D. PART_NUM = A. PART_NUM COLCARDF=6, 132/17, 598 FF=5. 682 E-5 AND C. PRODUCT_ID = D. PRODUCT_ID COLCARDF=1, 391, 650/7, 058, 356 FF=1. 417 E-7 AND C. PRODUCT_ID = E. PRODUCT_ID COLCARDF=1, 391, 650/21, 366, 326 FF=4. 68 E-8 AND E. PRODUCT_ID = D. PRODUCT_ID COLCARDF=21, 366, 326/7, 058, 356 FF=4. 68 E-8 • Look at join fan-out issues • Qualified outer rows * (CARDF of inner / MAX(join colcardf) • A D • 67. 1 rows * (7, 058, 356 / 17598) ~= 27, 000 rows • B C or C B • 77. 8 rows * (2, 093, 750 / 1047) ~= 155, 500 rows (after local filtering on C, down to 38 K) • So B C expected to fan-out far more. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Explain PLANNO METHOD 1 MERGE _COLS 46 TB_NAME MATCH COLS ACCESS _TYPE ACCESS NAME IX_ONLY SORTN_ JOIN SORTC_ JOIN 0 PART 2 I IXPRT 01 N N N 2 1 PARTS_PROD_AS SEMBLY 1 I IXPPA 02 Y N N 3 2 CONT_PARTS 1 I IXPRD 04 N Y Y 4 1 CONTRACTOR 2 I IXCTR 01 N N N 5 1 PARTS_PROD_AS M_DTL 1 I IXPDA 01 N N N 1 • Join sequence • Access ‘A’ via index IXPRT 01 (PART_CD, COUNTRY_CD, …) • Nested loop join to ‘D’ using index IXPDA 02 (RV_ENT_ID, PRODUCT_ID) • Sort merge join to C • Sorting composite into PRODUCT_ID sequence • Access ‘C’ via IXCPR 04 (PREFERRED, CONTRACTOR_ID) • Sorting new into PRODUCT_ID sequence • Nested loop join to B via index IXCTR 01 • (PART_NUM, CONTRACTOR_ID, CE_TYPE) • Nested loop join to E via index IXPDA 01 • (PRODUCT_ID) ~67 rows ~27, 000 rows ~7, 900 rows • Blue = local predicate • Green = join predicate DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
47 Issues – A as outer? • • Is local filtering to ‘A’ table accurate? – There is skew, but use of markers precludes recognition of skew – Qualified rows and fan-out could be much worse than estimated – ‘A’ as outer could be underestimated, depends on what values being used Sort merge join to ‘C’ to avoid 27 K probes – Does not want to probe 27 k times matching + fan-out on PRODUCT_ID – Uses efficient local index instead • 1 probe to scan of 38 k rows via PREFERRED • 27 K probes * 2 rows per inner via index on PRODUCT_ID – Index on PREFERRED, PRODUCT_ID likely would might avert SMJ in this context – Hesitant to recommend index – since A D C could be an inefficient sequence. DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Issues – B / C as outer? • 48 B as outer – Less skew on B. PART_NUM = ? – less uncertainty in cost estimate – Fan-out to 38 K rows is discouraging – B C supported by efficient local + equals index • (PREFERRED, CONTRACTOR_ID, PRODUCT_ID) • C also a desirable outer – Index on (PREFERRED, CONTRACTOR_ID, PRODUCT_ID) provides good local filter – Could access B via local filtering on B. PART_NUM = ? , materialize 77 rows into workfile for sort merge join DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
49 Summary Query 2 • Bottom line: – Uniform distribution estimate on ‘A’ table allows it to compete very favorably. – If ‘FR’, ‘GB’, ‘DE’ values used for COUNTRY_CD – ‘A’ as outer no longer desirable. • Are ‘FR’, ‘GB’, ‘DE’ values frequently used for this query? – If PART_CD = ‘ 4’ value is used frequently – ‘A’ as outer no longer desirable. • Is ‘ 4’ used frequently? – Split query, REOPT, OPTHINTS… • Multiple choices – Local filtering spread across several tables – Estimated filtering looks good – Efficient access paths (index to support local, join predicates) exist – More difficult for optimizer to identify the cheapest path – Scenario more regression prone – Optimizer may need more statistics, ability to use more statistics (REOPT) for optimizer identify the cheapest path DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
Commentary • 50 How to perform SQL analysis – Format query so it’s readable – Annotate with important statistics • Tables: – Table cardinality, NPAGES, qualified number of rows • Predicates – COLCARDF, LOW 2 KEY, HIGH 2 KEY, filter factor estimate • Are table level estimates reasonable based on your knowledge? – If you don’t know – perform counts to find out if estimates are accurate – If you don’t know how selective things are, how will you know what the best path should be? • Are predicate level filtering estimates reasonable? – Reference table, indexed columns report • Is the best local filtering supported through matching index access? • Any mis-estimated local filtering that’s also matching indexable (may cause one path to look far more efficient than reality) • With trailing join predicates to provide order to next desired table (bonus) • Is there adequate (matching) index support for desired join sequences? – Develop understanding of “plausible” and “desirable” access paths – Examine EXPLAIN output • Does optimizer choose the path you expect? • If not, you should have better understanding of what makes other access paths competitive, tuning can be more targeted – Eg. Certain predicate appears filtering, but is not. – Can use REOPT, or trick – targeted to solve a specific problem. • Skilled targeted tuning is less susceptible to re-regress than blind tuning (where problem is not understood) DB 2 for z/OS Query Optimization Ó IBM Corporation 2010
- Slides: 50