Explain DB 2 EXPLAIN Explanations Introduction Who are
Explain DB 2 EXPLAIN Explanations
Introduction > Who are you ? § Job function – Application Developer / DBA § DB 2 experience > Who am I ? § Steen Rasmussen, CA inc. § Principal Technical Specialist DB 2 Tools § 22 years marriage with DB 2 since V 1 R 1 M 0 § Developer, DBA, Presales, Technical Manager, Presenter 2 10/3/2020
DISCLAIMER > DB 2 Performance is a huge topic § You will not become performance specialists today > This presentation is to be considered an appetizer for more detailed presentations § Understand the fundamentals of Explain § A “quick start” to decrypt Explain output § What to focus on 3 10/3/2020
Agenda – the BIG one (if you like the agenda and Roberta – I will be back) > What is EXPLAIN and why performance is important > What influences EXPLAIN output (Access Path Selection) > EXPLAIN prerequisites § Explain table(s) § How to create the tables § History and backward compatibility > How can EXPLAIN be executed > Access Path – introduction § What is Explain § OPTIMIZER overview > Explain Output – Explain EXPLAIN 4 10/3/2020
What can you do to save DB 2 CPU (and improve performance) > How to build SQL statements > Static <> Dynamic SQL differences > What if DB 2 decides the “wrong” Access Path § OPTHINT § REOPT § RUNSTATS – overview > Help to the Optimizer § Filter Factors 5 10/3/2020
Hvad kan du gore for at spare CPU (og forbedre performance) > DB 2 Locking § Lock types § Recommendations > SQL & Index design § STAGE 1 and STAGE 2 predicates – does it matter and what is it in reality > How do you find tuning potential – the “bad” SQL’s > Benchmarking § What is it § How can we learn from our experiences 6 10/3/2020
Some terms to understand when “digging” into DB 2 Performance issues > RI > Buffer Pool (BP) > RID pool – List Prefetch > Parallelism – DEGREE() > Access Path (AP) > Sargable – STAGE 1 STAGE 2 > Getpage - GETP > Optimizer > Explain 7 10/3/2020 > Sequential Prefetch (detection) > Deadlock / Timeout > Lock Avoidance > Lock, Latch and Claim > Constraints > Filter Factor & Cardinality > Correlated query > Firstkeycardf og Fullkeycardf
What is EXPLAIN & Optimizer overview
What is EXPLAIN > Illustrates how DB 2’s Optimizer will execute a SQL statement, a Package or a Plan (depending on how EXPLAIN is executed) > Why is EXPLAIN a necessity § 9 Can’t we simply look at the SQL-statement and estimate if it has been coded “all right” – What if it’s a 12 tabeller joines – Or the statement is 2 MB (or “just” 24 KB) Let’s see a cool example § Does performance mean anything – if a statement executes in half a second or 5 minutes ? § Predict WHAT will happen when SQL changes or at a Package REBIND after a reorganization and Runstats § The latest example – how will upgrading to DB 2 V 8 impact Access Path (compare DB 2 V 7 Optimizer with DB 2 V 8) § Do you always know whether a JOIN or SUBSELECT/EXISTS provides the best performance ? ? 10/3/2020
What is EXPLAIN > Why is Performance so important § Bad performing SQL costs $$$$ § One benchmark illustrates: – It costs 30$ to correct “bad” SQL in test – It costs 1000$ to correct in production § If response times are not optimal – Fewer transactions will go through the pipe – Each end user will be less productive – Other SQL-statements will suffer due to sharing of resources § Buffer Pools, I/O channels, Locking conflicts, contention in shared pools like SORT area, RID pool, . . § Hardware upgrade to conform to SLA 10 10/3/2020
What’s influencing the EXPLAIN output > Some factors to consider when comparing Explain between two different enviroments § Optimizer looks at Hardware type § Optimizer looks at number of processors § Optimizer looks at Buffer Pools > Host variables will be replaced by “Parameter Markers” when doing dynamic Explain – this could be a major problem in earlier DB 2 versions (pre DB 2 V 8) if host variables (or column predicate) was defined differently than the column defined in the DB 2 catalog 11 10/3/2020
What’s influencing the EXPLAIN output > Table size (and compression) > Column cardinality and Filter Factor > Indexes present and the columns § FIRSTKEYCARDF and FULLKEYCARDF (details later) § Different RUNSTATS methods to collect statistics > Clustering (Cluster Ratio) as well as clustered indicator > Number of Index Levels (NLEVELS) > SQL predicates (predicate analysis) > ORDER BY and the ability to eliminate sorts >. . 12 10/3/2020
EXPLAIN – OPTIMIZER overview > What is the purpose of the Optimizer § Decides how the “database navigates” § “Parsing” SQL statements to check tables and columns § Investigates statistics from the DB 2 catalog (which can be updated by RUNSTATS utility or manual) § Decides what is the “LEAST expensive” access path § DB 2’s Optimizer is COST BASED (opposite from Oracle) – Finds statistics from the DB 2 Catalog – Calculates Filter Factor(s) (estimated qualified #rows) – Finds a potential number of Access Path’s – Calculates the cost based on CPU and I/O cost 13 10/3/2020
EXPLAIN – OPTIMIZER overview > Cost based Optimizer – CPU and I/O § CPU costs – Looks at predicates (STAGE-1 or STAGE-2) (this can be covered in another session – next page is an appetizer ) – How many PAGE’s to look at – SORT – any kind § I/O costs – DB 2 Catalog statistics (like Clusterratio which also can be covered in a separate session) – Buffer Pool size – What does it cost to allocate temporary files etc. § A lot of activity spent on looking at which indexes exist and how these are matching the predicates 14 10/3/2020
EXPLAIN – OPTIMIZER overview Result Set SQL statement Relational Data Manager Apply stage 2 predicates and sort data (can be expensive) Optimized SQL Read Buffer or get data (I/O) 15 10/3/2020 Data Manager Buffer Manager Apply stage 1 predicates Data
EXPLAIN Pre-Requisites > EXPLAIN tables § creator. PLAN_TABLE (minimum to do explain) – Records Optimizers choice of Access Path – Not immediately easy to decrypt – many “codes” (see next slide) § creator. DSN_STATEMNT_TABLE (optional) – Shows COST estimates (this is HUGE in my opinion) § creator. DSN_FUNCTION_TABLE (optional) – Only used if UDF (User Defined Function) needs to be explained § Creator. DSN_STATEMENT_CACHE_TABLE (new in DB 2 V 8) – Used to explain DB 2 Dynamic Statement Cache parts of this 16 10/3/2020 or
EXPLAIN TABLE content (example) QUERYNO QBLOCKNO APPLNAME PROGNAME PLANNO METHOD CREATOR TNAME 1411 1 RQATD 1 0 SYSIBM TABNO SYSTABLES 1 ACCESSTYPE MATCHCOLS ACCESSCREATOR ACCESSNAME INDEXONLY I 0 SYSIBM DSNDTX 02 N SORTN_UNIQ SORTN_JOIN SORTN_ORDERBY SORTN_GROUPBY SORTC_UNIQ SORTC_JOIN N N SORTC_ORDERBY SORTC_GROUPBY TSLOCKMODE TIMESTAMP N N N IS REMARKS 2007073116080531 PREFETCH COLUMN_FN_EVAL MIXOPSEQ VERSION 0 COLLID CAD 72_2004 -08 -31 -18. 24. 46 RQPAR 110_ALL COLLID ACCESS_DEGREE ACCESS_PGROUP_ID JOIN_DEGREE JOIN_PGROUP_ID RQPAR 110_ALL ------ SORTC_PGROUP_ID SORTN_PGROUP_ID PARALLELISM_MODE MERGE_JOIN_COLS ------ CORRELATION_NAME - ------ PAGE_RANGE JOIN_TYPE GROUP_MEMBER A YÄ. ½T æ WHEN_OPTIMIZE QBLOCK_TYPE BIND_TIME SELECT OPTHINT 2007 -07 -31 -16. 08. 03. 615184 PRIMARY_ACCESSTYPE PARENT_QBLOCKNO TABLE_TYPE 17 10/3/2020 IBM_SERVICE_DATA 0 T ³ INITUK 15 HINT_USED
EXPLAIN Pre-Requisites > How to create the tables for EXPLAIN ? § Every new DB 2 version adds additional columns – DB 2 V 1 R 0 had no EXPLAIN (afair) – DB 2 V 1 R 2 had 25 columns in PLAN_TABLE – . . – DB 2 V 7 has 51 columns in PLAN_TABLE – DB 2 V 8 has 58 columns in PLAN_TABLE – DB 2 V 9 has 59 columns in PLAN_TABLE § Backward and forward compatible !!! > CREATE TABLE steen. plan_table like. . . . § Usually the Systems Programmer creates one as part of the IVP § Look into IBM SDSNSAMP 18 10/3/2020
What EXPLAIN DOESN’T show > Explain only shows SELECT, DELETE, UPDATE, INSERT > Important not to forget the issues below when doing performance / tuning § RI Definitions § TRIGGERs executed as part of the SQL-statement § UDF’s § Table- and Column Check Constraints > Not always a guarantee DB 2 will USE the illustrated AP § Prefetch activities can be disabled depending on BP status § Parallelism is decided at the execution time § RID pool shortage 19 10/3/2020
How to execute EXPLAIN > EXPLAIN § Explain Dynamic Statement cache will not be covered in detail. Normally used during DBA Performance/Tuning § Manual type-in via SPUFI § BIND eller REBIND PACKAGE med EXPLAIN(YES) 20 10/3/2020
How to execute EXPLAIN > Manual “type in” IQPSQLE 1 --- (CAPS ON) COMMAND ===> --- SQL Editor --- (NULLS ON) --- 2007/08/03 13: 25: 54 SCROLL ===> CSR Edit the SQL statements that you wish to test below. When you are done editing the SQL, hit PF 3/15 or enter 'END' to return to the RC/SQL control panel. Host variables will be parsed out, so that you may test your SQL repeatedly with various host values. ___ explain plan set queryno = 190 for ___ select * from sysibm. systables ___ where name = ? ___ and creator = 'PTI' ___ with UR ; ___ commit; ___ select * from plan_table where queryno=190; ___ **************** BOTTOM OF DATA **************** 21 10/3/2020
How to execute EXPLAIN > BIND / REBIND and use EXPLAIN(YES) > Note – you will have to select from PLAN_TABLE to view Access path information. To Rebind Or Not To Rebind …. That is the QUESTION ! § Consequences can be “less than optimal” – or even “career limiting decisions” 22 10/3/2020
How to execute EXPLAIN > Edit BIND or REBIND COMMAND ===> SCROLL ===> PAGE ***************** TOP OF DATA *****************. CALL DSN PARM(S 81 A). DATA REBIND PACKAGE(RQPAR 110_ALL. RQATD. (CAD 72_2004 -08 -31 -18. 24. 46)) OWNER(RASST 02) QUALIFIER(RASST 02) CURRENTDATA(NO ) VALIDATE(RUN ) EXPLAIN(yes) ISOLATION(CS) RELEASE(COMMIT ) DEGREE(1 ) REOPT(NONE ) KEEPDYNAMIC(NO ) DBPROTOCOL(PRIVATE) ENCODING(EBCDIC ) IMMEDWRITE(NO ) FLAG(I); . ENDDATA **************** BOTTOM OF DATA **************** 23 10/3/2020
BIND / REBIND with EXPLAIN warning > Manual EXPLAIN § This is a “What if” analysis § Illustrates WHICH Access Path will be chosen if a BIND or REBIND is executed § You can check the implications / consequences > BIND / REBIND with EXPLAIN(YES) § DB 2 will generate a NEW Access path !!!!! § Is this a good idea ? Maybe ! § What if the DB 2 catalog statistics ISN’T optimal – No RUNSTATS executed statistics columns in the catalog have -1 Optimizer has NO clue about statistics for tables, indexes, columns 24 10/3/2020
Versioning of Explain > How § One extra step at Program promotion § Can be integrated with your existing Change Management process – Execute EXPLAIN – Save Explain output in a table – Each Explain will generate a new VERSION – Save xx versions – Create a SQL statement which will flag packages where the cost difference is greater than xx % § SUM(PROCMS) and SUM(PROCSU) per Explained Package version and compare to the costs from previous Explain – Also consider saving key values from the DB 2 catalog used by the Optimizer 25 10/3/2020
Versioning of Explain > Why § The primary reason is quickly to identify WHEN and WHY the performance of a program increased/decreased § It is always possible to verify if performance has changes – Primarily due to COST or AP changes § You can see when the change happened (every Explain has a unique timestamp) § You can see what the DB 2 Catalog statistics were at the Explain time 26 10/3/2020
Access Path – what is good / bad AP > No definitive answer exists for this question – except for: IT DEPENDS ! > Let’s look at some issues in order to find the “correct” answer > One example : One SQL-statement costs 0, 050 CPU-sec while another one costs 2 CPU-minutes – which one do you want to spend time tuning ? 27 10/3/2020
Access Path – what is good / bad AP > The “cheap” SQL-statement is executed 100, 000 times per hour in an online transaction, while the other is executed once every day in a batch job § One weeks consumption: § Online SQL: (0, 050 x 100000 x 24 x 7) =840000 CPU-sec. § Batch SQL : (2 x 60 x 7) 28 10/3/2020 = 840 CPU-sec.
Explain EXPLAIN
Explain EXPLAIN > It will take hours to describe every column currently existing in PLAN_TABLE – here a few to focus on: > Let’s use a real SQL-statement which has been explained to “decrypt” the PLAN_TABLE columns. > Description of all the PLAN_TABLE columns can be viewed in this manual: IBM DB 2 SQL REFERENCE GUIDE (SC 18 -7426 -04) : http: //publib. boulder. ibm. com/epubs/pdf/dsnsqj 14. pdf 30 10/3/2020
Explain EXPLAIN PLAN SET QUERYNO=75 for SELECT A. NAME , A. CREATOR , A. TBNAME , A. TBCREATOR , A. CREATEDBY , B. COLNAME , B. COLSEQ , B. ORDERING , A. CLUSTERING , A. UNIQUERULE FROM SYSIBM. SYSINDEXES A , SYSIBM. SYSKEYS B WHERE ( A. CREATOR = B. IXCREATOR AND A. NAME = B. IXNAME ) AND A. CREATOR = : hostvar 1 ORDER BY A. NAME , A. CREATOR , B. COLSEQ ; COMMIT; SELECT * FROM PLAN_TABLE WHERE QUERYNO=75; QUERYNO QBLOCKNO PROGNAME PLANNO METHOD CREATOR TNAME 75 1 BPASQL 8 1 0 SYSIBM SYSINDEXES 75 1 BPASQL 8 2 1 SYSIBM SYSKEYS 75 1 BPASQL 8 3 3 ACCESSTYPE MATCHCOLS I 1 I 2 0 ACCESSCREATOR ACCESSNAME SYSIBM DSNDXX 01 SYSIBM DSNDKX 01 SORTN_ORDERBY N N N SORTC_UNIQ N N N 31 10/3/2020 SORTC_JOIN N INDEXONLY SORTN_UNIQ N N N SORTC_ORDERBY N N Y SORTN_JOIN N SORTC_GROUPBY N N N PREFETCH L TSLOCKMODE IS IS SORTN_GROUPBY N N N
Explain EXPLAIN > QUERYNO – The number specified doing manual Explain – or statement number from the package/plan which explained. the was – For programs – this number is used to identify a SQLstatement in the source code. – In this scenario, the QUERYNO=75 which also was specified in the manual explain. – Did you know you can assign a fixed QUERYNO to the SQL statement in your program so it’s easier to compare statements when the program is changed ? ? ? !!! 32 10/3/2020
Explain EXPLAIN > QBLOCKNO – A number identifying every query block inside the SQL statement. – A typical example is if a UNION or SUBSELECT is present where DB 2 will have to execute several statements within one SQL statement. – In this example the SQL-statement is a simply JOIN, which is why we only see QBLOCKNO=1 33 10/3/2020
Explain EXPLAIN > PROGNAME – Name of the program which holds the SQL-statement being explained – Package name when REBIND / BIND of a package is executed using EXPLAIN(YES) – Using dynamic explain – this column will hold the name of the program executing the dynamic explain – In this scenario a dynamic explain was executed using program BPASQL 8. 34 10/3/2020
Explain EXPLAIN > PLANNO – Specifies which sequence the individual components are executed inside a QBLOCKNO. – Is this really interesting and do we need to worry about it ? – This will be covered in the section dealing with FILTER FACTOR – this is one of the most important informations to pay attention to – In this scenario we can see SYSINDEXES is accessed first and then SYSKEYS, and finally a third component is performed (sorting) 35 10/3/2020
Explain EXPLAIN > METHOD – Describes which form of JOIN being used. § 0 : First table accessed in this step (QBLOCKNO) § 1 : NESTED LOOP JOIN § 2 : MERGE SCAN JOIN § 3 : Sorting (of some kind) – § (ORDER, GROUP, DISTINCT, UNION) 4 : HYBRID JOIN – We can see SYSINDEXES is accessed first, and then a NESTED LOOP JOIN with SYSKEYS and finally a SORT is performed to satisfy the ORDER BY 36 10/3/2020
Explain EXPLAIN > CREATOR – TNAME – Table name and creator for the table accessed – If METHOD=3 , these columns will be spaces – In this scenario two tables are accessed: 37 10/3/2020 § SYSIBM. SYSINDEXES § SYSIBM. SYSKEYS
Explain EXPLAIN > ACCESSTYPE – Describes how the table mentioned is accessed. – This is a very important information which in many “performance exercises” can be a “first shop stop” (why will be explained) – In DB 2 9 you can find 17 different methods – the most important ones will be covered here: – 38 10/3/2020 – I : Index Access – I 1: One-Fetch Index scan – M : Superceeded by MX, MI, MU – Multiple Index Access (can be excellent or very bad – at least two indexes from the same table are used) – N : Index Scan where IN is used – R : Tablespace Scan – if many pages exist for this tablespace, this can be extremely expensive In this scenario both table accessed have ACCESSTYPE=I, so we know indexe(s) are used – and we don’t face a tablespace scan
Explain EXPLAIN > MATCHCOLS – Illustrates how many columns being used in the index listed under ACCESSNAME. – Used for ACCESSTYPE : I , I 1 , N and MX – The value can be: 39 10/3/2020 § o : The entire index is scanned – warning !!!!! § >0 : The number of columns being used from the index listed prior to Tablespace access (unless Index Only). § If the index has 4 columns and MATCHCOLS = 1 , this can lead to non-optimal response times – depending on column cardinality. – Assume the table has 1. 000 rows – The index has 4 columns – The first column has 2 distinct values (cardinality=0. 5) – DB 2 will “guess” half the table rows qualify TS-scan
Explain EXPLAIN > ACCESSNAME and ACCESSCREATOR – If an index is used to access the table, these columns illustrates which index(es) being used. – Use your common sense and think about the index listed – is it the best one based on the predicates used in the WHERE component of the SQL-statement. – Also think about this information in conjunction with MATCHCOLS – if more columns exist in the index compared to what is described in MATCHCOLS – maybe an additional pedicate can improve performance. – In our scenario an index is used to access both tables specified in our JOIN statement. 40 10/3/2020
Explain EXPLAIN > INDEXONLY – Describes if DB 2 is satisfied by ONLY scanning the index listed under ACCESSNAME – without looking into the data piece (tablespace). – The value can be Y(es) or N(o) – Often it may pay off to have an additional column in the index to avoid the tablespace access (indexonly), and maybe one I/O can be spared at in every statement (the most expensive in the DB 2 world). Before making this decision – please have a closer look at “index complications – advantages and disadvantages” – Our JOIN scenario is using two indexes, and both have INDEXONLY=N , meaning what is being selected can NOT be satisfied by the columns in the 2 indexes used. 41 10/3/2020
Explain EXPLAIN > SORTN_xxxxxx hvor xxxxxx : 42 10/3/2020 – UNIQUE : Must the “internal” table ne sorted to remove duplicates – JOIN : Is it necessary to sort due to METHOD = 2 or 4 ? – ORDERBY : Must the “internal” table be sorted due to ORDER BY – GROUPBY : Must the “internal” table be sorted due to GROUP BY
Explain EXPLAIN > SORTC_xxxxxx hvor xxxxxx : § 43 10/3/2020 – UNIQUE : Must the “composite” table be sorted to remove duplicates ? – JOIN : Sorting due to METHOD = 2 or 4 ? – ORDERBY : Must the “composite” table be sorted due to ORDER BY – GROUPBY : Must the “composite” table be sorted due to GROUP BY Our scenario requires the composite result table (from the JOIN) to be sorted due to ORDER BY in the SQL-statement.
Explain EXPLAIN > PREFETCH – Describes which PREFETCH method MIGHT be used. § L : List Prefetch – DB 2 sorts RID’s from index(es) to avoid reading the same data page more than once. § D : Dynamic Prefetch – DB 2 will start to read blocks of data into the buffer pool asynchronously (if it pays off) § S : Sequential Prefetch – DB 2 will read all “needed” pages into the buffer pool asynchronously to save time doing I/O – This scenario illustrates index DSNDXX 01 will be used where CREATOR fullfill the WHERE clause. Since the table also has to be accessed, the RID’s are being sorted so DB 2 only needs to read the same page once. If this index was the CLUSTERING index – maybe LIST PREFETCH could have been avoided. 44 10/3/2020
Explain EXPLAIN > TSLOCKMODE – Describes what kind of LOCK DB 2 will use for the tablespace being accessed. Beside the TS-locks, DB 2 will do table or row locks (covered in a separate section) – Different LOCK types will be described later, but in general X og IX are not considered “nice” since these can prohibit concurrent access and ultimately lead to TIMEOUTs and DEADLOCKs. § In our scenario we have IS (Intent Share) for both tables being accessed, which is expected since it’s a simple SELECT statement. 45 10/3/2020
What can YOU do to save DB 2 CPU and improve performance ----Until next time Thank You
- Slides: 46