To BLU or not to BLU Some examples
To BLU or not to BLU Some examples where ROW beats COL Mika Lindholm, Capgemini Finland Oy Session Code: C 8 November 17 2015 15: 15 – 16: 15 Platform: DB 2 for LUW
What this session is all about • First hand experiences • Lessons learned from BLU Po. C’s & implementation • To Demonstrate that there are some occasions where ROW format is still superior to COL format • Ways of working around the problems • Do not hesitate to comment/challenge my views 2
Landscape where these examples were experienced • Retail type of data, mainly Sales Row level • Biggest table about 8 000 000 rows • New data increment coming in on monthly basis • • • Analytics-type workload, driven by SPSS and CADS IBM AIX 7. 1 TL 3 DB 2 10. 5 fp 5 POWER 7 platform CPU 1 entitled/8 online MEM 256 GB 3
Example #1 Big HASH joins • • • Two tables, one huge, one moderate-sized Join adds one column to a result set One to one relationship BLU prefers HASH join method ROW format handles the same in nested loop join • If indexing is good, ROW mode is much more efficient • This applies only to the case where SORTHEAP cannot accommodate both tables Rows Columns SUM Col length Size D GB Size I GB N 1 N 2 ~8 billion ~350 million 26 3 3689 800 0 17 183 300 4
Example #2 CONCAT/INSERT of lot of columns • The workload is driven by SPSS • Included a lot of CONCAT operations to generate unique identifier to result row • Similar results also, when target table had lot of columns (~400) 5
Example #2 CONCAT/INSERT of lot of columns • • The SQL in 10800 characters long ROW 112 OR’s COL 32 CONCAT’s End result has 3 columns, 12700 rows DB 2 SPSS Total 2: 46: 24 0: 05: 30 2: 51: 54 8: 04: 28 0: 04 8: 04: 32 6
Example #3 LOAD to new table • Just “LOAD and GO” says the marketing material • The ANALYZE phase can take relatively quite a long time • Comparison: 16 GB, 322 271 150 rows, simple LOAD from CURSOR • • LOAD to ROW - 77 minutes (no indexes) LOAD TO COL, new table – 95 minutes, 23 min ANALYZE phase LOAD to COL, prepared table – 70 minutes Increased 35% LOAD time to unprepared table • Even worse results can be seen if select statement is complex • Workaround: For periodic LOAD’s use pre-loaded/prepared table to speed up the process 7
Example #4 Wrong LOAD order • If possible, the order of how rows come into table might be optimized • This serves the data skipping functionality, since the synopsis table has the data in 1024 row chunks • If you consider manually affecting load order, try to make sure that “one size fits all” • This is an example where we tried to be clever but failed miserably: DB 2 exec time slowed from 60 min to 263 min 8
Example #4 Wrong LOAD order - results • Original vs “optimized” • Significantly worse performance seen • Reverted back to original LOAD order, which was performing as good as ROW : 48 : 19 : 50 : 21 : 52 SPSS : 24 DB 2 : 55 : 26 : 57 : 28 : 00 ROW COL 9
Conclusions • • BLU is very effective in most cases, but not always Performance testing is advised to prove efficiency Learn from our findings to avoid unnecessary rollbacks BLU isn’t a silver bullet 10
Mika Lindholm Capgemini Finland Oy mika. lindholm@capgemini. com To BLU or not to BLU, some examples where ROW beats COL Session Code: C 8 Please fill out your session evaluation before leaving!
- Slides: 11