IBM Software Group Compression and Storage Optimization IDS

IBM Software Group Compression and Storage Optimization IDS 11. 50. x. C 4 Kevin Cherkauer © 2009 IBM Corporation

IBM Software Group What Is Compression? § Ability to store data rows in compressed format on disk § Saves up to 90% of row storage space § Ability to estimate possible compression ratio § Fits more data onto a page § Fits more data into buffer pool § Reduces logical log usage 2 © 2009 IBM Corporation

IBM Software Group What Is Storage Optimization? § Ability to consolidate free space in a table or fragment to the end § Ability to return this free space to the dbspace § Space returned can then be used by any table in the dbspace 3 © 2009 IBM Corporation

IBM Software Group Compression Concepts § Lempel-Ziv (LZ) based algorithm – static dictionary, built by random sampling § Frequently repeating patterns replaced with 12 -bit symbol numbers § Any byte that does not match a pattern is also replaced with a 12 -bit reserved symbol number § Patterns can be up to 15 bytes long § Max possible compression = 90% (15 bytes replaced with 1. 5 bytes = 12 bits) 4 © 2009 IBM Corporation

IBM Software Group Compression Symbols § 12 -bits means 4, 096 symbols – 256 reserved symbols for bytes that match no pattern – 3, 840 pattern symbols § Patterns > 7 bytes use up two symbol numbers § Thus not all patterns can be compressed § Dictionary tries to capture the “best” patterns (frequency x length) § Non-matching bytes grow by 50% (8 bits replaced by 12 bits) 5 © 2009 IBM Corporation

IBM Software Group Data Affects on Compression § Data with frequently repeating long patterns is the most compressible – Long runs of 0’s or blanks are very compressible § Noise-like data is poorly or not at all compressible: – Encrypted data – Data already compressed by another algorithm – Data without long repeating patterns § Avoid putting a “noise-like” column between other columns that have frequent patterns – disrupts potential column-spanning patterns 6 © 2009 IBM Corporation

IBM Software Group Performance Impact of Compression § IO-bound workloads – Compression may improve performance by reducing IOs (both data page and logical log) – More data fits on a page, so more in buffer pool – Log records are smaller, so less logging § For CPU-bound workloads – Additional CPU used to compress and expand rows – Should not be a large impact 7 © 2009 IBM Corporation

IBM Software Group HDR, ER, CDC (Data. Mirror) and Compression § All are supported on compressed tables § HDR – Tables will be compressed on secondary iff they are compressed on primary § ER – Compression status of tables is independent between source and target, specified by user § CDC – Compression of targets is a function of what the target database supports and what use specifies 8 © 2009 IBM Corporation

IBM Software Group OAT Interface § Compression and Storage Optimization can be managed via the OAT graphical interface § Details in talk following this one 9 © 2009 IBM Corporation

IBM Software Group Things That Cannot Be Compressed § Out-of-row data (e. g. blobs) § Indexes § Catalog tables § Temp tables § Partition tables § Dictionary tables § Tables in the following databases: sysuser, sysmaster, sysutils, syscdr, syscdcv 1 10 © 2009 IBM Corporation

IBM Software Group Dictionary Storage § Each compressed (non-fragmented) table or table fragment has its own compression dictionary § Dictionary consumes ~75 K – 100 K per fragment § Thus compressing tiny tables is not recommended § All dictionaries for tables/fragments in a given dbspace are stored in a special hidden dictionary table in that dbspace § Union of all dictionary tables exposed by sysmaster – syscompdicts_full table – includes binary dictionary; access restricted to user “informix” – syscompdicts view – globally accessible; omits binary dictionary for security 11 © 2009 IBM Corporation

IBM Software Group Compression Operations § create_dictionary – Creates compression dictionary – Any rows inserted or updated after will be compressed – Previously existing rows will not be compressed § compress – Does implicit create_dictionary if no dictionary exists – Compresses all previously existing rows – Table fully accessible to other queries § estimate_compression – Estimate compression ratio a brand-new dictionary could get – If already compressed, estimate current compression ratio (else 0) – Also shows the estimated gain to be had by making a new dictionary (difference between first and second estimates) 12 © 2009 IBM Corporation

IBM Software Group Compression Operations 2 § uncompress, uncompress_offline – Uncompress every row in the table/fragment – Deactivate the compression dictionary – “uncompress” – table is fully accessible – “uncompress_offline” – table is XLOCKed, no query access § purge_dictionary – Delete old inactive dictionaries – Separate command because ER, Data. Mirror might need old dictionaries 13 © 2009 IBM Corporation

IBM Software Group Storage Optimization Operations § repack, repack_offline – Move rows within a table or fragment to consolidate free space at the end – “repack” – table is fully accessible – “repack_offline” – table is XLOCKed, no query access § shrink – Return any free space at end of table or fragment to the dbspace – Normally done after a repack 14 © 2009 IBM Corporation

IBM Software Group Admin API Interface § All compression and storage optimization operations are invoked via the IDS Admin API built-in UDRs – execute function task(…); – execute function admin(…); § Example – execute function task(“table compress repack shrink”, “table_name”, “database_name”, “owner_name”); § Enables OAT graphical interface § Enables remote execution (DBA does not need to log directly in to the target machine) 15 © 2009 IBM Corporation

IBM Software Group Summary § Compression and Storage Optimization can save disk space and thus $$$ § For IO-bound workloads Compression can also improve performance § Compression reduces logging § Compression fits more data into the buffer pool § Storage Optimization allows space saved by compression to be reclaimed from tables and fragments of tables 16 © 2009 IBM Corporation

IBM Software Group Thanks! 17 © 2009 IBM Corporation
- Slides: 17