IBM Informix Dynamic Server Cheetah Agile Fast Performance
IBM – Informix Dynamic Server Cheetah - Agile & Fast Performance enhancements © 2007 IBM Corporation
Agenda
► Non-Blocking Checkpoints
► Automatic Checkpoints
► Recovery Time Objective
► Automatic LRU Tuning
► Automatic AIO VP Tuning
► Support for Direct I/O
Cheetah Checkpoint Improvements
What is a checkpoint?
► A checkpoint is a point in time at which cached data (the bufferpool) is flushed to disk to create a consistency point for fast recovery, backups, HDR…
What is an LRU?
► The LRUs are queues used to manage the bufferpool
► An LRU comprises two lists
  ■ MLRU
    • Tracks modified pages in the queue
  ■ FLRU
    • Tracks free or unmodified pages in the queue
Existing characteristics of Checkpoints
► Significant transaction blocking, even with fuzzy checkpoints
► Fuzzy checkpoints
  ■ Unpredictable checkpoint processing time
  ■ Unpredictable recovery time
Existing characteristics of Checkpoints
► Checkpoint tuning vs. OLTP tuning
  ■ Tuning LRU flushing very aggressively
    • Causes constant flushing of the buffer pool
    • Reduces the write cache
    • Flushers consume CPU cycles
    • Increases buffer contention
  ■ Tuning LRU flushing less aggressively
    • Checkpoints were longer
    • Transactions were blocked for longer periods
    • Longer disaster recovery time
► It wasn't easy to figure out the optimal tuning.
Non-Blocking Checkpoints
Non-Blocking Checkpoints
► Most checkpoints do not block transactions during buffer flushing.
► Exceptions:
  ■ Checkpoint running short on resources
    • Physical log 75% full
    • At least one checkpoint per logical log space
  ■ Admin and archive checkpoints
► Fuzzy checkpoints have been completely removed
► Phase A recovery has been removed
► Physical logging activity returns to version 7.3 amounts
    • You will need to increase the size of the physical log!
Benefits of Non-Blocking Checkpoints
► Transaction processing continues during the disk flush portion of checkpoint processing
► Allows LRU flushing to be relaxed
  ■ Dramatic transaction performance improvement
► More frequent checkpoints
  ■ Shortens fast recovery
Interval checkpoint
Recommendations
► Increase LRUMIN and LRUMAX to at least 60 and 70
► Make sure the physical log is large
  ■ It can be moved online
  ■ It can be larger than 2 GB
► Make sure the logical log space is large
► Check the new onstat -g ckp
What do you do if checkpoints block?
► Use the automatic checkpoint feature
  ■ The server automatically triggers checkpoints based on the resources remaining.
► Increase the size of the physical or logical log
  ■ The server suggests which resource to increase and what size it should be
► Make LRU flushing more aggressive
► Increase I/O performance
  ■ More AIO VPs and cleaners
  ■ Improve the performance of the I/O subsystem
Automatic Checkpoints
Automatic Checkpoints
► Triggered when potential transaction blocking is detected
► Calculation based on:
  ■ Physical and logical log usage
  ■ Buffer flush speed
  ■ Transaction throughput
► To help automatic checkpoints
  ■ Increase the physical log size
  ■ Increase the logical log size
  ■ Increase LRU flushing (use automatic LRU tuning)
The server will make suggestions when resources are lacking.
Monitor online.log and onstat -g ckp.
Automatic Checkpoints
► On by default
► onmode -wm AUTO_CKPTS=0 … turn off
► onmode -wm AUTO_CKPTS=1 … turn on
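The two onmode invocations above can be generated from a script. The following is a minimal sketch in Python; the helper name auto_ckpts_command is hypothetical, and the sketch only builds the command line from the slide rather than running it against a server.

```python
import shlex


def auto_ckpts_command(enabled):
    """Return the argv list for toggling AUTO_CKPTS in memory.

    The command text comes from the slide: onmode -wm AUTO_CKPTS=0|1.
    The helper itself is an illustrative assumption, not an IBM tool.
    """
    return ["onmode", "-wm", "AUTO_CKPTS={}".format(1 if enabled else 0)]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for state in (False, True):
        print(shlex.join(auto_ckpts_command(state)))
```

On a host with Informix installed, the returned list could be passed to subprocess.run; here it is only printed so that the sketch works anywhere.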
Checkpoint Performance Advisory
► During a checkpoint, IDS evaluates the checkpoint-related configuration parameters and produces a performance advisory if they are not set optimally to avoid transaction blocking.
► The performance advisory appears in the second part of the onstat -g ckp output and in online.log
► Configuration parameters evaluated at checkpoint:
  ■ PHYSFILE
  ■ PHYSBUFF
  ■ LOGFILES and LOGSIZE
PHYSFILE – Physical log size
► 110% of the combined size of all bufferpools for optimum performance
► Enables fast recovery to use all bufferpool resources
► Depends on transactional workload and speed of the disks
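As a worked example of the 110% guideline, the sketch below computes a suggested PHYSFILE value in KB. The function name and the (buffers, page size) input format are assumptions made for illustration; only the 110% factor comes from the slide.

```python
def recommended_physfile_kb(bufferpools):
    """Suggest a physical log size as 110% of all bufferpools combined.

    bufferpools: iterable of (number_of_buffers, page_size_kb) pairs.
    Integer arithmetic is used so the result is rounded up exactly.
    """
    total_kb = sum(buffers * page_kb for buffers, page_kb in bufferpools)
    return (total_kb * 110 + 99) // 100


if __name__ == "__main__":
    # Hypothetical instance: one 2 KB bufferpool and one 8 KB bufferpool.
    pools = [(50000, 2), (10000, 8)]
    print(recommended_physfile_kb(pools), "KB")
```

For the hypothetical pools shown, the bufferpools total 180000 KB, so the suggestion is 198000 KB.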
PHYSBUFF – Physical buffer size
► With RTO_SERVER_RESTART off, the default value is 128 KB
► With RTO_SERVER_RESTART on, the default value is 512 KB
► If a smaller value is used, a message appears in online.log.
Checkpoint Performance Advisory – Physical log
► During checkpoint processing, a potential physical log overflow is detected.
Performance advisory: Physical log is running out of room.
Results: Blocking transactions until checkpoint is complete.
Action: Increase physical log size.
Physical log and automatic checkpoints ON
► If the physical log is smaller than 10 MB (10000 KB), or automatic checkpoints occur every 35 seconds, automatic checkpoints are turned off.
Performance advisory: The physical log is too small for automatic checkpoints.
Results: Automatic checkpoints are disabled.
Action: Increase the physical log size to at least ## Kb.
LOGBUFF – Logical log buffer
► Default value is 64 KB
► If the value is less than 64 KB, a message appears in online.log
► Assumes buffered logging is used. If non-buffered logging is used, smaller buffers can be used
Checkpoint Performance Advisory – Logical log
► During checkpoint processing, the system detects the potential for reaching the checkpoint-per-log-span limit.
Performance advisory: Logical log is running out of room.
Results: Blocking transactions until checkpoint is complete.
Action: Increase logical log size.
Long transactions blocking checkpoints
► Long transactions are triggering frequent checkpoints
Performance advisory: Long transactions are triggering blocking checkpoints.
Results: Blocking transactions until checkpoint is complete.
Action: Increase logical log size.
Logical log and automatic checkpoints ON
► If the logical log is smaller than 20 MB (20000 KB), or an automatic checkpoint is generated every 35 seconds, automatic checkpoints are turned off.
Performance advisory: The logical log space is too small for automatic checkpoints.
Results: Automatic checkpoints are disabled.
Action: Increase the logical log space to at least ## Kb.
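The size thresholds on the physical-log and logical-log slides can be checked ahead of time. The sketch below is an illustration only: the function name is hypothetical, logical log space is assumed to be LOGFILES × LOGSIZE with LOGSIZE in KB, and the 35-second checkpoint-frequency condition is not modeled.

```python
MIN_PHYSICAL_LOG_KB = 10000  # 10 MB threshold from the slides
MIN_LOGICAL_LOG_KB = 20000   # 20 MB threshold from the slides


def auto_checkpoint_size_advisories(physfile_kb, logfiles, logsize_kb):
    """Return the advisory texts that the configured sizes would trigger."""
    advisories = []
    if physfile_kb < MIN_PHYSICAL_LOG_KB:
        advisories.append(
            "The physical log is too small for automatic checkpoints.")
    # Assumption: total logical log space is LOGFILES * LOGSIZE (KB).
    if logfiles * logsize_kb < MIN_LOGICAL_LOG_KB:
        advisories.append(
            "The logical log space is too small for automatic checkpoints.")
    return advisories


if __name__ == "__main__":
    for message in auto_checkpoint_size_advisories(8000, 6, 2000):
        print("Performance advisory:", message)
```

An empty result means that neither size threshold would disable automatic checkpoints; the server may still disable them for the frequency reason given on the slides.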
Performance Warning Examples

23:28:26 Performance Advisory: The current size of the physical log buffer is smaller than recommended.
         Results: Transaction performance might not be optimal.
         Action: For better performance, increase the physical log buffer size to 128.

13:25:54 Performance Advisory: Based on the current workload, the physical log might be too small to accommodate the time it takes to flush the buffer pool.
         Results: The server might block transactions during checkpoints.
         Action: If transactions are blocked during the checkpoint, increase the size of the physical log to at least 14000 KB.

13:25:54 Performance Advisory: The physical log is too small for automatic checkpoints.
         Results: Automatic checkpoints are disabled.
         Action: To enable automatic checkpoints, increase the physical log to at least 14000 KB.
onstat -g ckp

IBM Informix Dynamic Server Version 11.10.FB7TL -- On-Line -- Up 01:03:54 -- 39936 Kbytes

AUTO_CKPTS=Off   RTO_SERVER_RESTART=Off

Interval  Clock Time  Trigger    LSN
24        16:04:11    Plog       26:0x2d50f8
25        16:04:31    Plog       28:0x108c
26        16:05:03    *User      28:0x32b018
27        16:20:05    CKPTINTVL  28:0x32e018
28        16:21:38    Plog       29:0x1c676c
29        16:21:52    *User      29:0x3b9018
30        16:23:45    *Backup    29:0x3bd018

Interval  Long Time  Plog Avg/Sec  Llog Total Pages  Llog Avg/Sec
24        0.4        10            638               8
25        0.6        38            1276              67
26        0.1        5             810               24
27        0.0        0             3                 0
28        0.5        8             640               6
29        0.1        12            499               33
30        0.0        0             4                 0

[Columns Total Time, Flush Time, Block Time, # Waits, Ckpt Time, Wait Time, # Dirty Buffers, Dskflu/Sec, and Plog Total Pages: per-checkpoint values are not recoverable from the extracted slide.]

Max Plog pages/sec: 200
Max Llog pages/sec: 200
Max Dskflush Time: 1
Avg Dskflush pages/sec: 405
Avg Dirty pages/sec: 10
Blocked Time: 1

The server is blocking transactions because the physical log is too small. Based on the current workload, to prevent the server from blocking future transactions, increase the size of the physical log to 14000 KB.

Based on the current workload, the logical log space might be too small to accommodate the time it takes to flush the buffer pool. The server might block transactions during checkpoints. If the server blocks transactions, increase the size of the logical log space to at least 14000 KB.
onstat -g ckp output fields

AUTO_CKPTS (On/Off): Displays whether the automatic checkpoints feature is on or off.
RTO_SERVER_RESTART (seconds): Displays the RTO policy. 0 = RTO policy is off.
Estimated recovery time (seconds): The estimated time it would take the IDS server to perform fast recovery.
Interval (number): Checkpoint interval id.
Clock Time (wall clock time): The wall clock time at which the checkpoint occurred.
Trigger (text): Several events can trigger a checkpoint. The most common are RTO, Plog, or Llog (running out of logical log resources).
LSN: Log position of the checkpoint.
Total Time (seconds): Total checkpoint duration from request time to checkpoint completion.
Flush Time (seconds): Time to flush the bufferpools.
Block Time (seconds): Transaction blocking time.
# Waits: Number of transactions that blocked waiting for the checkpoint.
Ckpt Time (seconds): Time it takes for all transactions to recognize that a checkpoint has been requested.
Wait Time (seconds): Average time a thread waited for the checkpoint.
Long Time (seconds): Longest time a transaction waited for the checkpoint.
# Dirty Buffers: Number of buffers flushed to disk during checkpoint processing.
Dskflu/Sec: Number of buffers flushed to disk per second during checkpoint processing.
Plog Total Pages (number): Total number of pages physically logged during the checkpoint interval.
Plog Avg/Sec (number): Average rate of physical log activity during the checkpoint interval.
Llog Total Pages (number): Total number of pages logically logged during the checkpoint interval.
Llog Avg/Sec (number): Average rate of logical log activity during the checkpoint interval.
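Several of these fields are rates over a checkpoint interval, so it helps to know how long each interval lasted. The sketch below, in Python, takes the Interval, Clock Time, and Trigger values from the sample onstat -g ckp output and computes the seconds between consecutive checkpoints. The three-column text layout and the function names are assumptions for illustration, not the exact onstat format, and all timestamps are assumed to fall on the same day.

```python
from datetime import datetime

# Interval id, clock time, and trigger taken from the sample output.
SAMPLE_ROWS = """\
24 16:04:11 Plog
25 16:04:31 Plog
26 16:05:03 *User
27 16:20:05 CKPTINTVL
28 16:21:38 Plog
29 16:21:52 *User
30 16:23:45 *Backup
"""


def parse_rows(text):
    """Parse 'interval clock trigger' lines into tuples."""
    rows = []
    for line in text.splitlines():
        interval, clock, trigger = line.split()
        rows.append((int(interval),
                     datetime.strptime(clock, "%H:%M:%S"),
                     trigger))
    return rows


def seconds_between_checkpoints(rows):
    """Map each interval id to the seconds elapsed since the previous one."""
    return {cur[0]: int((cur[1] - prev[1]).total_seconds())
            for prev, cur in zip(rows, rows[1:])}


if __name__ == "__main__":
    for interval, secs in seconds_between_checkpoints(
            parse_rows(SAMPLE_ROWS)).items():
        print(interval, secs)
```

In this sample, the CKPTINTVL-triggered checkpoint (interval 27) comes 902 seconds after the previous one, while the Plog-triggered checkpoints arrive within 20 to 93 seconds of their predecessors.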
New SYSMASTER Tables
► syscheckpoint
  ■ Keeps history on the last 20 checkpoints
► sysckptinfo
  ■ Keeps info on automatic checkpoints
Recovery Time Objective (RTO)
Onconfig parameter
► New onconfig parameter
► RTO_SERVER_RESTART
  ■ Amount of time, in seconds, that Dynamic Server has to recover from a problem after you restart Dynamic Server and bring the server into online or quiescent mode.
  ■ Seeds the logical recovery pages in the physical log
  ■ Valid values are 60 – 1800
  ■ Default is 0 (disabled)
RTO
► Facts about RTO_SERVER_RESTART
  ■ Allows users to set a target fast recovery time.
  ■ RTO_SERVER_RESTART and CKPTINTVL are mutually exclusive.
  ■ If turned off, the system uses CKPTINTVL to trigger checkpoints (the old style).
  ■ Valid values are 60 - 1800 seconds (1 - 30 minutes).
  ■ Automatically adjusts the checkpoint frequency to meet the RTO policy.
  ■ The server fine-tunes with each fast recovery to improve predictability.
  ■ This parameter can be updated with onmode -wf and -wm.
  ■ RTO_SERVER_RESTART=0 (off) is the default.
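The value rules above (0 for off, otherwise 60 to 1800 seconds) are easy to check before changing the configuration. The following is a small sketch in Python; the function name and the wording of the returned descriptions are hypothetical.

```python
def describe_rto_server_restart(seconds):
    """Validate an RTO_SERVER_RESTART value against the documented range."""
    if seconds == 0:
        # Off: the slides say CKPTINTVL then triggers checkpoints.
        return "off (checkpoints triggered by CKPTINTVL)"
    if 60 <= seconds <= 1800:
        return "on (target fast recovery time {} s)".format(seconds)
    raise ValueError("RTO_SERVER_RESTART must be 0 or between 60 and 1800")


if __name__ == "__main__":
    for value in (0, 120, 1800):
        print(value, "->", describe_rto_server_restart(value))
```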
How does RTO_SERVER_RESTART work?
► Estimate/calculate the speed of fast recovery
  ■ Server boot time
  ■ Physical log recovery (RAS_PLOG_SPEED)
  ■ Logical log recovery (RAS_LLOG_SPEED)
  ■ Assume all updates fit into the bufferpools (pages seeded in the physical log)
► Automatic checkpoints based on resource usage to meet the RTO policy.
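One way to picture this estimate is as the sum of the three components listed. The sketch below is an assumption-laden illustration, not the server's actual algorithm: it treats RAS_PLOG_SPEED and RAS_LLOG_SPEED as pages recovered per second, and all function and parameter names are hypothetical.

```python
def estimated_recovery_seconds(boot_seconds, plog_pages, llog_pages,
                               plog_pages_per_sec, llog_pages_per_sec):
    """Estimate fast recovery time as boot + physical + logical recovery.

    Assumption: the two speeds are in pages per second.
    """
    if plog_pages_per_sec <= 0 or llog_pages_per_sec <= 0:
        raise ValueError("recovery speeds must be positive")
    return (boot_seconds
            + plog_pages / plog_pages_per_sec
            + llog_pages / llog_pages_per_sec)


def checkpoint_needed(estimate_seconds, rto_server_restart):
    """True when the estimate has reached an enabled RTO target."""
    return rto_server_restart > 0 and estimate_seconds >= rto_server_restart


if __name__ == "__main__":
    # Hypothetical workload numbers chosen only for the example.
    estimate = estimated_recovery_seconds(10, 20000, 5000, 10000, 500)
    print(estimate, checkpoint_needed(estimate, 60))
```

Under this reading, a checkpoint shortens the logs that must be replayed, which lowers the estimate again; that is consistent with the slide's statement that checkpoint frequency is driven by the RTO policy.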
Auto LRU Tuning
Automatic LRU Tuning (lru_min/max_dirty)
► With interval checkpoints, LRU flushing can be less aggressive.
  ■ So go ahead and relax your lru_min/max_dirty settings
  ■ This can bring dramatic increases in performance.
► LRU flushing automatically adjusts to be more aggressive
  ■ When a hot page is replaced: 1% more aggressive
  ■ When a foreground write occurs: 5% more aggressive
  ■ When the time to flush the bufferpool exceeds RTO_SERVER_RESTART: 10% more aggressive
  ■ Continues adjusting until optimal.
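The three adjustment steps can be sketched as a small lookup. This is only one possible reading of "more aggressive": the sketch assumes that both dirty-page thresholds are lowered by the stated percentage of their current value, which the slides do not confirm. The event names and the function name are hypothetical.

```python
# Percentages from the slide; the event keys are illustrative names.
ADJUSTMENT_PERCENT = {
    "hot_page_replaced": 1,
    "foreground_write": 5,
    "flush_exceeds_rto": 10,
}


def tighten_lru(lru_max_dirty, lru_min_dirty, event):
    """Lower both thresholds by the event's percentage (assumed semantics)."""
    factor = 1 - ADJUSTMENT_PERCENT[event] / 100
    return (round(lru_max_dirty * factor, 2),
            round(lru_min_dirty * factor, 2))


if __name__ == "__main__":
    thresholds = (80, 70)
    for event in ("hot_page_replaced", "foreground_write"):
        thresholds = tighten_lru(thresholds[0], thresholds[1], event)
        print(event, thresholds)
```

Lower thresholds make the page cleaners start earlier and run longer, which is what "more aggressive" flushing means in the surrounding slides.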
LRU_MAX_DIRTY and LRU_MIN_DIRTY
► Default values
  ■ LRU_MAX_DIRTY 60%
  ■ LRU_MIN_DIRTY 50%
► A good starting point when AUTO_LRU_TUNING is ON
  ■ LRU_MAX_DIRTY 80%
  ■ LRU_MIN_DIRTY 70%
Automatic LRU Tuning – Configuration
► AUTO_LRU_TUNING
  ■ 0 or 1
  ■ ON by default
► Dynamically switch off LRU tuning
  ■ onmode -wm AUTO_LRU_TUNING=0
► Dynamically switch on LRU tuning
  ■ onmode -wm AUTO_LRU_TUNING=1, min=val, max=val
► Dynamically set LRU parameters when LRU tuning is on or off
  ■ onmode -wm AUTO_LRU_TUNING=min=val
  ■ onmode -wm AUTO_LRU_TUNING=max=val
Performance Advisory when auto LRU tuning is ON
► Issued during a checkpoint if the buffer flush time exceeds the RTO.
Performance advisory: The time to flush the bufferpool ## is longer than RTO_SERVER_RESTART ##.
Results: The IDS server can't meet the RTO policy
Action: Automatically adjusting LRU flushing to be more aggressive.
Adjusting LRU for bufferpool - id ## size ##k
Old max ## min ##
New max ## min ##
… when auto LRU tuning is OFF
Performance advisory: The time to flush the bufferpool ## is longer than RTO_SERVER_RESTART ##.
Results: The IDS server can't meet the RTO policy
Action: Automatic LRU tuning is off. Either turn on automatic LRU tuning or change LRU flushing to be more aggressive.
Automatic AIO VP Tuning
Automatic Tuning of AIO VPs
► For cooked chunks
► Monitors I/O performance and adds more AIO VPs and/or cleaners if needed
► AUTO_AIOVPS configuration parameter
  ■ 0 or 1
  ■ ON by default
► Change it dynamically using onmode
  ■ onmode -wm/-wf AUTO_AIOVPS=1
  ■ onmode -wm/-wf AUTO_AIOVPS=0
NUMAIOVPS or VPCLASS aio_num=#
► The initial setting is 2 AIO VPs per cooked chunk
► If you add one cooked chunk, 2 more AIO VPs are added, up to a value of 128
► Changing the value in ONCONFIG does not have any impact if RTO_SERVER_RESTART is ON.
► The value can be changed dynamically using onmode -p
CLEANERS
► The initial setting is 1 cleaner thread per AIO VP
► The value is adjusted in conjunction with changes to the number of AIO VPs.
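The initial sizing rules for AIO VPs and cleaners amount to simple arithmetic. The sketch below restates them in Python under the assumption that the 128 limit applies to the total number of AIO VPs; the function names are hypothetical, and later automatic adjustments based on I/O performance are not modeled.

```python
MAX_AIO_VPS = 128  # upper bound quoted on the NUMAIOVPS slide


def initial_aio_vps(cooked_chunks):
    """Two AIO VPs per cooked chunk, capped at 128."""
    if cooked_chunks < 0:
        raise ValueError("chunk count cannot be negative")
    return min(2 * cooked_chunks, MAX_AIO_VPS)


def initial_cleaners(aio_vps):
    """One cleaner thread per AIO VP."""
    return aio_vps


if __name__ == "__main__":
    for chunks in (3, 64, 100):
        vps = initial_aio_vps(chunks)
        print(chunks, "chunks:", vps, "AIO VPs,",
              initial_cleaners(vps), "cleaners")
```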
Additional Information on checkpoints
http://www.ibm.com/developerworks/db2/library/techarticle/dm-0703lashley
Direct I/O for cooked files
Behavior of cooked files
► Cooked file performance can be much slower than raw devices.
[Diagram: File System Cache]
The Solution with Cooked files
► Direct I/O bypasses the file system cache
► Unix and Linux operating systems support Direct I/O
► Performance is close to that of raw devices
[Diagram: File System Cache]
When is DIO used
► DIO is not used by default on cooked files
  ■ Set DIRECT_IO = 1 in the onconfig file to turn it on
► When DIO is used, KAIO is used by default. This can be switched off by setting KAIOOFF=1
What are the benefits of DIO?
► File reads and writes bypass the operating system read and write caches.
► Reduces CPU consumption and eliminates the overhead of copying data twice:
  ■ first between the disk and the file buffer cache
  ■ second from the file buffer cache to the application's buffer.
► Can reduce the number of AIO VPs if KAIOOFF is not set.
Limitations
► Cannot be used for temporary dbspaces.
► Can only be used for dbspace chunks whose file systems support direct I/O for the page size
What are customers saying
► "During the IDS 'Cheetah' beta program we extensively tested IDS... Our major focus was the non-blocking checkpoints in 'Cheetah' which will bring our customers additional performance boost." -- Wolfgang Kraus, Bytec GmbH, Head of IT Services
► "We have seen enormous performance improvements, up to seven times faster in some cases, using IDS Cheetah." -- Rob Prop, Manager Professional Services, Informa
Summary
► These are just some of the performance improvements that have been made in Cheetah
Questions
- Slides: 53