Is Your HaDr READY or FULL HaDr A

  • Slides: 56
Download presentation
Is Your H(a)D(r) READY or FULL H(a)D(r) ? A few words about DB 2

Is Your H(a)D(r) READY or FULL H(a)D(r) ? A few words about DB 2 HADR performance. Gulbas Rafał DECSOFT S. A.

Agenda • • Overwiev Configuration parameters Features Monitoring tools Performance What to look at

Agenda • • Overwiev Configuration parameters Features Monitoring tools Performance What to look at Simulation

OVERWIEV

OVERWIEV

HADR definition • HADR (High Availability Disaster Recovery) is a DB 2 LUW feature

HADR definition • HADR (High Availability Disaster Recovery) is a DB 2 LUW feature for high availability and disaster recovery. It is the preferred solution for HA and DR on DB 2. It replicates data in real time from a primary database to one or more standby databases PRIMARY DATABASE LOGS over TCP/IP STANDBY DATABASE

Overwiev HADR d t n e m p evelo V 10. 5 fp 4

Overwiev HADR d t n e m p evelo V 10. 5 fp 4 Columnar data support V 10. 5 Enhanced Monitoring , pure. Scale support V 10. 1 Delayed Replay, Log Spooling , Multiple Standby, Enhanced Monitoring and configuration parameters V 11. 1 SSL Support, Streamline Upgrade

Architecture Log pages TCP connection db 2 hadrp db 2 hadrs Log pages Log

Architecture Log pages TCP connection db 2 hadrp db 2 hadrs Log pages Log records db 2 shred Log pages db 2 agent db 2 redom Log records Log pages Log records db 2 loggw db 2 lfr Write log pages Read log pages Log files db 2 redow

Synchronization modes • Synchronous • Zero data loss • Waits for the log disk

Synchronization modes • Synchronous • Zero data loss • Waits for the log disk write on the standby • Near synchronous • Waits for log data to be received in log buffer in the standby • Asynchronous • Waits for log data to be sent to the standby • SUPERASYNC (super asynchronous) • Primary can never be blocked in any case

Requirements • OS should be the same version including patches. If rolling update happening,

Requirements • OS should be the same version including patches. If rolling update happening, you can violate this rules for a short time • DB 2 version and the level must be identical on both servers, including the bit size. (32 bit or 64 bit) • Network interface must be available on both server • Bufferpool size should be the same • The database name should be identical, that means they must be in different instances if they reside in a same server • Tablespace must be identical • The log file space also should be the same on both server • The system clock must be synchronized on both servers

Features advantages • • • Read on standby (supported db 2 v 9. 7.

Features advantages • • • Read on standby (supported db 2 v 9. 7. 1 and above) Automatic client reroute Rolling update without any downtime for running applications Streamline Upgrade 11. 1 Delayed reply, new feature in 10. 1 Log Spooling, new feature in 10. 1 Multiple standby, new feature in 10. 1 SSL support 11. 1 Pure. Scale support 10. 5

Drawbacks • • • Backup operations are not supported on standby databases Not support

Drawbacks • • • Backup operations are not supported on standby databases Not support infinite logging Not support no logging transaction Not support for DPF Not support in DB 2 pure. Scale environments Supported > V 10. 5 Not support for columnar data aka shadow tables Supported > V 10. 5 fp 4

CONFIGURATION PARAMETERS

CONFIGURATION PARAMETERS

DB 2 database configuration • • • HADR_LOCAL_HOST HADR_LOCAL_SVC HADR_REMOTE_HOST HADR_REMOTE_SVC HADR_REMOTE_INST • HADR_SYNCMODE

DB 2 database configuration • • • HADR_LOCAL_HOST HADR_LOCAL_SVC HADR_REMOTE_HOST HADR_REMOTE_SVC HADR_REMOTE_INST • HADR_SYNCMODE • LOGINDEXBUILD • INDEXREC • • • HADR_TIMEOUT HADR_TARGET_LIST HADR_SPOOL_LIMIT HADR_REPLAY_DELAY HADR_PEER_WINDOW HADR_SSL_LABEL

DB 2 Registry • • • DB 2_HADR_ROS DB 2_HADR_BUF_SIZE DB 2_HADR_NO_IP_CHECK DB 2_HADR_PEER_WAIT_LIMIT

DB 2 Registry • • • DB 2_HADR_ROS DB 2_HADR_BUF_SIZE DB 2_HADR_NO_IP_CHECK DB 2_HADR_PEER_WAIT_LIMIT DB 2_HADR_SOSNDBUF DB 2_HADR_SORCVBUF DB 2_LOAD_COPY_NO_OVERRIDE BLOCKNONLOGGED DB 2_STANDBY_ISO

How long can we wait • HADR_PEER_WINDOW • Number of seconds primary will block

How long can we wait • HADR_PEER_WINDOW • Number of seconds primary will block log writing after connection lost • Allows failover without data loss • ASYNC mode never enters this state • HADR_TIMEOUT • When connection failure occurs • HADR can detect network failure before HADR_TIMEOUT • DB 2_HADR_PEER_WAIT_LIMIT • How long can logging on primary be blocked • Default is 0 meaning no limit

FEATURES

FEATURES

HADR in Pure. Scale • DB 2 pure. Scale is an optional DB 2

HADR in Pure. Scale • DB 2 pure. Scale is an optional DB 2 feature • Data Sharing Cluster configuration • Multiple database servers • DB 2 sharing data and logs • Continuous availability and disaster recovery • Deliver uninterrupted access to data • Online recover, workload balance and stealth maintenance • High and flexible capacity • Transparent for application

HADR restrictions in Pure. Scale • The HADR synchronization modes SYNC and NEARSYNC are

HADR restrictions in Pure. Scale • The HADR synchronization modes SYNC and NEARSYNC are not supported. You must specify either the ASYNC or SUPERASYNC option for the hadr_syncmode configuration parameter. Not in V 11. 1 • A peer window is not supported. The hadr_peer_window configuration parameter must be set to 0. • You cannot have more than one HADR standby database. • The topology of the primary and the standby must be synchronized. • The reads on standby feature is not supported. • You cannot use the integrated cluster manager to manage automated failover; it manages high availability within the local cluster only. • Network address translation (NAT) between the primary and standby sites is not supported.

Rolling updates • Enable to update database with minimal downtime or zero downtime •

Rolling updates • Enable to update database with minimal downtime or zero downtime • Not supported for SUPERASYNC mode • Not supported upgrade from major version to a higher major version • Procedure • • • Check the HADR status Apply fix pack in standby Switchover roles Apply fix pack in original primary Switchover HADR roles back to the original primary Post-installation tasks in primary

Streamline upgrade 11. 1 • HADR environments can be upgraded without the need to

Streamline upgrade 11. 1 • HADR environments can be upgraded without the need to re-initialize the standby database after performing an upgrade on the primary database • Only applicable to databases at DB 2 V 10. 5 Fix Pack 7 or later and in a single partition or pure. Scale configuration. • New information in log record: • Database Migration Begin Log Record: written to mark the start of a database upgrade. • Database Migration End Log Record: written to mark successful completion of a database upgrade.

Streamline upgrade 11. 1 • Procedure • • • Stop the primary database instance

Streamline upgrade 11. 1 • Procedure • • • Stop the primary database instance Upgrade the primary DB 2 instance After log shipping is completed stop standby database Stop standby instance Upgrade standby instance Start standby instance and database upgrade process • Replay functionality begins in the background and waits for upgrade log data to be received from the primary • Start primary database upgrade process • Primary database looks to connect to the standby and database upgrade process begins • Perform recommended post-upgrade steps

ACR • Allows a DB 2 client application to recover from a lost database

ACR • Allows a DB 2 client application to recover from a lost database connection in case of a network failure • Best practice: combined with a virtual IP address • Allows clients to connect to the primary database via VIPA • Identify the IP address that will be created and failed over as part of the HADR failover • Identify the port number for the instance TCP/IP listener via checking the value of the SVCENAME DBM CFG parameter • Make sure that the TCP/IP listener is started in both instances • SVCENAME parameter is set to a consistent unused value for the TCP/IP listener ports on both instances

ACR • Cannot be used with ROS feature • Will receive error when takeover

ACR • Cannot be used with ROS feature • Will receive error when takeover is in process

ROS • • Allows a DB 2 client to access read only data from

ROS • • Allows a DB 2 client to access read only data from standby database Enabled by setting DB 2 registry parameter DB 2_HADR_ROS Not allowed with Automatic Client Reroute Some restrictions on DDL operations. • While replaying DDL enters replay_only_window mode • Unable to connect to standby database while in replay only mode

ROS • ROS supports only UR isolation level • If we not specify then

ROS • ROS supports only UR isolation level • If we not specify then DB 2 uses CS isolation level • We can use db 2 registry variable to set default isolation level on UR DB 2_STANDBY_ISO=UR

Multiple standbys • HADR supports multiple standby databases • Use HADR_TARGET_LIST parameters to specify

Multiple standbys • HADR supports multiple standby databases • Use HADR_TARGET_LIST parameters to specify a list of standby • Eliminate need for replication solution • Up to three standbys are supported: • One principal standby database • Up to two auxiliary standby databases • SA MP only for Principal Standby PRIMARY DATABASE Auxillary Standby NC Y S A ER P SU ASYNC SUPER Principal Standby Auxillary Standby

Delayed Replay • Helps prevent data loss due to errant transactions • The data

Delayed Replay • Helps prevent data loss due to errant transactions • The data is protected by specifying a delay for applying changes on the standby database • Intentionally keep the standby database at a point in time that is earlier than that of the primary database • transactions are not committed on the standby database until the replay delay has elapsed • Controlled by hadr_replay_delay

Delayed replay Why only one record is visible ? On standby after 2 sec

Delayed replay Why only one record is visible ? On standby after 2 sec Why record 5 is visible immediate ?

HADR SSL Support • With 10. 5 and above we got IBM DB 2

HADR SSL Support • With 10. 5 and above we got IBM DB 2 Encryption Offering • With 11. 1 we get the ability to encrypt connection between HADR servers • Each HADR node need his own certificate • If a certificate is self signed then there is need to cross certify • If a certificate is provided by a CA we just need to import CA certificate

HADR SSL Support

HADR SSL Support

MONITORING TOOLS

MONITORING TOOLS

Monitoring tools • Using database snapshots – not supporting multiple standby • db 2

Monitoring tools • Using database snapshots – not supporting multiple standby • db 2 get snapshot for database on sample |grep -i hadr -A 14

Monitoring tools • Using administrative view – not supporting multiple standby • db 2

Monitoring tools • Using administrative view – not supporting multiple standby • db 2 "select substr(DB_NAME, 1, 10) as DBNAME, substr(HADR_CONNECT_STATUS, 1, 10) as connect_stat, substr(HADR_ROLE, 1, 10) as ROLE, substr(HADR_STATE, 1, 10) as state, substr(HADR_SYNCMODE, 1, 10) as syncmode, substr(HADR_HEARTBEAT, 1, 10) as heartbeat, substr(HADR_LOCAL_HOST, 1, 10) as local, substr(HADR_REMOTE_HOST, 1, 10) as remote, SNAPSHOT_TIMESTAMP from sysibmadm. snaphadr"

Monitoring tools • Using mon_get_hadr table function • db 2 "SELECT HADR_ROLE, STANDBY_ID, HADR_STATE,

Monitoring tools • Using mon_get_hadr table function • db 2 "SELECT HADR_ROLE, STANDBY_ID, HADR_STATE, varchar(PRIMARY_MEMBER_HOST , 20) as PRIMARY_MEMBER_HOST, varchar(STANDBY_MEMBER_HOST , 20) as STANDBY_MEMBER_HOST from table(MON_GET_HADR(NULL))„ • Using db 2 problem determination tool • db 2 pd –hadr –db <database_name>

Monitoring tools V 10. 5 V 11. 1

Monitoring tools V 10. 5 V 11. 1

 • Using db 2 top A option

• Using db 2 top A option

PERFORMANCE

PERFORMANCE

Performance of HADR • COMMIT on the HADR primary may be delayed • Depending

Performance of HADR • COMMIT on the HADR primary may be delayed • Depending of synchronization mode • If the connection is lost, a transaction may experience a commit delay • Depending of values at HADR_TIMEOUT and HADR_PEER_WINDOW • Database performance after a takeover may be delayed • Network requirements depends of logging rate

Security or performance ? Performance SUPERASYNC NEAR SYNC Security

Security or performance ? Performance SUPERASYNC NEAR SYNC Security

Some HADR tricks • Large log gap • Manually copying of log files •

Some HADR tricks • Large log gap • Manually copying of log files • Really heavy workload in some hours • Temporarily changing HADR Synchronization Mode • Not blocking logging on primary for too long • Set HADR peer wait limit

WHAT TO LOOK AT

WHAT TO LOOK AT

State of HADR • HADR_ROLE • Primary • Standby • Standard • HADR_STATE •

State of HADR • HADR_ROLE • Primary • Standby • Standard • HADR_STATE • • Local catchup Remote catchup pending Remote catchup Peer • HADR_CONNECT_STATUS • Connected • Disconnected • Congested

Overhead on logging • IMPORTANT! MON_GET_TRANSACTION_LOG, SNAP_GET_DB and db 2 get snapshot for database

Overhead on logging • IMPORTANT! MON_GET_TRANSACTION_LOG, SNAP_GET_DB and db 2 get snapshot for database DOES NOT INCLUDE HADR OVERHEAD • Since 10. 1 it is easier to monitor overhead: • • LOG_HADR_WAIT_CUR LOG_HADR_WAIT_ACCUMULATED LOG_HADR_WAIT_COUNT LOG_HADR_WAIT_RECENT_AVG • Only in db 2 pd • Computed from: Delta(LOG_HADR_WAIT_ACCUMULATED) / Delta(LOG_HADR_WAIT_COUNT)

Log replay speed • • PRIMARY_LOG_POS STANDBY_REPLAY_LOG_POS < 10. 1 Can compute logging rate

Log replay speed • • PRIMARY_LOG_POS STANDBY_REPLAY_LOG_POS < 10. 1 Can compute logging rate of HADR databases using POS. • Delta(PRIMARY_LOG_POS) / Delta (TIME) = Primary logging rate • HADR_LOG_GAP = PRIMARY_LOG_POS - STANDBY_LOG_POS • STANDBY_RECV_REPLAY_GAP = STANDBY_LOG_POS - STANDBY_REPLAY_POS < 10. 1

Buffers and spoolers • STANDBY_RECV_BUF_PERCENT • 100% used will cause standby log receive to

Buffers and spoolers • STANDBY_RECV_BUF_PERCENT • 100% used will cause standby log receive to be blocked (if spooling is disable ) • Configured by registry variable DB 2_HADR_BUF_SIZE • STANDBY_SPOOL_PERCENT • Configured by database variable hadr_spool_limit • If unlimited then NULL will be returned • Can compute spool space: (STANDBY_LOG_POS - STANDBY_REPLAY_LOG_POS) / 4096 (STANDBY_RECV_BUF_SIZE * STANDBY_RECV_BUF_PERCENT)

HADR flags • HADR_FLAGS • • • ASSISTED_REMOTE_CATCHUP ASSISTED_MEMBER_ACTIVE STANDBY_REPLAY_NOT_ON_PREFERRED STANDBY_LOG_RETRIEVAL STANDBY_RECV_BLOCKED STANDBY_LOG_DEVICE_FULL STANDBY_KEY_ROTATION_ERROR

HADR flags • HADR_FLAGS • • • ASSISTED_REMOTE_CATCHUP ASSISTED_MEMBER_ACTIVE STANDBY_REPLAY_NOT_ON_PREFERRED STANDBY_LOG_RETRIEVAL STANDBY_RECV_BLOCKED STANDBY_LOG_DEVICE_FULL STANDBY_KEY_ROTATION_ERROR STANDBY_TABLESPACE_ERROR STANDBY_UPGRADE_IN_PROGRESS V 10. 5 V 11. 1

HADR SIMULATION

HADR SIMULATION

Simulator of HADR • Without need of DB 2 system it can: • Estimate

Simulator of HADR • Without need of DB 2 system it can: • Estimate HADR performance under different sync choices • Measure disk performance • Measure network performance and tuning options • Capabilities • Simulates single standby systems • Simulates multiple standbys • Simulates pure. Scale systems

Where to get it? • https: //www. ibm. com/developerworks/community/files/app? lang=en#/search? searc h=db 2 logscan

Where to get it? • https: //www. ibm. com/developerworks/community/files/app? lang=en#/search? searc h=db 2 logscan • https: //www. ibm. com/developerworks/community/files/app? lang=en#/search? searc h=simhadr

Available tools • HADR Simulator • Simulate HADR connection, test network speed and disks

Available tools • HADR Simulator • Simulate HADR connection, test network speed and disks subsystem • DB 2 Log Scanner • Check workload of your database from logfiles • HADR Calculator • Calculate HADR overhead for difference synchronization modes

Analysis of the disk subsystem • Drive write speed analyze • Write_time = per_write_overhead

Analysis of the disk subsystem • Drive write speed analyze • Write_time = per_write_overhead + data_amount/transfer_rate • Write overhead: • Tramsfer rate:

Check your workload • Analyze provided log files • db 2 logscan [option] <log_file_1>

Check your workload • Analyze provided log files • db 2 logscan [option] <log_file_1> …. <log_file_N> > log_scan. out • Options: -interval , -checksum, -info. Level, -trans. Level, -f <file> , -retrive <cmd> Run „<cdm> file_name” Not working for log files from > 10. 5 fp 7

Throughput = buffer size / latency Analysis of the WAN • Two parameters in

Throughput = buffer size / latency Analysis of the WAN • Two parameters in db 2 register DB 2 - DB 2_HADR_SOSNDBUF , DB 2_HADR_SORCVBUF • • Starts from 64 KB and increase till bandwidth stop increasing Tuning the buffer size will have no effect on networks that are 10 Mbps or less Throughput = buffer size / latency Buffer size = RTT * Bandwidtch • TCP window size • • • Default OS system setting may not be optimal Formula - (best) send. Rate * (worst) round. Trip. Time Small socket buffer size may not utilize full network bandwidth TCP Window Scaling (RFC 1323) If needed we change the socket buffer size for HADR system

Example of run

Example of run

Calculate overhead

Calculate overhead

QUESTIONS ? ? ?

QUESTIONS ? ? ?

THANKS! RAFAŁ GULBAS rafal@dba-life. com

THANKS! RAFAŁ GULBAS rafal@dba-life. com