Real Life RAC Performance Tuning Arup Nanda Lead
- Slides: 52
Real Life RAC Performance Tuning Arup Nanda Lead DBA Starwood Hotels
Who am I Oracle DBA for 13 years and counting l Working on OPS from 1999 l Speak at conferences, write papers, books l © Arup Nanda
Why This Session l I get emails like this: l We are facing performance issues in RAC. What should I do next? l Real Life Advice l Common Issues (with Wait Events) l Dispelling Myths l Formulate a Plan of Attack l Real Life Case Study l proligence. com/downloads. html © Arup Nanda
Our RAC Implementation l Oracle 10 g RAC in March 2004 l Itanium Platform running HP/UX l Oracle 10. 1. 0. 2 Result: Failed l Second Attempt: Dec 2004 – 10. 1. 0. 3 l Result: Failed Again l Third Attempt: March 2005 – 10. 1. 0. 4 l Result: Success! l © Arup Nanda
Challenges l Technology l Lone ranger l A lot of “mystery” and disconnected “facts”! l People l Building a team that could not only deliver; but also sustain the delivered parts l Each day we learned something new l In today’s session: real performance issues we faced and how we resolved them, along with wait events. © Arup Nanda
Why “RAC” Performance? All tuning concepts in single instance applied to RAC as well l RAC has other complexities l l More than 1 buffer cache l Multiple caches – library cache, row cache l Interconnect l Pinging l Global Locking © Arup Nanda
Why RAC Perf Tuning We want to make sure we identify the right problem and go after it l …. not just a problem l © Arup Nanda
Switch Public Interface VIP Service Cache Listener Cache Fusion Instance ASM Clusterware Op Sys Lock Manager OCR Voting Node 1 Instance ASM Clusterware Op Sys Node 2 Interconnect Switch Storage © Arup Nanda
Areas of Concern in RAC More than 1 buffer cache l Multiple caches – library cache, row cache l Interconnect l Global Locking l © Arup Nanda
Cache Issues Two Caches, requires synchronization l What that means: l l. A changed block in one instance, when requested by another, should be sent across via a “bridge” l This bridge is the Interconnect © Arup Nanda
Interconnect Performance Interconnect must be on a private LAN l Port aggregation to increase throughput l l APA l on HPUX If using Gigabit over Ethernet, use Jumbo Frames © Arup Nanda
Checking Interconnect Used l Identify the interconnect used $ oifcfg getif lan 902 172. 17. 1. 0 global cluster_interconnect lan 901 10. 28. 188. 0 global public l Is lan 902 the bonded interface? If not, then set it $ oifcfg setif … © Arup Nanda
Pop Quiz If I have a very fast interconnect, I can perform the same work in multiple node RAC as a single server with faster CPUs. True/False? l Since cache fusion is now write-write, a fast interconnect will compensate for a slower IO subsystem. True/False? l © Arup Nanda
Cache Coherence Times l The time is a sum of time for: l Finding the block in the cache C P l Identifying the master U l Get the block in the interconnect l Transfer speed of the interconnect Interconnect l Latency of the interconnect C l Receive the block by the remote instance P U l Create the consistent image for the user © Arup Nanda
So it all boils down to: l Block Access Cost l l l Lock Management Cost l l l more blocks -> more the time Parallel Query More coordination -> more time Implicit Cache Checks – Sequence Numbers Interconnect Cost l l l Latency Speed more data to transfer -> more the time © Arup Nanda
Hard Lessons In RAC, problem symptoms may not indicate the correct problem! l Example: l l When the CPU is too busy to receive or send packets via UDP, the packets fails and the Clusterware thinks the node is down and evicts it. © Arup Nanda
OS Troubleshooting l OS utilities to troubleshoot CPU issues l top l glance l OS Utilities to troubleshoot process issues: l truss l strace l dbx l pstack © Arup Nanda
Reducing Latency l l A factor of technology TCP is the most latent UDP is better (over Ethernet) Proprietary protocols are usually better l l Hyper. Fabric by HP Reliable Datagram (RDP) Direct Memory Channel Infiniband l l UDP over Infiniband RDP over Infiniband © Arup Nanda
Start with AWR © Arup Nanda
gc current|cr grant 2 -way gc current grant block 2 -way request Session Log Buffer LMS LGWR Instance 2 Instance 1 Database © Arup Nanda
gc current|cr block 2 -way gc gccurrentblock 2 -way request log file sync Session Log Buffer LMS LGWR Instance 2 Instance 1 Database © Arup Nanda
gc current|cr block 3 -way gc current block 3 -way Session Instance 2 Instance 1 Requestor Instance 3 © Arup Nanda Master Holder
RAC related Stats © Arup Nanda
RAC Stats contd. © Arup Nanda
© Arup Nanda
Other GC Block Waits l gc current/cr block lost l Lost l blocks due to Interconnect or CPU gc curent/cr block busy l The consistent read request was delayed, most likely an I/O bottleneck l gc current/cr block congested l Long run queues and/or paging due to memory deficiency. © Arup Nanda
Hung or Slow? l Check V$SESSION for WAIT_TIME l If l 0, then it’s not waiting; it’s hung When hung: l Take a systemstate dump from all nodes l Wait some time l Take another systemstate dump l Check change in values. If unchanged, then system is hung © Arup Nanda
Chart a Plan Rule out the obvious l Start with AWR Report l Start with Top-5 Waits l See if they have any significant waits l … especially RAC related l Go on to RAC Statistics l Base your solution based on the wait event l © Arup Nanda
Rule out the obvious Is interconnect private? l Is interconnect on UDP? l Do you see high CPU? l Do you see a lot of IO bottleneck? l How about memory? l Are the apps spread over evenly? l Do you see lost blocks? l © Arup Nanda
Make Simple Fixes Strongly consider RAID 0+1 l Highest possible number of I/O paths l Use fastest interconnect possible l Use private collision free domain for I/C l Cache and NOORDER sequences l © Arup Nanda
Enterprise Manager © Arup Nanda
Buffer Busy l Cause l Instance wants to bring something from disk to the buffer cache l Delay, due to space not available l Delay, b’coz the source buffer is not ready l Delay, I/O is slow l Delay, b’coz redo log is being flushed l In summary l Log buffer flush -> gc buffer busy © Arup Nanda
Parallel Query l One major issue in RAC is parallel query that goes across many nodes Via Interconnect Slave Instance 1 Slave QC © Arup Nanda Slave Instance 2
Restricting PQ l Define Instance Groups Specify in init. ora prodb 1. instance_groups='pqgroup 1' prodb 2. instance_groups='pqgroup 2' l Specify Instance Groups in Session SQL> alter session set parallel_instance_group = 'pqgroup 1'; © Arup Nanda
Forcing PQ on both Nodes l Define a common Instance Group prodb 1. instance_groups='pqgroup 1‘ prodb 1. instance_groups=‘pq 2 nodes' prodb 2. instance_groups='pqgroup 2' prodb 2. instance_groups='pq 2 nodes' l Specify Instance Groups in Session SQL> alter session set parallel_instance_group = 'pq 2 nodes'; © Arup Nanda
Vital Cache Fusion Views gv$cache_transfer: Monitor blocks transferred by object l gv$class_cache_transfer: Monitor block transfer by class l gv$file_cache_transfer: Monitor the blocks transferred per file l gv$temp_cache_transfer: Monitor the transfer of temporary tablespace blocks l © Arup Nanda
“Hot” Tables l Tables, e. g. Rate Plans l l l Symptoms l l Small Compact blocks High updates High reads gc buffer busy waits Solution l l l Less rows per block High PCTFREE, INITRANS, ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK © Arup Nanda
Hot Sequences l Symptoms: l High waits on Sequence Number latch l High waits on SEQ$ table l Solution: l Increase the cache l Make it NOORDER l Especially AUDSESS$ sequence in SYS, used in Auditing © Arup Nanda
Read Only? Say So. Reading table data from other instances create “gc *” contentions l Suggestion: l l Move Read Only tables to a single tablespace l Make this tablespace Read Only SQL> alter tablespace ROD read only; © Arup Nanda
Partitioning creates several segments for the same table (or index) l => more resources l => less contention l © Arup Nanda
Monotonically Increasing Index l Problem: l “Reservation ID”, a sequence generated key l Index is heavy on one side l Symptoms l Buffer busy waits l Index block spilts l Solutions: l Reverse key indexes l Hash partitioned index (even if the table is not partitioned) 10 g. R 2 © Arup Nanda
Library Cache In RAC, Library Cache is global l So, parsing cost is worse than non-RAC l Solutions: l l Minimize table alters, drops, creates, truncates l Use PL/SQL stored programs, not unnamed blocks © Arup Nanda
Log Files l l In 10 g R 2, the log files are in a single location: $CRS_HOME/log/<Host>/… Ê racg Ê crsd Ê cssd Ê evmd Ê client Ê cssd/oclsmon l $ORACLE_HOME/racg/dump © Arup Nanda
Case Study © Arup Nanda
Diagnosis ifconfig -a shows no congestion or dropped packets l Top shows 1% idle time on node 2 l Top processes l l LMS l and LMD And, several Netbackup processes © Arup Nanda
Further Diagnosis l SQL: select * from v$instance_cache_transfer where class = 'data block' and instance = 1; l Output: INSTANCE CLASS CR_BLOCK CR_BUSY ------------------CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED ----------------1 data block 162478682 5097149 477721 347917908 2950144 16320267 l After sometime: INSTANCE CLASS CR_BLOCK CR_BUSY ------------------CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED ----------------1 data block 162480580 5097185 477722 347923719 2950376 16320269 © Arup Nanda See increases
l Diagnosis: l CPU starvation by LMS/D processes caused GC waits. l Solution: l Killed the Netbackup processes l LMD and LMS got the CPU © Arup Nanda
Increasing Interconnect Speed l Faster Hardware l l l NIC settings l l l Duplex Mode Highest Top Bit Rate TCP Settings l l Gigabit Ethernet; not Fast Infiniband, even if IP over IB Flow Control Settings Network Interrupts for CPU Socket Receive Buffer LAN Planning l l Private LANs Collision Domains © Arup Nanda
High Speed Interconnects Oracle will support RDS over Infiniband l http: //oss. oracle. com/projects/rds/ l On 10 Gig Ethernet as well l © Arup Nanda
In summary: Planning Adequate CPU, Network, Memory l Sequences – cache, noorder l Tablespaces read only l Un-compact small hot tables l Keep undo and redo on fastest disks l Avoid full table scans of large tables l Avoid DDLs and unnamed PL/SQL blocks l © Arup Nanda
In summary: Diagnosis Start with AWR l Identify symptoms and assign causes l Don’t get fooled by “gc” waits as interconnect issues l Find the correlation between “dropped” packets in network, CPU issues from sar and “gc buffer lost” in sysstat reports. l © Arup Nanda
Thank You! Download from: proligence. com/downloads. html © Arup Nanda
- Arup nanda exadata
- Arup guha
- Arup guha rate my professor
- Arup guha
- Arup guha
- Arup
- Arup
- Thomson tuning
- Sql server 2005 performance monitor
- Mainframe tuning
- Oracle tuning tutorial
- Glusterfs tuning
- Apache performance tuning
- Abap performance tuning
- Moodle
- Frog sql
- Ssas performance tuning
- Sshfs vs nfs
- Maximo performance tuning
- Apache performance tuning
- Mysql huge pages
- Database performance tuning and query optimization
- Solarwinds performance tuning
- Harrison performance and tuning
- Jörg stryk
- Data warehouse performance tuning
- Terminal server performance tuning
- Walker performance tuning
- Informix performance tuning
- Frequency dependence of dielectric constant
- Examples of real life problems involving polynomials
- Tìm lớp tương đương
- What is the sacred rac
- Codul rac
- Rac assessment
- Rac exam questions
- Rac charging stations
- Rac 101
- Rac periodic maintenance inspection
- Oracle rac
- Logic vị từ là gì
- Thung rac
- Rac timeline
- Rac theory
- Bridge rac
- Rac prototype
- Ash report for rac
- Rac servicing
- Oci visio stencils
- Oracle sharding
- Mc rac
- El raco fib
- Acesso central femoral