Oracle 9 i RAC By Ramesh Malayappan Gautam
Oracle 9 i RAC By Ramesh Malayappan & Gautam Mekala Dell Confidential
Overview • Overview of OS (Linux) • Overview of Oracle 9 i Real Application Clusters • Oracle 9 i RAC on Linux • Tuning Tips • Issues to deal in RAC • Going Forward (Oracle 10 G) • Q&A Dell Confidential 2
Benefits of Real Application Clusters • • Dell Confidential New Shared Cache Architecture Exploits New Hardware and Software Technologies Most Flexible Clustering technology Provides scalability and high availability 3
Real Application Clusters • Real Application Clusters (RAC) – – Cache Fusion True scalability • Transparent Scalability – – All Applications Scale – No tuning required – ISV Applications Scale out of the box No Physical Data Partitioning required (Application user partitioning is needed though) • High Availability – Loss of single node on cluster will not stop the database • Ability to additional hardware transparently to users Dell Confidential 4
Overview of Oracle 9 i RAC • Many instances of Oracle running on many nodes • All instances share a single physical database and have common data & control files • Each instance has its own log files and rollback segments • All instances can simultaneously execute transactions against the single database • Caches are synchronized using Oracle’s Global Cache Management technology (Cache Fusion) • No Single Point of Failure (Server side) Dell Confidential 5
Oracle 9 i RAC on Linux • Clustering consists of 2 Oracle-supplied components – Cluster Manager (oracm) • • • – Accepts registration of Oracle instances Responsible for process level cluster status Hangcheck-timer • • Dell Confidential Provides consistent view of Oracle instances New in 9. 2. 0. 2, replaces watchdogd Monitors the Linux kernel for system hangs Implemented as a kernel module so it much less affected by system load Resets node from within kernel if abnormal hangs occur 6
Internal Workings Of RAC. . • Multi-Instance with Single Database • Cache Fusion (aggregation of cache from each node) • • Inter-Instance Transfers GES and GCS Resources Co-ordination SGA Ownership (conversions) GES Status and Roles SGA Cache Fusion GCS RAC Processes Fail-over Recovery database Dell Confidential 7 GES
Contents of SGA Shared Pool • The shared pool portion of the SGA – – – Buffer Cache Redo Buffer Large Pool Data Buffers Dict. Cache Library Cache PGA Dell Confidential • Library Cache: – Shared Across Instances – Synchronized Across Instances Library Cache Dictionary cache Buffers for parallel exec mesg and control structures. Shared SQL areas, private SQL areas (MTS), PL/SQL procedures and packages, and control structures such as locks and library cache handles. • Data Dictionary: – Remains Local to each Instance 8 Collection of database tables and views containing reference information about the database, its structures, and its users.
Cache Fusion • Cache Fusion is a fundamental component of Real Application Cluster • Cache Fusion allows individual nodes to share the contents of their buffer caches through the inter-connect cluster Interprocess Communication (IPC) eliminating the need for extra disk I/Os. • This greatly improves the performance and scalability characteristics of shared-disk clusters • Cache fusion only works with the default resource control scheme. If GC_FILES_TO_LOCKS is set, the old pre-cache fusion behavior is utilized. In other words, forced disk-writes will be used. Dell Confidential 9
Dirty Blocks & Past Image • In a non-RAC instance – – – User A selects say 10 rows (10 blocks) User B selects same 10 rows (10 blocks) User B updates those 10 rows (10 blocks) • • Dirtied blocks • Rollback segment buffer provide read consistent image Not committed User C selects same 10 rows Now User B performs Commit Now User D updates same blocks • gets the same dirtied blocks • If it is a RAC – Dell Confidential When user D updates on second Instance, PAST Image is created for those blocks sent out 10
Cache Coherency Lock Management • Transfer of blocks among the individual node cache’s • Global Concurrency of the data blocks / pages • Global Control mechanism • Cluster Interconnects – Connect nodes • • – Can be a specialized cables with Hub/Switch Functions • • Dell Confidential Can be a simple private network connection Monitors Health, Status of nodes, Accessing remote file systems Cluster alias routing Application-specific traffic Distributed lock manager (DLM) messages / GCS messages 11
Cluster Interconnects • Essential Requirements – – – Low latency for short messages High speed and sustained data rates for large messages; Low Host-CPU utilization per message. Flow Control, Error Control and Heart-beat Continuity monitoring Host Interfaces to interact directly with host processes (‘OS bypass’) Switch Networks that scale well Measurement Typical SMB Bus Memory Channel Myrinet SCI Giga Ether Latency ( µs ) 0. 5 3 7 to 9 9 100 CPU overhead (µs) < 1 < 1 Messages per sec (millions) > 10 > 2 Hardware Bandwidth (MB/sec) > 500 > 100 ~ 250 ~ 50 Dell Confidential 12
Resources and Coordination Data Blocks - Global Cache Resources Enqueues - Global Enqueue Resources Local or Independent Resources • Synchronization: • Data Blocks and Enqueues • Nodes acquire and release ownership of resources • Co-ordination of concurrent tasks within shared cache Enqueue is a shared memory structure Serializes access to database resources Associated with a session or transaction. . E. g. Update to a row Local Concurrency Controls Latches, Row Locks, Local Enqueues Dell Confidential 13
Resource Coordination Resources have § Roles : Locally Managed and Globally Managed § Modes : Null , Shared, Exclusive Most important Resource : DATA BLOCK Past Image Global Resource Directory When a dirty block is sent to other node using CF, it keeps a copy (data integrity in case of failures) § Data Block Identifiers - DBA § Location of most current status § Modes of Data Blocks § Roles of the Blocks Consistent Record (CR) Consistent snapshot at a previous point in time Dell Confidential 14
Resource Modes and Roles When referring to a lock mode in RAC, there are three characters to distinguish E. g. ABC A = Represents lock mode with values Null, Shared, Exclusive B = Represents Lock Role, : Local, Global C = Shows if Past Image exists or not ; (1) PI exists , (0) No PI exists NL 0 SL 0 XL 0 Null Local and No past Images Shared Local with no past image Exclusive Local with no past image NG 0 SG 0 XG 0 Null Global - Instance owns current block image Global Shared Lock - Instance owns current image Global Exclusive Lock - Instance own current image NG 1 SG 1 XG 1 Global Null - Instance Owns the Past mage Block. Shared Global - Instance owns past Image Global Exclusive Lock - Instance owns Past Image. Dell Confidential 15
Global Enqueue Service Controls Library Cache Locks during parsing of SQL, DML, DDL, PL/SQL Controls Data Dictionary Cache (Table Locks etc) Manages synchronization through latches Handles the message between instances (for changes) Oracle Processes • LMON : Monitors the enqueues and resources • LMD : Lock agent process • GSD : Diagnosability Daemon • LCK : manages global eqnueue requests • LMSn : GCS processes - handles blocking interrupts from the remote instance, cross instance calls Usual Process like SMON, PMON, LGWR, CKPT, DBWR etc Dell Confidential 16
Failover Basics • Detection of failure, by way of its LMON process • One of the Instances (Recovering Instance) controls the recovery of the failed instance by taking over its redo log files. • All in-progress transactions are rolled back (transaction recovery) • Instance recovery does not include restarting the failed instance • Only the resources mastered by GSC are re-built • SMON process of a surviving Instance performs recovery of failed instance Dell Confidential 17
Fusion Recovery • Recovery – – – The instance, or instances dies – SMON claims locks needed to recover blocks found by the first pass read. – Locks are obtained and second pass of redo theads of failed instances is performed and blocks become available as they have been recovered. – Predecessor blocks can be in past image block in a different instance or on disk. Dell Confidential Failure detected by cluster manager or GCS. Reconfiguration occurs and all locks owned by the departed instance are remastered and the first pass read of threads of failed instances done by SMON 18
Client Connectivity – Server Fail • Add failover options manually to TNS configuration files • They are part of the CONNECT_DATA section of a connect descriptor • Failover options include – – – Dell Confidential TYPE: Identify the nature of TAF, if any METHOD: Configure how quickly failover can occur BACKUP: Identify an alternate net service name RETRIES: Limit the number of times a reconnection will be attempted DELAY: Specify how long to wait between reconnection attempts 19
Oracle 9 i RAC on Linux (cont. ) Install Flowchart Verification of Hardware and Software Configure Kernel Parameters Configure & Start Cluster Manager Create DBA group and Oracle Account Enable “rsh” & “rcp” on each node Install Oracle 9 i RAC Option Configure Network Remove IBM Java Package Start GSD & Configure Listener Configure Storage Install Cluster Manager Create database Dell Confidential 20
Linux kernel parameters • Set /proc/sys/kernel/shmmax to 3 GB • Using multiple DBWRs with async I/O is usually better than using I/O slaves • Must re-link to use libaio i. e. ASYNC I/O – – – make -f ins_rdbms. mk async_on init. ora: disk_asynch_io=true by default init. ora: filesystemio_options=asynch set this as well if datafiles are on a filesystem (e. g. ext 2) • 2 DBWRs is a good default for a large buffer cache areas • If large read sizes occur, increase /proc/sys/fs/aio-max-size to the largest read size (default is 128 KB) Dell Confidential 21
Larger Buffer Cache • Oracle has the capability to use an extended buffer cache greater than 4 GB • Using Indirect Data Buffers has some overhead, so use this option only if you have enough RAM to create a buffer cache greater than 4 GB • Steps to enable Indirect Data Buffers (from Oracle 9 i Administrators Reference, Rel 2 for Linux): – – – mount -t shm -o size=8 g shmfs /dev/shm (can put this in /etc/fstab) init. ora: use_indirect_data_buffers=true init. ora: use only db_block_buffers and db_block_size (no db_cache_size) • For OLTP Apps, small blocks (e. g. 2 KB) typically work better Dell Confidential 22
Increasing Address Space • Oracle defaults to use about 1. 7 GB of address space for its SGA • It’s possible to increase the SGA address space to about 2. 6 GB (Note 200266. 1) – – Dell Confidential genksms -s 0 x 15000000 >ksms. s make -f ins_rdbms. mk ksms. o make -f ins_rdbms. mk ioracle echo 268435456 >/proc/<pid>/mapped_base (as root), where <pid> is the pid of the session running SQL*Plus 23
Increasing Address Space (cont. ) After Relink Default 0 x. FFFFFFFF Reserved for kernel 0 x. C 0000000 Variable SGA DB Buffers (SGA) 0 x 50000000 0 x 40000000 Variable SGA sga_base (relink Oracle) mapped_base (/proc/<pid>/mapped_base) Code, etc. 0 x 15000000 0 x 10000000 0 x 00000000 Dell Confidential DB Buffers (SGA) 24 Code, etc.
Bigpages • It is a feature in Red Hat Advance Server that provides applications access to large memory pages on Intel 32 -bit CPUs • The default memory page size is 4 KB. • Requires OS support to enable • Large pages used for the SGA reduces the number of page table entries that Linux and the CPU need to keep track of • Reduces the CPU’s Translation Look-aside Buffer (TLB) miss rate • Bigpage settings – – /proc/sys/kernel/shm-use-bigpages=0 : bigpage pool is not used – /proc/sys/kernel/shm-use-bigpages=2 : same as 1, but memory is also useable in the Indirect Data Buffers case Dell Confidential /proc/sys/kernel/shm-use-bigpages=1 : bigpage memory is useable by Oracle except in the Indirect Data Buffers case 25
Real Cluster Design Issues Result Component Effect of Failure Ok CPU panic / crash Node Failed, other node still active Ok Memory crash Node Failed, other node still active Ok Interconnect With dual Interconnects, OK Down Interconnect Switch Nodes can not communicate Ok OS failure / freeze Node Failed, other node still active Down Cluster Manager s/w Custer freezes, all nodes go down Ok DB Instance Crash Instance running on other node provides database service Ok Control File (Corrupt / Lost) Multiplexed control file will be used Ok Redo log file Multiplexed redo file Down Lost Data File Requires Media recovery Down Human Error Depends on type of mistake Down Dropped Object DB is available but applications stall Down DB software bug DB may stall on all instances. Dell Confidential 26
Performance Monitoring • There are many views that help to monitor the inter-instance transfers and RAC performance • v$class_cache_transfer, v$cache, v$lock_activity, v$ges_statistics, v$bh , v$sysstat and V$SYSTEM_EVENT • The above views help diagnose the following issues – – – The most significant statistics are in v$sysstat Cache-related statistics such as consistent gets, db block gets, and db block changes Cache Fusion related statistics, such as global cache current block receive time or global cache current block send time, global cache lock open Convert requests, and global cache wait times, such as global cache gets, global cache converts, and waits for events such as Null-to-X conversions I/O statistics such as physical reads, physical writes, DBWR cross-instance writes, and wait times for reads and writes Dell Confidential 27
Oracle Parallel Execution. . • RAC can engage multiple processors from different nodes for a given task execution • Achieve additional parallelism, not possible by a single SMP node. – For instance, in a two node ORAC, set up a parallel query with ‘Parallel Hint’ to utilize the CPUs from the both the instances. • SELECT /*+ FULL(nydata) PARALLEL(nydata, 3, 2) / count(*) FROM nysales; • In this example, Degree of Parallelism (DOP) is 3 and use Two instances. It is executed with total 6 processes, 3 on each instance Dell Confidential 28
Issues Faced • There were multiple issues since we started to work on RAC. Many of them have been resolved through upgrades and minor patches. • Major issues – NTP Issues-: Problem appears to have been due to NTP (Network Time Protocol) settings on the server that allowed the time to be automatically set backwards by the NTP server. This time change caused it to look like a checkin had been missed. Changing NTP settings so that setting the time backwards is disallowed appears to have resolved the problem – Split-brain condition -: This should never happen but we ran into this issues also. The Cluster software should take care of this issue. – Fork-Process Hanging -: we started seeing “unable to receive acknowledgement from forked process” in alert log Dell Confidential 29
10 G RAC Features • Dynamic affinity policy enhancements for optimizing the Cache Fusion protocol to enhance the performance of several kinds of workloads • Better Workload management • Improvements to adding a node. Oracle introduces portable cluster-ware that makes adding a node easier. • Cluster application availability subsystem Dell Confidential 30
Q & A Dell Confidential 31
- Slides: 31