EMC RecoverPoint/Cluster Enabler for Microsoft Failover Cluster
© Copyright 2010 EMC Corporation. All rights reserved.
Disaster Restart
Business challenges and requirements
Meeting recovery point objective (RPO) and recovery time objective (RTO) requirements with current plan
  – Business and/or regulatory needs
  – Need to reduce RPO and RTO times
    § Application benchmarks from days or hours to minutes or seconds
  – Need for continuous operations with no data loss
Cost of assets and maintenance at disaster recovery site
  – Need to maintain software versions (updates and patches)
Reliability of the disaster recovery plan
  – Need to address applications requiring dependent write consistency between and across operating systems
  – Need to periodically test to ensure it will work when required
Business Impact of Application and Data Inaccessibility
[Figure: downtime cost plotted against time for recovery options: hot site/cold site, electronic vaulting, database replication, remote replication, dedicated hot standby, geographical clusters]
Cost of downtime escalates quickly over time
Microsoft Failover Cluster
A high-availability restart solution
A node or resource failure automatically restarts the affected resource groups on another node where resources are available
[Diagram: a node fails; resource groups shown: Microsoft SQL, Microsoft Exchange, Oracle]
Microsoft failover cluster provides high availability using a shared-nothing cluster model
Cluster Enabler 4.0
Features and capabilities overview
Integrates RecoverPoint and RecoverPoint/SE with Microsoft failover cluster
  – Automatic site failover for remote replication operations
  – Supports Majority Node Set quorum options: Majority Node Set (MNS), and MNS with File Share Witness
Supports RecoverPoint continuous remote replication (CRR)
Cluster Enabler 4.0 supports RecoverPoint 3.1.1 or later and any array supported by RecoverPoint and RecoverPoint/SE
  – Using Fibre Channel or Gigabit Ethernet for remote replication
  – Up to 400 milliseconds maximum latency for asynchronous replication
  – Up to 4 milliseconds maximum latency for synchronous replication
Supports Windows Server and Server Core for Windows Server 2008
  – Up to two nodes per site with Windows Server 2003
  – Up to eight nodes per site with Windows Server 2008 and Windows Server 2008 R2
  – Supports clustering of up to eight child partitions with Hyper-V
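The latency and node-count limits above can be expressed as a small pre-deployment check. This is a minimal sketch only: the table names and helper functions are hypothetical and are not part of any RecoverPoint or Cluster Enabler interface; the numeric limits are the ones quoted on the slide.

```python
from typing import List

# Limits quoted on the slide. The names are illustrative, not a product API.
MAX_LATENCY_MS = {"synchronous": 4.0, "asynchronous": 400.0}
MAX_NODES_PER_SITE = {
    "Windows Server 2003": 2,
    "Windows Server 2008": 8,
    "Windows Server 2008 R2": 8,
}


def supported_modes(link_latency_ms: float) -> List[str]:
    """Return the replication modes whose latency limit the link satisfies."""
    return [mode for mode, limit in MAX_LATENCY_MS.items()
            if link_latency_ms <= limit]


def nodes_within_limit(os_name: str, nodes_per_site: int) -> bool:
    """Check a per-site node count against the limit for the given OS."""
    return 1 <= nodes_per_site <= MAX_NODES_PER_SITE[os_name]


if __name__ == "__main__":
    print(supported_modes(2.5))    # both modes fit a 2.5 ms link
    print(supported_modes(50.0))   # only asynchronous fits a 50 ms link
    print(nodes_within_limit("Windows Server 2003", 3))
```

A check like this only restates the published limits; actual supportability depends on the full product support matrix.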
Majority Node Set Support
Majority Node Set
  – Used as a tie-breaker to avoid split-brain scenarios
  – From a cluster-node perspective, each node sees the quorum as a local resource
    § Each cluster node stores the configuration information on a local disk
    § Each node has access to its local disk when it starts up
  – Cluster service ensures cluster configuration is consistent on each cluster node
    § Changes are replicated across the Majority Node Set
File Share Witness
  – External to the cluster, providing an additional quorum vote
    § 2- to 4-node cluster can survive up to N-1 node failures
    § 4- to 8-node cluster can survive up to N-2 node failures
  – Acts as a witness to Majority Node Set
    § Enhances geographically dispersed failover clusters
  – Recommended that File Share Witness be configured in a third site
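The role of the extra witness vote can be illustrated with plain majority arithmetic. The sketch below assumes one vote per node, one vote for the File Share Witness, and a strict-majority rule; the function names are hypothetical, and it does not model any other quorum behavior of the cluster service.

```python
def has_quorum(nodes_up: int, nodes_total: int,
               witness_configured: bool, witness_reachable: bool = True) -> bool:
    """Strict majority: reachable votes must exceed half of all votes."""
    votes_total = nodes_total + (1 if witness_configured else 0)
    votes_up = nodes_up + (1 if witness_configured and witness_reachable else 0)
    return votes_up > votes_total // 2


def tolerable_node_failures(nodes_total: int, witness_configured: bool) -> int:
    """Count how many nodes can fail before the survivors lose the majority."""
    failures = 0
    while (failures < nodes_total and
           has_quorum(nodes_total - failures - 1, nodes_total, witness_configured)):
        failures += 1
    return failures


if __name__ == "__main__":
    # One node per site plus a witness at a third site.
    print(tolerable_node_failures(2, witness_configured=True))
    print(tolerable_node_failures(2, witness_configured=False))
    # Split-brain case: the site that cannot reach the witness stops.
    print(has_quorum(1, 2, witness_configured=True, witness_reachable=False))
```

With the witness at a third site, a WAN break between two sites leaves at most one side holding a majority, which is the tie-breaker behavior described above.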
Cluster Enabler for Microsoft Failover Cluster
[Diagram: cluster nodes with RecoverPoint/CE installed at Site A and Site B, joined by a LAN/WAN private interconnect and RecoverPoint replication; File Share Witness with RecoverPoint/CE installed]
Failover cluster supports up to 8 nodes with Windows Server 2003/2008 using Majority Node Set with and without File Share Witness
Cluster Enabler for Microsoft Failover Cluster
Role of major software components
Microsoft failover cluster software
  – Protects against server hardware or network connection failures
  – Initiates failover actions to a clustered node for resource group restart
Cluster Enabler 4.0 software
  – Installed on all cluster nodes and on File Share Witness (if File Share Witness is used)
  – Responds to queries from the cluster service that determine cluster behavior
  – Determines RecoverPoint state and initiates appropriate RecoverPoint actions using the RecoverPoint API
RecoverPoint software
  – CRR provides remote mirroring of production data
  – CRR journal retained, allowing for point-in-time recovery outside of cluster operations
[Diagram label: Node failure]
Cluster Enabler and Node Failure Event
Failover steps
  – Site A node fails, resulting in heartbeat response timeout
  – Cluster reforms between the Site B node and the File Share Witness node
  – The Site B node brings resource groups from the Site A node online
  – The latest image of the RecoverPoint volumes listed in the resource group is automatically recovered, read/write enabled, and mounted to the Site B node
  – Application listed as part of the failed Site A node resource group is restarted
  – The Site A node network address is added to the network interface of the Site B node and client traffic is routed to the Site B node
[Diagram: Site A and Site B linked by RecoverPoint; Majority Node Set with File Share Witness]
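The failover steps above can be traced with a toy model. This is an illustrative sketch, not the Cluster Enabler implementation: `ResourceGroup`, `fail_over`, and the node and site names are all hypothetical, and the model only records ownership and which site holds the writable copy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceGroup:
    name: str
    owner: str          # node currently hosting the group
    writable_site: str  # site whose replica volumes are read/write
    online: bool = True


def fail_over(groups: List[ResourceGroup], node_sites: Dict[str, str],
              failed_node: str, target_node: str,
              witness_reachable: bool) -> List[str]:
    """Walk the failover steps and return a log of what happened."""
    if not witness_reachable:
        # Without the witness vote the survivor has no majority.
        return ["quorum lost: surviving node cannot reform the cluster"]
    steps = [f"{failed_node} missed heartbeats",
             f"cluster reformed by {target_node} and the File Share Witness"]
    for group in groups:
        if group.owner != failed_node:
            continue  # groups on healthy nodes are left alone
        group.online = False
        group.owner = target_node
        steps.append(f"{group.name}: ownership moved to {target_node}")
        group.writable_site = node_sites[target_node]
        steps.append(f"{group.name}: latest image recovered, "
                     f"read/write at {group.writable_site}")
        group.online = True
        steps.append(f"{group.name}: application restarted")
    steps.append(f"client traffic routed to {target_node}")
    return steps


if __name__ == "__main__":
    sites = {"nodeA": "Site A", "nodeB": "Site B"}
    groups = [ResourceGroup("SQL", "nodeA", "Site A")]
    for line in fail_over(groups, sites, "nodeA", "nodeB", True):
        print(line)
```

The early return mirrors the quorum requirement: the Site B node acts only after the cluster has reformed with the witness.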
Disaster Recovery for Hyper-V
Automated failover operations for Hyper-V environments
[Diagram: cluster nodes with RecoverPoint/CE installed at Site A (Prod 1, Target 2) and Site B (Target 1, Prod 2), joined by a LAN/WAN private interconnect and RecoverPoint replication; Majority Node Set with File Share Witness]
Hyper-V with Failover Clusters supports up to 8 nodes with Windows 2008 R2
Hyper-V Overview
Cluster Enabler 4.0 supports Hyper-V with failover clusters
Failover of the virtual machine (VM) resource
  – RecoverPoint/CE is deployed in the Hyper-V parent partition
  – Cluster relocation is at the VM level
Hyper-V Live Migration and Quick Migration, between nodes at the same or different sites
  – Live Migration supported with RecoverPoint CRR synchronous replication
  – Quick Migration supported with synchronous and asynchronous replication
  – Use for planned maintenance, such as VM relocation for hardware and software upgrades
  – Use for VM workload redistribution: move VMs from one physical host to another
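The migration support rules above reduce to a lookup on the replication mode. The helper below is a hypothetical sketch that encodes only what the slide states (Live Migration with synchronous CRR; Quick Migration with either mode); it is not a Hyper-V or RecoverPoint call.

```python
from typing import List


def allowed_migrations(replication_mode: str) -> List[str]:
    """Return the Hyper-V migration types the slide lists for a CRR mode."""
    mode = replication_mode.strip().lower()
    if mode == "synchronous":
        return ["live", "quick"]
    if mode == "asynchronous":
        return ["quick"]
    raise ValueError(f"unknown replication mode: {replication_mode!r}")


if __name__ == "__main__":
    print(allowed_migrations("synchronous"))
    print(allowed_migrations("asynchronous"))
```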
Hyper-V Virtual Machine Failure Event
Failover steps with Cluster Enabler 4.0
  – Site A Hyper-V physical node fails, resulting in heartbeat response timeout
  – Cluster reforms between the Site B node and the File Share Witness node
  – The Site B node brings Hyper-V virtual machine resource groups from the Site A node online
  – RecoverPoint target volumes for consistency groups listed in affected resource groups are recovered and mounted to the Site B node
  – Virtual machines listed as part of the failed Site A node resource group are restarted
  – The Site A node network address is added to the network interface of the Site B node and client traffic is routed to the Site B node
[Diagram: Site A and Site B linked by RecoverPoint; Majority Node Set with File Share Witness]
Virtual machines can fail over within and between failover cluster nodes
Hyper-V Live Migration
Planned hardware maintenance on a physical server requires moving the VM to another physical server
[Diagram: volumes labeled R1 and R2 at Site A and Site B, linked by RecoverPoint CRR synchronous replication; Majority Node Set with File Share Witness]
Live migration can be within the same site or between sites
Multi-Array Support
Each named cluster group's associated devices reside in a single RecoverPoint consistency group of the same name
[Diagram: cluster nodes with RecoverPoint/CE installed, File Share Witness with RecoverPoint/CE installed, devices for Cluster Group 1 and Cluster Group 2, RecoverPoint replication over the WAN]
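The naming rule above (one consistency group per cluster group, with the same name) lends itself to a simple audit. The sketch below assumes the group-to-device mappings have already been collected into dictionaries by some other means; `check_group_mapping` and the sample names are hypothetical and do not query a cluster or RecoverPoint.

```python
from typing import Dict, List, Set


def check_group_mapping(cluster_groups: Dict[str, Set[str]],
                        consistency_groups: Dict[str, Set[str]]) -> List[str]:
    """Report cluster groups that break the same-name, single-group rule."""
    problems = []
    for name, devices in sorted(cluster_groups.items()):
        members = consistency_groups.get(name)
        if members is None:
            problems.append(f"{name}: no consistency group with this name")
            continue
        missing = sorted(devices - members)
        if missing:
            problems.append(f"{name}: devices outside the matching "
                            f"consistency group: {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    cluster = {"Cluster Group 1": {"lun1", "lun2"}, "Cluster Group 2": {"lun3"}}
    recoverpoint = {"Cluster Group 1": {"lun1"}}
    for line in check_group_mapping(cluster, recoverpoint):
        print(line)
```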
Microsoft Failover Clusters Deployed with Oracle on Windows
[Diagram: Oracle cluster nodes on a network; volumes labeled Prod 1, Target 1, Prod 2, and Target 2 linked by RecoverPoint; Majority Node Set with File Share Witness]
Failover clusters configured with Oracle Fail Safe
Benefits of Cluster Enabler
Provides rapid site restart with RecoverPoint
  – Automatic site failover for common disruptions, including complete site disasters and server, storage, or network-related failures
Minimizes site failback time with RecoverPoint
  – Only changes are copied by RecoverPoint or RecoverPoint/SE to resynchronize the primary cluster storage system
Provides multi-array support
  – One cluster can span multiple storage arrays at the same or different sites
  – Different clusters can share storage arrays
Supports heterogeneous storage arrays
  – A mix of arrays can be used
  – Storage arrays do not have to be identical between sites