BUSINESS CONTINUITY AND DISASTER RECOVERY STRATEGIES
Giulio Brenna, Systems Engineers Manager, Specialists Team
© Copyright 2013 EMC Corporation. All rights reserved.
Workshop Objectives
• Explain why customers need a BC/DR strategy
• Explain capabilities, complexity, and choice
• Understand BC and DR from a technological standpoint
• Describe the main EMC solutions for BC/DR
BC & DR Drivers
Why You Should Care
A study from research firm Frost & Sullivan estimates that North American Business Continuity and Disaster Recovery spending will reach $23.3 billion by 2012. That is up more than 50 percent from $15.1 billion in 2006.
"We are seeing increased concern from small and mid-sized enterprises about how they protect their data,"
October 2009
Business Continuity and Disaster Recovery: Primary Decision Drivers
Business considerations:
• Cost
• Functionality, availability
• Recovery-time objectives
• Recovery-point objectives
Technical considerations:
• Consistency and recovery
• Capacity
• Bandwidth
• Performance
The Cost of Downtime Per Hour, By Industry
[Bar chart: cost of downtime per hour for Investments, Manufacturing, Telecom, Banking, Transportation, Retail, and Insurance; axis from $0 to $400,000. Source: AMR Research]
The Impact of Business Continuity
Productivity impact:
• Employees affected
• Email
• Systems
Revenue impact:
• Direct + indirect losses
• Compensatory payments
• Lost future revenue
Brand impact:
• Customers
• Suppliers
• Financial markets
• Banks
• Business partners
• The media
Financial impact:
• Revenue recognition
• Cash flow
Business Continuity Considerations
• What are your company's most critical processes and data needs?
• How much data can you afford to lose?
• How quickly do you need to restore your critical processes?
• How vulnerable are your operations to disasters?
Events that Impact Information Availability
Events that require a data center move: <1% of occurrences
• Disaster events: fire, flood, storms, etc.
• Data center move or relocation
• Workload relocation
Unscheduled events/failures: 15%
• Server failure
• Application failure
• Network / storage failure
• Processing or operator error
Scheduled events/competing workloads: 85%
• Maintenance and migrations
• Backup and restore
• Batch processing
• Reporting
• Data warehouse extract, build, and load
Terminology
A Key Differentiation
Understand the difference between Disaster Recovery (DR) and Business Continuity (BC):
• Disaster Recovery: restoring IT operations following a site failure
• Business Continuity: reducing or eliminating application downtime
• Disaster Avoidance: the ability to predict a disaster and take action to avoid its impact
The Evolution of Availability
• Continuous Availability: application continues with no disruption (zero downtime)
• High Availability: in-data-center application restart
• Advanced Recovery: replication to a second site
• Traditional Disaster Recovery: tape backup and offsite rotation
[Figure label: Convergence]
Protecting Information is a Business Decision
• Recovery-point objective (RPO): How much data can you afford to lose? Can you determine a sync point?
• Recovery-time objective (RTO): How long can you afford to idle your business and survive?
• Fast recovery times enable continuous business operations
• Slow recovery times, or data loss, translate into …
[Timeline figure: the Recovery Point Objective covers the data lost before the interruption; the Recovery Time Objective covers notification, plan activation, system restart, and service availability. Other labels: Business Recovery Plan and procedures; technology failover]
Balancing Business Requirements and Cost
[Figure: cost ($$) plotted against time before and after the interruption (Time = 0). On the RPO side, cost of data availability against cost of data loss; on the RTO side, cost of system availability against cost of system downtime. Technologies marked: automation, replication, backup. Application classes marked: critical application, business application]
RPO: The point in time to which critical data must be restored following an interruption before its loss severely impacts the organization.
RTO: The maximum acceptable length of time that can elapse following an interruption to the operations of a business function before its absence severely impacts the organization.
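The two definitions can be turned into a small check. The following is a minimal sketch, not part of the original deck: the function names, the incident timestamps, and the targets are all hypothetical, and it assumes the achieved recovery point is simply the time of the last recoverable copy.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recoverable_copy: datetime, interruption: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable copy and the interruption."""
    return interruption - last_recoverable_copy


def achieved_rto(interruption: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the interruption and the return of the service."""
    return service_restored - interruption


def meets_objectives(last_copy, interruption, restored, rpo_target, rto_target) -> bool:
    """True when both the data-loss window and the downtime are within target."""
    return (achieved_rpo(last_copy, interruption) <= rpo_target
            and achieved_rto(interruption, restored) <= rto_target)


# Hypothetical incident: nightly backup at 02:00, failure at 10:00, service back at 13:30.
last_copy = datetime(2013, 5, 14, 2, 0)
interruption = datetime(2013, 5, 14, 10, 0)
restored = datetime(2013, 5, 14, 13, 30)

print(achieved_rpo(last_copy, interruption))  # 8:00:00
print(achieved_rto(interruption, restored))   # 3:30:00
```

With these made-up numbers, a business application with a 24-hour RPO and a 4-hour RTO would be within its objectives, while a critical application with a 1-hour RPO would not.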
Basic Replica Typologies
[Figure: replication techniques plotted by cost against Recovery Point Objective (zero data loss, seconds, minutes, 2 h, 4 h, 8 h, 12 h, 24 h, 48 h)]
Categories described:
• Access Anywhere: active-active access (new)
• Synchronous replication with zero data loss
• Replication with different levels of possible data loss
• Remote replication of data based on time intervals or events
• Classic backup with physical tape transportation
Techniques shown:
• Synchronous mirroring
• Semi-synchronous mirroring
• Remote replication of transactions
• Standby database
• Remote journaling
• Electronic vaulting
• Traditional backup
Replica protocols
• Latency in dark fiber is about 5 ns/m, or 5 µs/km (one 10 km link can have 50 µs of latency)
• Worse, the round-trip time (RTT) over that link can be 100 µs
• Latency over SONET/SDH is higher
• Latency over IP networks is generally much higher
• Latency directly impacts application performance:
– Increased idle time while the application is waiting for read data
– Increased idle time while the application is waiting for write acknowledgement
– Reduced I/Os per second (IOPS)
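The arithmetic behind these figures can be sketched as follows. Only the 5 µs/km figure comes from the slide; the local service time of 500 µs is a hypothetical value, and the model assumes a single outstanding synchronous write that costs exactly one round trip, which real replication protocols may exceed.

```python
FIBER_DELAY_US_PER_KM = 5.0  # from the slide: about 5 µs/km in dark fiber


def one_way_latency_us(distance_km: float) -> float:
    """Propagation delay in one direction, in microseconds."""
    return distance_km * FIBER_DELAY_US_PER_KM


def round_trip_us(distance_km: float) -> float:
    """Round-trip time (RTT) over the link, in microseconds."""
    return 2 * one_way_latency_us(distance_km)


def max_serial_write_iops(distance_km: float, local_service_time_us: float) -> float:
    """Upper bound on IOPS with one outstanding synchronous write at a time.

    Assumption: each write waits for the local service time plus one RTT.
    """
    total_us = local_service_time_us + round_trip_us(distance_km)
    return 1_000_000 / total_us


LOCAL_SERVICE_TIME_US = 500.0  # hypothetical array response time

for km in (0, 10, 100):
    iops = max_serial_write_iops(km, LOCAL_SERVICE_TIME_US)
    print(f"{km:>4} km: RTT {round_trip_us(km):7.1f} us, at most {iops:7.1f} serial write IOPS")
```

Under these assumptions the per-stream write rate falls as distance grows, which is why synchronous replication is normally limited to metropolitan distances and asynchronous modes are used beyond them.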
Backup Evolution Over Time: From Tape to Disk to Deduplication
[Figure: three generations of backup architecture, each showing applications, onsite backup storage, and DR storage]
• Traditional, tape-centric: backup software and tape
• Disk-centric: backup software and VTL/tape
• Transformational: deduplication backup software and system
Backup failures, recovery time, storage cost, and complexity decrease from one generation to the next.
Deduplication is Accelerating the Transition
• More efficient
• Reduced storage
• Less bandwidth
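Why deduplication reduces both storage and bandwidth can be illustrated with a toy chunk store. This is a minimal sketch under stated assumptions, not a description of any EMC product: the class name, the fixed 4 KiB chunk size, and the sample backups are hypothetical, and real systems may segment data differently.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 digest -> chunk bytes

    def write(self, data: bytes) -> list:
        """Store data and return its recipe (the ordered list of chunk digests)."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a chunk seen before is not stored again
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# Two hypothetical nightly backups that differ in only one chunk.
monday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"C" * CHUNK_SIZE
tuesday = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"D" * CHUNK_SIZE

store = DedupStore()
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)

logical = len(monday) + len(tuesday)
print(f"logical bytes: {logical}, stored bytes: {store.stored_bytes()}")
```

The second backup adds only one new chunk to the store, and only that chunk would need to cross the link to a DR copy of the store.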
EMC Backup and Recovery Solutions
• Data Domain
• Avamar
• NetWorker
• Disk Library for mainframe
• Data Protection Advisor
Symmetrix Remote Data Facility (SRDF) Family
Industry-leading remote replication
SRDF family:
• SRDF/S: synchronous for zero data exposure
• SRDF/A: asynchronous for extended distances
• SRDF/DM: efficient Symmetrix-to-Symmetrix data mobility
Options:
• SRDF/Star: multi-site replication option
• SRDF/CE: Cluster Enabler option
• SRDF/AR: Automated Replication option
• SRDF/CG: Consistency Groups
• Cascaded SRDF and SRDF/EDP: Extended Distance Protection
• Concurrent SRDF
Benefits:
• Protects against local and regional disruptions
• Increases application availability by reducing downtime
• Minimizes/eliminates performance impact on applications and hosts
• Independent of hosts and operating systems, applications, and databases
• Improves recovery point objectives (RPOs) and recovery time objectives (RTOs) with automated restart solutions
• Mission-critical proven with numerous testimonials and references
• Tens of thousands of licenses shipped
EMC offers choice and flexibility to meet any service level requirement
SRDF Synch
[Figure: production virtual machines at the primary site (R1 data and boot devices) replicated over SRDF links to the secondary site (R2), with TimeFinder copies at each site and a test and development environment]
Provides disaster restart of remotely replicated devices and can be used for offsite backup operations using EMC TimeFinder.
SRDF/A Delta Set Push Operation
[Figure: on the source, cycle N is the capture cycle and cycle N–1 the transmit cycle; across the WAN on the target, cycle N–1 is the receive cycle and cycle N–2 the apply cycle, which is written to R2]
• SRDF/A write I/O cycle number is assigned as part of the capture cycle (N)
• SRDF/A write I/O is acknowledged back to the host as a local write operation
• SRDF/A write I/O cycle number is part of the transmit/receive cycle (N–1)
• SRDF/A write I/O is acknowledged from the target and removed from the transmit cycle (N–1) on the source
• The capture-to-transmit cycle switch is initiated based on the cycle switch time interval setting, with the N–1 and N–2 cycles completed
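The cycle mechanics above can be mimicked with a toy model. The following is an illustrative sketch only, not EMC's implementation: the class and method names are hypothetical, the apply step is collapsed into the cycle switch, and overwriting a block within one capture cycle is an assumption of the model.

```python
class AsyncCycleReplicator:
    """Toy model of capture / transmit / receive / apply delta sets."""

    def __init__(self):
        self.cycle = 1      # number of the current capture cycle (N)
        self.capture = {}   # cycle N on the source: block -> data
        self.transmit = {}  # cycle N-1 on the source, waiting to cross the link
        self.receive = {}   # cycle N-1 on the target
        self.r2 = {}        # contents of the target device

    def host_write(self, block, data) -> str:
        """Writes land in the capture cycle and are acknowledged locally."""
        self.capture[block] = data
        return "ack"

    def transfer(self) -> None:
        """Move the transmit cycle across the link, then drop it from the source."""
        self.receive.update(self.transmit)
        self.transmit = {}

    def cycle_switch(self) -> None:
        """Switch cycles once the previous transmit cycle has fully arrived."""
        if self.transmit:
            raise RuntimeError("cycle N-1 has not been fully transmitted")
        self.r2.update(self.receive)   # the received cycle is applied as one unit
        self.receive = {}
        self.transmit = self.capture   # capture cycle becomes the transmit cycle
        self.capture = {}
        self.cycle += 1


rep = AsyncCycleReplicator()
rep.host_write(1, "a")
rep.host_write(1, "b")   # same block, same cycle: only the latest data is sent
rep.cycle_switch()       # cycle 1 becomes the transmit cycle
rep.host_write(2, "c")   # captured in cycle 2
rep.transfer()           # cycle 1 reaches the target
rep.cycle_switch()       # cycle 1 is applied to R2; cycle 2 becomes the transmit cycle

print(rep.r2)        # {1: 'b'}
print(rep.transmit)  # {2: 'c'}
```

If the source were lost at this point, the target would hold a consistent image up to the end of cycle 1, and the write to block 2 would be the data at risk. In this model the exposure is bounded by the cycles still in flight.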
SRDF Advanced Three-Site SRDF/Star Solution
[Figure: Site A (workload site), Site B (local or regional site), and Site C (out-of-region site), connected by SRDF/S and SRDF/A links; devices labeled R11 and R2]
• An extended intersite link outage occurs between Sites B and C
• Reconfigure dynamic SRDF devices at Site A to Concurrent SRDF mode and start an SRDF/A session from Site A to Site C
EMC RECOVERPOINT FAMILY
One way to protect everything better
What Is EMC RecoverPoint?
One way to protect everything better
RecoverPoint family: CDP (local protection), CRR (remote protection), CLR (concurrent local and remote protection)
• Protects any physical or virtual host, application, or storage
• Provides affordable data protection
• Uses DVR-like point-in-time recovery
• Supports policy-based synchronous and asynchronous replication
• Supports block and NAS
Any Point-in-Time Recovery
RecoverPoint for continuous protection
Recovery points by protection method:
• Daily backup: recovery point is once every 24 hours
• Snapshots: recovery point is once every 8 hours
• Disk mirroring: recovery point is the latest image replicated
• Continuous protection: recovery to any point in time (unlimited recovery points, application bookmarks)
With continuous protection you can:
• Recover to any point in time
• Annotate selected recovery points with bookmarks
• Continue replication during recovery
• Use the recovered image for a variety of purposes
[Timeline figure: example bookmarks along the time axis: checkpoint, pre-patch, post-patch, cache flush, hot backup, checkpoint]
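The idea of recovering to any point in time from a write journal, with named bookmarks, can be sketched in a few lines. This is a conceptual illustration under stated assumptions, not RecoverPoint's design: the class, its methods, and the integer timestamps are hypothetical.

```python
class WriteJournal:
    """Toy journal: every write is kept with its timestamp so any past image can be rebuilt."""

    def __init__(self):
        self.entries = []    # (timestamp, block, data), in time order
        self.bookmarks = {}  # bookmark name -> timestamp

    def record(self, timestamp, block, data) -> None:
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.entries.append((timestamp, block, data))

    def bookmark(self, name, timestamp) -> None:
        """Annotate a recovery point with a human-readable name."""
        self.bookmarks[name] = timestamp

    def image_at(self, timestamp) -> dict:
        """Replay the journal up to and including the given timestamp."""
        image = {}
        for t, block, data in self.entries:
            if t > timestamp:
                break
            image[block] = data
        return image

    def image_at_bookmark(self, name) -> dict:
        return self.image_at(self.bookmarks[name])


journal = WriteJournal()
journal.record(1, "blk0", "v1")
journal.bookmark("pre-patch", 1)
journal.record(2, "blk0", "v2-bad")   # hypothetical faulty patch
journal.record(3, "blk1", "x")

print(journal.image_at_bookmark("pre-patch"))  # {'blk0': 'v1'}
print(journal.image_at(3))                     # {'blk0': 'v2-bad', 'blk1': 'x'}
```

A daily backup or an 8-hour snapshot would only offer the images at its own timestamps, whereas the journal can rebuild the image as of any recorded write.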
Mobility. Availability. Collaboration.
The Unique Value of VPLEX: Access Anywhere
Access the SAME information, in SEPARATE locations, all at the SAME time.
Active-Passive Data Access Before VPLEX
[Figure: Site A and Site B in an active-passive configuration, linked by synchronous/asynchronous replication]
• Data on the disaster recovery site is used on failure
• An outage is needed to move applications
Federated Data Access With VPLEX
[Figure: Site A and Site B in an active-active configuration, joined by VPLEX Metro or VPLEX Geo; labels: DISTRIBUTED VIRTUAL VOLUME, TRANSFER PROTOCOL]
VPLEX enables active use of resources at two sites.
The VPLEX Family of Products
• VPLEX Local: within a data center
• VPLEX Metro: AccessAnywhere at synchronous distances
• VPLEX Geo (new): AccessAnywhere at asynchronous distances
Application Consistency
Which type of consistency will be required for the applications that you are going to protect?
Crash Consistency
This is the equivalent of pulling the power from a server while the applications are running, and then powering up the server again. Replication solutions that have limited knowledge of the applications are easier to put together, but during recovery you rely on the application's ability to start up on its own, possibly with some intervention. Following a failover, the data will not be transactionally consistent if transactions were in flight at the time of the failure. In most cases, once the application or database is restarted, the incomplete transactions are identified and their updates are backed out; otherwise, extra procedures or tools may be required.
Application Consistency
There are ways of ensuring that, when a copy is taken or a system is shut down, all necessary transactions within a database are complete and caches are flushed in order to maintain consistency. Scripts can be written, following best practice for each application, to ensure that processes take place in a certain order, or there are applications that can automate these procedures. Some technologies use application-specific agents. The choice again comes down to the importance of the data, the RPOs and RTOs, and the available budget within the organisation.
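The difference between the two kinds of copy can be shown with a toy transaction log. This is a minimal sketch under stated assumptions, not the recovery logic of any particular database: the log format, the function names, and the sample transactions are hypothetical.

```python
def recover(log):
    """Crash-consistent restart: keep updates from committed transactions, back out the rest.

    Log entries are tuples: ("begin", tx), ("update", tx, key, value), ("commit", tx).
    Returns the recovered state and the set of transactions that were backed out.
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    state = {}
    backed_out = set()
    for entry in log:
        if entry[0] == "update":
            _, tx, key, value = entry
            if tx in committed:
                state[key] = value
            else:
                backed_out.add(tx)  # in flight at the time of the failure
    return state, backed_out


def is_application_consistent(log) -> bool:
    """True when the copy was taken with no transaction in flight."""
    begun = {entry[1] for entry in log if entry[0] == "begin"}
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    return begun == committed


# Hypothetical log cut off by a failure in the middle of transaction t2.
crash_copy = [
    ("begin", "t1"),
    ("update", "t1", "balance_a", 90),
    ("update", "t1", "balance_b", 110),
    ("commit", "t1"),
    ("begin", "t2"),
    ("update", "t2", "balance_a", 50),
]

state, backed_out = recover(crash_copy)
print(state)       # {'balance_a': 90, 'balance_b': 110}
print(backed_out)  # {'t2'}
print(is_application_consistent(crash_copy))      # False
print(is_application_consistent(crash_copy[:4]))  # True
```

A copy taken after quiescing the application corresponds to the second case: nothing is in flight, so restart needs no back-out step.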