Business Continuity & Disaster Recovery
Business Impact Analysis, RPO/RTO, Testing, Backups, Audit
Based on CISA Review Manual 2009

Acknowledgments
Material is from:
- CISA Review Manual, 2009
Author: Susan J Lincke, Ph.D., Univ. of Wisconsin-Parkside
Reviewers:
Funded by National Science Foundation (NSF) Course, Curriculum and Laboratory Improvement (CCLI) grant 0837574: Information Security: Audit, Case Study, and Service Learning.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and/or source(s) and do not necessarily reflect the views of the National Science Foundation.

Imagine a company…
- Bank with 1 million accounts, social security numbers, credit cards, loans…
- Airline serving 50,000 people on 250 flights daily…
- Pharmacy system filling 5 million prescriptions per year, some of which are life-saving…
- Factory with 200 employees producing 200,000 products per day using robots…

Imagine a system failure…
- Server failure
- Disk system failure
- Hacker break-in
- Denial of Service attack
- Extended power failure
- Snow storm
- Spyware
- Malevolent virus or worm
- Earthquake, tornado
- Employee error or revenge
How will this affect each business?

First Step: Business Impact Analysis
- Which business processes are of strategic importance?
- What disasters could occur?
- What impact would they have on the organization financially? Legally? On human life? On reputation?
- What is the required recovery time period?
Answers are obtained via questionnaires, interviews, or meetings with key users of IT.

Event Damage Classification
- Negligible: No significant cost or damage
- Minor: A non-negligible event with no material or financial impact on the business
- Major: Impacts one or more departments and may impact outside clients
- Crisis: Has a major material or financial impact on the business
Minor, major, and crisis events should be documented and tracked until repaired.

An Incident Occurs…
- Call the Security Officer (SO)
- The SO declares a disaster
- The SO follows the pre-established protocol
Emergency Response Team:
- Human life: First concern
- Phone tree notifies relevant participants
- Public relations interfaces with the media (everyone else stays quiet)
- Management and legal counsel act
- IT follows the Disaster Recovery Plan

Recovery Time: Terms
- Interruption Window: Time the organization can wait between the point of failure and service resumption
- Service Delivery Objective (SDO): Level of service in Alternate Mode
- Maximum Tolerable Outage (MTO): Maximum time in Alternate Mode
[Figure: timeline from regular service, through the failure and the interruption window, to the Disaster Recovery Plan being implemented; service then runs in Alternate Mode at the SDO for at most the Maximum Tolerable Outage, until the Restoration Plan is implemented and regular service resumes.]

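To make the three terms concrete, here is a minimal Python sketch that checks one outage against BIA targets. The class and function names, the sample dates, and expressing the SDO as a fraction of normal service are hypothetical assumptions for illustration, not part of the CISA definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class OutageTimeline:
    failure: datetime               # point of failure
    alternate_mode_start: datetime  # DR plan implemented; service resumes in alternate mode
    regular_restored: datetime      # restoration plan complete; regular service resumes
    alternate_service_level: float  # ASSUMPTION: fraction of normal service in alternate mode


def check_timeline(t: OutageTimeline, interruption_window: timedelta,
                   max_tolerable_outage: timedelta, sdo: float) -> Dict[str, bool]:
    """Compare one outage with (hypothetical) BIA targets."""
    return {
        # Wait between failure and resumption must fit in the interruption window.
        "interruption_window_met": t.alternate_mode_start - t.failure <= interruption_window,
        # Time spent in alternate mode must not exceed the maximum tolerable outage.
        "mto_met": t.regular_restored - t.alternate_mode_start <= max_tolerable_outage,
        # Service level in alternate mode must reach the service delivery objective.
        "sdo_met": t.alternate_service_level >= sdo,
    }


outage = OutageTimeline(
    failure=datetime(2010, 5, 24, 9, 0),
    alternate_mode_start=datetime(2010, 5, 24, 13, 0),
    regular_restored=datetime(2010, 5, 27, 9, 0),
    alternate_service_level=0.6,
)
result = check_timeline(outage, timedelta(hours=8), timedelta(days=5), sdo=0.5)
print(result)  # {'interruption_window_met': True, 'mto_met': True, 'sdo_met': True}
```

Tightening the interruption window to two hours would make the first check fail, because service resumed four hours after the failure.
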
Definitions
- Business Continuity: Offer critical services in the event of a disruption
- Disaster Recovery: Survive an interruption to computer information systems
- Alternate Process Mode: Service offered by the backup system
- Disaster Recovery Plan: How to transition to Alternate Process Mode
- Restoration Plan: How to return to regular system mode

Business Continuity Process
- Perform Business Impact Analysis
- Prioritize services to support critical business processes
- Determine alternate processing modes for critical and vital services
- Develop the Disaster Recovery Plan for IS systems recovery
- Develop the BCP for business operations recovery and continuation
- Test the plans
- Maintain the plans

Classification of Services
- Critical ($$$$): Cannot be performed manually. Tolerance to interruption is very low.
- Vital ($$): Can be performed manually for a very short time
- Sensitive ($): Can be performed manually for a period of time, but may cost more in staff
- Nonsensitive (¢): Can be performed manually for an extended period of time with little additional cost and minimal recovery effort

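RPO and RTO
- Recovery Point Objective (RPO): How far back can you fail to? One week's worth of data? One day? One hour?
- Recovery Time Objective (RTO): How long can you operate without a system? 1 or 2 hours? 24 hours?
Which services can last how long?
[Figure: timeline around the interruption, with the RPO extending backward (one week, one day, one hour) and the RTO extending forward]
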
Recovery Point Objective
- Backup images
- Mirroring: RAID
Orphan data: Data that is lost and never recovered.
The RPO determines the backup period: the shorter the RPO, the more often data must be backed up or mirrored.

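A back-of-the-envelope sketch of how the RPO constrains the backup period. It assumes the worst case is a failure just before the next backup (so roughly one full backup period becomes orphan data, ignoring how long the backup itself runs) and that transactions arrive at a uniform rate. The function names are hypothetical; the 5 million prescriptions per year figure is the pharmacy example from earlier in these slides.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24


def worst_case_orphan_window(backup_period: timedelta) -> timedelta:
    # ASSUMPTION: a failure just before the next backup loses about one full period;
    # the time the backup itself takes is ignored.
    return backup_period


def meets_rpo(backup_period: timedelta, rpo: timedelta) -> bool:
    # The worst-case data loss must be no larger than the RPO.
    return worst_case_orphan_window(backup_period) <= rpo


def worst_case_orphan_transactions(backup_period: timedelta, per_year: float) -> float:
    # ASSUMPTION: transactions arrive at a uniform rate all year.
    hours = worst_case_orphan_window(backup_period).total_seconds() / 3600
    return per_year / HOURS_PER_YEAR * hours


daily = timedelta(days=1)
print(round(worst_case_orphan_transactions(daily, 5_000_000)))  # 13699 prescriptions
print(meets_rpo(daily, rpo=timedelta(hours=1)))                  # False
```

Under these assumptions a one-hour RPO calls for backups at least hourly, or continuous replication such as mirroring.
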
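Disruption vs. Recovery Costs
[Figure: cost vs. time. Service downtime cost rises the longer recovery takes, while the cost of the alternative recovery strategies falls from hot site to warm site to cold site; the combined cost reaches a minimum between the extremes.]
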
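The trade-off in the chart can be sketched as a total-cost calculation: the cost of a recovery strategy plus the downtime cost accrued until that strategy restores service. All figures below (strategy costs, recovery hours, hourly downtime cost) are hypothetical, and the model assumes a single disaster over the planning horizon.

```python
from typing import Dict, Tuple

# HYPOTHETICAL figures: (strategy cost in $, hours until service resumes)
STRATEGIES: Dict[str, Tuple[float, float]] = {
    "hot": (500_000, 4),
    "warm": (150_000, 72),
    "cold": (50_000, 336),
}


def total_cost(strategy_cost: float, recovery_hours: float,
               downtime_cost_per_hour: float) -> float:
    # Recovery strategy cost plus service downtime cost until recovery.
    return strategy_cost + recovery_hours * downtime_cost_per_hour


def cheapest_strategy(downtime_cost_per_hour: float) -> str:
    # Pick the strategy with the minimum total cost.
    return min(STRATEGIES,
               key=lambda name: total_cost(*STRATEGIES[name], downtime_cost_per_hour))


for hourly in (10_000, 500, 100):
    print(hourly, cheapest_strategy(hourly))
# 10000 hot
# 500 warm
# 100 cold
```

The more an hour of downtime costs, the further the minimum moves toward the hot site.
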
[Figure; source: https://awstrainingindelhi.wordpress.com/2013/12/28/leverage-benefits-of-clouddr-with-lower-rto-rpo-of-you-business-critical-applications/]

[Figure; source: https://www.linkedin.com/pulse/testing-active-solution-disaster-recovery-luca-mambella]

Bare-metal restore
Bare-metal restore is a technique in the field of data recovery and restoration where the backed-up data is available in a form that allows one to restore a computer system from "bare metal", i.e., without any requirements as to previously installed software or operating system.
Sources: https://en.wikipedia.org/wiki/Bare-metal_restore, https://www.linkedin.com/pulse/difference-between-rto-rpo-mohammed-pmp-itil-expert-prince2

[Figure; source: https://secure.n-able.com/webhelp/NC_9-1-0_SO_en/Content/Ncentral/Backup.Manager_Overview.html]

Alternative Recovery Strategies
- Hot Site: Fully configured, ready to operate within hours
- Warm Site: Ready to operate within days. Has no main computer or only a low-powered one, but does contain disks, network, and peripherals.
- Cold Site: Ready to operate within weeks. Contains electrical wiring, air conditioning, and flooring.
- Duplicate or Redundant Information Processing Facility: Standby hot site within the organization
- Reciprocal Agreement with another organization or division
- Mobile Site: Fully or partially configured trailer comes to your site, with microwave or satellite communications

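One way to read the readiness times above is as a filter against the RTO: choose the least-prepared (and typically cheapest) site that can still be operating within the RTO. The readiness hours below are hypothetical stand-ins for "hours", "days", and "weeks", and the function name is illustrative.

```python
from typing import Optional

# HYPOTHETICAL worst-case readiness times in hours, cheapest site first.
SITE_READINESS_HOURS = [("cold", 336), ("warm", 72), ("hot", 4)]


def least_prepared_site(rto_hours: float) -> Optional[str]:
    """Return the cheapest site type that is ready within the RTO, else None."""
    for site, ready_hours in SITE_READINESS_HOURS:
        if ready_hours <= rto_hours:
            return site
    # Even a hot site is too slow: consider a duplicate/redundant processing facility.
    return None


print(least_prepared_site(24))    # hot
print(least_prepared_site(100))   # warm
print(least_prepared_site(1000))  # cold
print(least_prepared_site(2))     # None
```
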
Hot Site
- Contractual costs include: basic subscription, monthly fee, testing charges, activation costs, and hourly/daily use charges
- Contractual issues include: other subscriber access, speed of access, configurations, staff assistance, audit & test
- A hot site is for emergency use, not long-term use
- The provider may offer a warm or cold site for extended durations

Reciprocal Agreements
Advantage: Low cost
Problems may include:
- Quick access
- Compatibility (computer, software, …)
- Resource availability: computer, network, staff
- Priority of visitor
- Security (less of a problem if same organization)
- Testing required
- Susceptibility to the same disasters
- Length of welcomed stay

Concerns for a BCP/DR Plan
- Evacuation plan: People's lives always take first priority
- Disaster declaration: Who, how, for what?
- Responsibility: Who covers the necessary disaster recovery functions?
- Procedures for Disaster Recovery
- Procedures for Alternate Mode operation
  - Resource Allocation: During recovery & continued operation
- Copies of the plan should be off-site

Disaster Recovery Responsibilities
General Business:
- First responder: Evacuation, fire, health…
- Damage Assessment
- Emergency Management
- Legal Affairs
- Transportation/Relocation/Coordination (people, equipment)
- Supplies
- Salvage
- Training
IT-Specific Functions:
- Software
- Application
- Emergency operations
- Network recovery
- Hardware
- Database/Data Entry
- Information Security

BCP Documents
Focus: Event Recovery
- IT Disaster Recovery Plan: Procedures to recover at an alternate site
- Business Recovery Plan: Recover the business after a disaster
- IT Contingency Plan: Recovers a major application or system
- Occupant Emergency Plan: Protect life and assets during a physical threat
- Cyber Incident Response Plan: Malicious cyber incident
- Crisis Communication Plan: Provide status reports to the public and personnel
Focus: Business Continuity
- Business Continuity Plan
- Continuity of Operations Plan: Longer-duration outages

Network Disaster Recovery
- Redundancy: Includes routing protocols, fail-over, multiple paths
- Alternative Routing: More than one medium or more than one network provider
- Diverse Routing: Multiple paths, one medium type
- Long-Haul Network Diversity: Redundant network providers
- Last-Mile Circuit Protection: E.g., local microwave & cable
- Voice Recovery: Voice communication backup

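RAID: Data Mirroring
Redundant Array of Independent Disks
- RAID 0: Striping. Data ABCD is split across disks (AB on one, CD on another); there is no redundancy.
- RAID 1: Mirroring. Each disk holds a full copy (ABCD and ABCD).
- Higher-level RAID: Striping & redundancy. Data is striped (AB, CD) and a parity block is stored, so a failed disk can be rebuilt.
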
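The parity idea behind higher-level RAID can be sketched in a few lines: data is striped across disks, a parity block holds the byte-wise XOR of the data blocks, and any single lost block is the XOR of the survivors and the parity. This simplified sketch uses one dedicated parity disk (as RAID 4 does; RAID 5 rotates parity across the disks), and the function names are hypothetical.

```python
from functools import reduce
from typing import List


def stripe(data: bytes, n_disks: int, chunk: int) -> List[bytes]:
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    disks = [b""] * n_disks
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return disks


def parity(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild(surviving: List[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return parity(surviving + [parity_block])


disks = stripe(b"ABCD", n_disks=2, chunk=2)  # [b'AB', b'CD']
p = parity(disks)                            # stored on the parity disk
print(rebuild([disks[1]], p))                # b'AB' (disk 0 lost, then rebuilt)
```
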
Disaster Recovery Test Execution
Always tested in this order:
1. Desk-Based Evaluation/Paper Test: A group steps through a paper procedure and mentally performs each step.
2. Preparedness Test: Part of the full test is performed. Different parts are tested regularly.
3. Full Operational Test: Simulation of a full disaster

Backup & Offsite Library
- Backups are kept off-site (one or more locations)
- The off-site location is far enough away not to be affected by the same disaster
- The library is as secure as the main site and is unlabeled (not marked as a backup facility)
- The library has constant environmental control (humidity and temperature control, UPS, smoke/water detectors, fire extinguishers)
- A detailed inventory of storage media and files is maintained

Backup Rotation: Grandfather/Father/Son
- Grandfather (monthly): Dec '09, Jan '10, Feb '10, Mar '10, Apr '10
- Father (weekly): May 1, May 7, May 14, May 21
- Son (daily): May 22, May 23, May 24, May 25, May 26, May 27, May 28
Backups graduate from son to father to grandfather.
Frequency of backup = daily, 3 generations

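A minimal sketch of which daily backups a grandfather/father/son scheme keeps. The policy is an assumption chosen for illustration, not the exact calendar on the slide: every daily backup is a son, a Sunday backup also serves as a father, and a backup taken on the 1st of the month also serves as a grandfather. The retention counts (7 sons, 4 fathers, 5 grandfathers) mirror the slide, while the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Set


def gfs_keep(backups: Iterable[date], sons: int = 7, fathers: int = 4,
             grandfathers: int = 5) -> Set[date]:
    """Return the backup dates retained under a (hypothetical) GFS policy."""
    newest_first = sorted(backups, reverse=True)
    son_set = newest_first[:sons]                                         # daily
    father_set = [d for d in newest_first if d.weekday() == 6][:fathers]  # Sundays
    gf_set = [d for d in newest_first if d.day == 1][:grandfathers]       # 1st of month
    return set(son_set) | set(father_set) | set(gf_set)


start, end = date(2009, 12, 1), date(2010, 5, 28)
daily = [start + timedelta(days=i) for i in range((end - start).days + 1)]
kept = gfs_keep(daily)
print(len(kept))  # 15
print(min(kept))  # 2010-01-01
```

Every other daily backup (for example, the December 1 backup, now the sixth-oldest monthly copy) can be rotated back into use.
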
Incremental & Differential Backups
Daily events, and what each method saves:
- Monday: Full backup (all methods)
- Tuesday: A changes. Full: Tuesday full backup; Differential: saves A; Incremental: saves A
- Wednesday: B changes. Full: Wednesday full backup; Differential: saves A + B; Incremental: saves B
- Thursday: C changes. Full: Thursday full backup; Differential: saves A + B + C; Incremental: saves C
- Friday: Full backup (all methods)
If a failure occurs on Thursday, what needs to be reloaded for full, differential, and incremental backups? Which methods take longer to back up? To reload?

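The Thursday scenario can be modeled directly. This sketch assumes the failure strikes after Thursday's backup has completed (pass an earlier day otherwise) and that the "full" method takes a full backup every day; the names are hypothetical.

```python
from typing import Dict, List, Set

CHANGES: Dict[str, Set[str]] = {"Tue": {"A"}, "Wed": {"B"}, "Thu": {"C"}}  # Monday = full
DAYS: List[str] = ["Tue", "Wed", "Thu"]


def backup_contents(method: str) -> Dict[str, Set[str]]:
    """What each day's partial backup saves after Monday's full backup."""
    saved: Dict[str, Set[str]] = {}
    since_full: Set[str] = set()
    for day in DAYS:
        since_full |= CHANGES[day]
        if method == "differential":
            saved[day] = set(since_full)    # everything changed since the full backup
        elif method == "incremental":
            saved[day] = set(CHANGES[day])  # only what changed since the previous backup
        else:
            raise ValueError(f"unknown method: {method}")
    return saved


def restore_chain(method: str, last_good_day: str) -> List[str]:
    """Backups to reload, in order; last_good_day is 'Tue', 'Wed', or 'Thu'."""
    days = DAYS[: DAYS.index(last_good_day) + 1]
    if method == "full":
        return [f"{last_good_day} full"]
    if method == "differential":
        return ["Mon full", f"{days[-1]} differential"]
    if method == "incremental":
        return ["Mon full"] + [f"{d} incremental" for d in days]
    raise ValueError(f"unknown method: {method}")


for m in ("full", "differential", "incremental"):
    print(m, restore_chain(m, "Thu"))
# full ['Thu full']
# differential ['Mon full', 'Thu differential']
# incremental ['Mon full', 'Tue incremental', 'Wed incremental', 'Thu incremental']
```

The chains show the usual trade-off: differential backups grow each day, so they take longer to run than incremental ones, but a restore needs only two sets; incremental backups are the quickest to run, but a restore must replay every set since the last full backup.
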
Backup Labeling
- Data Set Name = Master Inventory
- Volume Serial # = 12.1.24.10
- Date Created = Jan 24, 2010
- Accounting Period = 3 W-1 Q-2010
- Offsite Storage Bin # = Jan 2010
Backup could be disk…

Insurance
IPF & Equipment:
- IS Equipment & Facilities: Loss of IPF & equipment due to damage
- Extra Expense: Extra cost of operation following IPF damage
- Business Interruption: Loss of profit due to IS interruption
Data & Media:
- Valuable Papers & Records: Covers cash value of lost/damaged paper & records
- Media Reconstruction: Cost of reproduction of media
- Media Transportation: Loss of data during transport
Employee Damage:
- Fidelity Coverage: Loss from dishonest employees
- Errors & Omissions: Liability for error resulting in loss to client
IPF = Information Processing Facility

Auditing the BCP includes:
- Is the BIA complete, with RPO/RTO defined for all services?
- Is the BCP in line with business goals, effective, and current?
- Is it clear who does what in the BCP and DRP?
- Is everyone trained, competent, and happy with their jobs?
- Is the DRP detailed, maintained, and tested?
- Are the BCP and DRP consistent in their recovery coverage?
- Are the people listed in the BCP/phone tree current, and do they have a copy of the BC manual?
- Are the backup/recovery procedures being followed?
- Does the hot site have correct copies of all software?
- Is the backup site maintained to expectations, and are the expectations effective?
- Was the DRP test documented well, and was the DRP updated?

Question
The amount of data transactions that are allowed to be lost following a computer failure (i.e., the duration of orphan data) is the:
1. Recovery Time Objective
2. Recovery Point Objective
3. Service Delivery Objective
4. Maximum Tolerable Outage

Question
The FIRST thing that should be done when you discover an intruder has hacked into your computer system is to:
1. Disconnect the computer facilities from the computer network to hopefully disconnect the attacker.
2. Power down the server to prevent further loss of confidentiality and data integrity.
3. Call the manager.
4. Follow the directions of the Incident Response Plan.

Question
When the RTO is large, this is associated with:
1. Critical applications
2. A speedy alternative recovery strategy
3. Sensitive or nonsensitive services
4. An extensive restoration plan

Question
During an audit of the business continuity plan, the finding of MOST concern is:
1. The phone tree has not been double-checked in 6 months
2. The Business Impact Analysis has not been updated this year
3. A test of the backup-recovery system is not performed regularly
4. The backup library site lacks a UPS

Question
When the RPO is very short, the best solution is:
1. Cold site
2. Data mirroring
3. A detailed and efficient Disaster Recovery Plan
4. An accurate Business Continuity Plan

Question
The first and most important BCP test is the:
1. Fully operational test
2. Preparedness test
3. Security test
4. Desk-based paper test

Question
When a disaster occurs, the highest priority is:
1. Ensuring everyone is safe
2. Minimizing data loss by saving important data
3. Recovery of backup tapes
4. Calling a manager

Question
A documented process in which one determines the most crucial IT operations from the business perspective is a:
1. Business Continuity Plan
2. Disaster Recovery Plan
3. Restoration Plan
4. Business Impact Analysis

Vocabulary
- Service delivery objective, alternate mode, interruption window, maximum tolerable outage, restoration plan
- Recovery point objective, recovery time objective, orphan data
- Hot site, warm site, cold site, reciprocal agreement
- Diverse routing, alternative routing, last-mile circuit protection, long-haul network diversity
- Desk-based/paper test, preparedness test, fully operational test
- Incremental vs. differential backup
- Events: negligible, minor, major, crisis
- Service classification: critical, vital, sensitive, nonsensitive
Questions to consider in book page 827: all.