Disaster Recovery and Business Continuity Planning in a University Environment
Mardecia Bell, Ann Harris
The realization that a single data center serving both the central academic and administrative IT environments was a single point of failure prompted NC State University to implement a disaster recovery strategy for communications and for critical applications residing in the mainframe and open systems computing environments.
History/Timeline
• 1997
  – Initiated with the administrative environment
  – Mainframe environment recovery test
• 1999
  – Y2K - Business Continuity concept
  – Acquired central repository software (LDRPS)
• 2001
  – Scheduled annual Mainframe recovery test
  – Included communications & academic environment
• 2002
  – Expanded to include Enterprise Business Continuity/Disaster Recovery Planning
• 2004
  – Successful DR test of ERP systems
• 2005
  – Co-processing of production services began in Data Center II
Implementation Steps
• Gain Sponsorship
• Establish Steering Committees
• Develop University Policy/Regulation
• Create DR Structure/Establish Staffing
• Market Program
• Establish Central Repository
• Review & Test Plans Regularly
Gain Sponsorship
• Office of the President – University System
• Chancellor
• Executive Management
  – Present your Business Case
  – Identify the roles involved
  – Provide Executive Summary of BC/DR Program
  – Present Statement of Work and Project Plan
• Add responsibilities to staff work plans
Establish Steering Committees
• IT Steering Committee
• Business/Service Steering Committee
• Both committees are composed of
  – Vice Chancellor/Vice Provost Level
  – Representatives from Critical Areas of the Campus
  – Ex Officio members from IT areas
• Mission of IT Steering Committee
  – Provide guidance and oversight for the combined academic and administrative Disaster Recovery Plan.
Policy/Regulations/Rule • Develop a Policy or Regulation to affirm the mandate and promote cooperation
Divide Campus Into Groupings
• Space/Facilities
• Teaching and Academic Programs
• Academic IT
• Administrative IT
• Environmental Health and Public Safety
• Business Administration
• Research Programs
• Student Affairs
• Extension and Engagement
Resource Projections
• Hire Full-Time Business Continuity and Disaster Recovery Personnel
  – Director of Business Continuity (plus 1 Business Analyst)
  – Admin IT DR Coordinator (plus 1 Business Analyst)
  – Academic DR Coordinator (part-time)
• Add BC/DR responsibilities to work plan of existing staff
• Identify Coordinators for each business unit
Marketing
• Present at campus departmental meetings
• Create a Website
• Utilize listservs
• Campus Newspaper
• Network with peer institutions
• Remain abreast of industry standards
• Attend conferences, workshops and seminars
Establish Central Information Repository Continuous Implementation
Accomplishments
• Disaster Recovery and Business Continuity Plan
• Risk Assessments for Critical Business Units
• Successful Mainframe Recovery Tests
• Designed and implemented infrastructure for central computing environment (academic & administrative) in secondary data center
• Implementation of recovery strategies in secondary data center
• Creation of Administrative IT Disaster Recovery Unit
Illustration of Various DR Deployments
• Fault-tolerant cluster (file and print services)
• Co-processing and load-balancing (ERP)
• Distributed deployment (hosted systems)
• Data replication (mainframe)
[Diagram labels: A Production, B Configuration, A Configuration, B Production, A Development, Server, Data]
Enterprise Resource Planning (ERP) Deployment
• Financial System
• Human Resources (Version 8.8)
• Student Information System (under construction)
[Diagram labels: Campus Users, DC II, Batch Server, Data, Storage Area Network, Web Server, Application Server, DB Server, Batch Server]
Summary and Future Steps
[Diagram of DC I and DC II]
• Labels appearing once: Novell Directory Services / Novell Email/Calendar, Anti-SPAM, File/Print, User Home, Citrix
• Labels appearing twice: Backup/vaulting, Hosted systems, Data, Active Directory / Windows, Storage Area Network, Infrastructure, Database Server, Web Server, ERP Web, ERP Application, Development Server, ERP DB Server, Mainframe Server, ERP Batch
Administrative IT Disaster Recovery Unit Mission
• Ensure minimal risk of major disruptions to critical University systems and processes in the event that all or part of its computer operations are rendered inoperable.
• Ensure timely recovery of infrastructure and services in the event of a disruption.
• Ensure that business continuity plans are available and viable relative to its scenario.
Risk Management
• Identify
• Mitigate
• Process Mapping
Risk Management (NIST SP 800-30)
• Risk Assessment
  – System Characterization
  – Threat Identification
  – Vulnerability Identification
  – Control Analysis
  – Likelihood Determination
  – Impact Analysis
  – Risk Determination
  – Control Recommendations
  – Results Documentation
• Risk Mitigation
  – Prioritize Actions
  – Evaluate Recommended Control Options
  – Conduct Cost-Benefit Analysis
  – Select Controls
  – Assign Responsibility
  – Develop Safeguard Implementation Plan
  – Implement Selected Controls
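The Likelihood Determination, Impact Analysis and Risk Determination steps can be illustrated with a small sketch. It assumes the sample three-level likelihood weights, impact magnitudes and risk-scale thresholds given as an example in the 2002 edition of NIST SP 800-30; an institution may well define its own scales. The threat names are hypothetical and are not taken from the university's assessments.

```python
# Sketch of likelihood x impact risk determination.
# Weights and thresholds are ASSUMED from the sample risk-level matrix in
# NIST SP 800-30 (2002); they are not stated in the slides.
LIKELIHOOD_TENTHS = {"high": 10, "medium": 5, "low": 1}  # i.e. 1.0, 0.5, 0.1
IMPACT = {"high": 100, "medium": 50, "low": 10}


def risk_score(likelihood: str, impact: str) -> float:
    """Return the likelihood weight multiplied by the impact magnitude."""
    return LIKELIHOOD_TENTHS[likelihood] * IMPACT[impact] / 10


def risk_level(score: float) -> str:
    """Map a score onto the assumed scale: >50 high, >10 medium, else low."""
    if score > 50:
        return "high"
    if score > 10:
        return "medium"
    return "low"


# Hypothetical threats (name, likelihood, impact), for illustration only.
threats = [
    ("Data center power loss", "medium", "high"),
    ("Single disk failure", "high", "low"),
    ("Regional flood", "low", "high"),
]

# Rank threats from highest to lowest score to support "Prioritize Actions".
ranked = sorted(
    ((name, risk_score(lik, imp)) for name, lik, imp in threats),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: score={score:g}, level={risk_level(score)}")
```

The ranked list is only an input to the mitigation steps; cost-benefit analysis and control selection still require human judgment.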
Process Mapping
Infrastructure
• Total DR through distributed high availability
• Client Recovery Solutions
• Application Restoration
• Establish collaborative partnerships with other Universities
Client Recovery Solution(s)
Application Restoration
• Event
• Time
• Scope of Impact
  – Infrastructure
  – Software
  – Hardware
Collaborative Partnerships
Vaulting
• Readily accessible
• Secure
• Onsite
• Offsite
Critical Business Units
• Advancement Services
• All Campus Network
• Budget Office
• College of Agriculture and Life Sciences - Personnel Office
• Com. Tech - Data Networking
• Com. Tech - Telecommunications
• Contracts and Grants
• Controller's Office
• Enterprise Application and Database Services
• EH&S - Business Continuity
• EH&S - Campus Police
• EH&S - Emergency Response
• EH&S - Environmental Affairs
• EH&S - Health and Safety
• EH&S - Industrial Hygiene
• EH&S - Insurance and Risk Management
• EH&S - Radiation Safety
• EH&S - Transportation
• EH&S - Waste Management
• Enrollment Management - Admissions
• Enrollment Management - Office of Scholarships & Financial Aid
• Enrollment Management - Registration and Records
• Enterprise Technology Services and Support
• Facilities - Construction Management
• Facilities - Design and Construction Services
• Facilities - Operations
• Facilities - University Architect
• Fire Protection
• Foundations Accounting & Investments
• HR - Benefits
• HR - Employment & Compensation
• HR - Human Resource Information Management
• HR - Payroll
• ITD - Business Services
• ITD - Computer Operations
• ITD - Computer Services
• ITD - Systems
• Libraries - Administration
• Materials Management - Materials Support
• Materials Management - Purchasing
• Materials Management - University Graphics
• Real Estate
• Student Health Services
• University Cashier's Office
• University Dining
• University Housing
Business Continuity Planning
Communication
• Consistency in plan updating
• Training
• Partnering
• Emergency Communication standardization
  – Call Trees
  – Mobile Devices
  – Website
• Incident Command System
• Call Center
• Incident Report Plan
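A call tree, one of the emergency communication tools listed above, can be sketched as a simple data structure in which each person is responsible for phoning a short list of contacts. The example below is a minimal illustration, assuming a breadth-first notification order; the role names and the `call_order` helper are hypothetical and do not describe the university's actual call trees.

```python
from collections import deque

# Hypothetical call tree: each key phones the people in its list.
CALL_TREE = {
    "DR Director": ["Admin IT Coordinator", "Academic Coordinator"],
    "Admin IT Coordinator": ["Operations Lead", "Network Lead"],
    "Academic Coordinator": ["Help Desk Lead"],
}


def call_order(tree: dict, root: str) -> list:
    """Return (caller, callee) pairs in breadth-first order from root."""
    calls = []
    seen = {root}                      # avoid phoning anyone twice
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                calls.append((caller, callee))
                queue.append(callee)
    return calls


for caller, callee in call_order(CALL_TREE, "DR Director"):
    print(f"{caller} calls {callee}")
```

Breadth-first order reflects the usual intent of a call tree: each level notifies the next, so no single person has to reach everyone.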
IT Disaster Categorization
• Category 1: A single person or group in a Critical Business Unit (CBU) is unable to perform its critical functions
• Category 2: An entire CBU is unable to perform its critical functions
• Category 3: Multiple CBUs are unable to perform their critical functions
• Category 4: Non-CBUs are unable to perform their critical functions
• Category 5: A widespread event that impacts the entire University
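The five categories above can be expressed as a small classification routine. This is a sketch under stated assumptions: the slides define only the categories, so the `Impact` fields, the rule that the highest applicable category wins, and the use of 0 for "no disaster" are all illustrative choices rather than the university's procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Impact:
    """Observed impact of an incident (all field names are hypothetical)."""
    partially_affected_cbus: set = field(default_factory=set)
    fully_affected_cbus: set = field(default_factory=set)
    non_cbus_affected: bool = False
    university_wide: bool = False


def categorize(impact: Impact) -> int:
    """Return the disaster category 1-5, or 0 if nothing is affected.

    Assumption: when several categories apply, the highest number wins.
    """
    if impact.university_wide:
        return 5
    if impact.non_cbus_affected:
        return 4
    if len(impact.fully_affected_cbus) > 1:
        return 3
    if len(impact.fully_affected_cbus) == 1:
        return 2
    if impact.partially_affected_cbus:
        return 1
    return 0


# Example: two whole units are down, so this is treated as Category 3.
incident = Impact(fully_affected_cbus={"HR - Payroll", "Budget Office"})
print(categorize(incident))
```

How partial outages in several CBUs at once should be classified is not specified in the slides; the sketch treats them as Category 1.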
Goals
• Total DR through distributed high availability
• Standardized Emergency Communications
• Immediate Client Recovery Solutions
• Improved RTO
Ann Harris
Asst Dir, Administrative IT Disaster Recovery
919-515-9228
ann_harris@ncsu.edu
http://www.fis.ncsu.edu/dr