High Availability for Information Security: Managing the Seven R’s. Rich Schiesser, Sr. Technical Planner
The Seven R’s 1. Redundancy 2. Reputation 3. Reliability 4. Repairability 5. Recoverability 6. Responsiveness 7. Robustness
1. Redundancy – Eliminating Single Points of Failure – Components • power supplies • central processors • memory segments • disk storage – Servers • warm standby • hot standby – Networks • duplicate lines – Alternate Data Centers
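The effect of eliminating a single point of failure can be estimated with a small calculation. The sketch below is an illustration rather than part of the original slides: it assumes component failures are independent (often only approximately true in practice), and the function names `parallel_availability` and `series_availability` are hypothetical.

```python
from typing import Iterable


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a redundant group where any one working copy is enough.

    Assumes the copies fail independently of each other.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain in which every element must be working."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result
```

Under these assumptions, two copies of a component that is 99% available yield roughly 99.99% for the pair, while two such components that must both work yield only about 98%.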
Real Life Experience Duplicating Classified Components • Classified environment of N G defense contractor presented unique challenges. • Secured network links needed to be duplicated. • Encryption devices were required to be redundant. • Personnel with encryption keys had to be kept to a minimum.
2. Reputation • Credibility of Track Record of Key Suppliers of Data Center Hardware And Software • Methods to Verify Track Record - Market Share - Industry Analysts - Customer References
Real Life Experience The Good, the Bad, the Unbelievable • The Good – EMC’s disk array hardware • The Bad – EMC’s marketing tactics • The Unbelievable – ET Phone Home!
3. Reliability • Frequency of Outages • Common Measurement is the Mean Time Between Failures (MTBF) – acquired from manufacturers – verified with customers – compared to industry analysts’ reports – collected and analyzed empirically • Methods to Collect and Analyze Data – trouble calls from clients – problem tickets from suppliers – feedback from client support personnel – feedback from supplier repair personnel
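When MTBF is collected empirically from trouble calls or problem tickets, one common approach is to divide total uptime in an observation window by the number of failures. The sketch below assumes each outage is recorded as a (start, end) pair that falls inside the window and does not overlap another outage; the function name `mtbf` and the record layout are illustrative assumptions, not something the slides specify.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one outage


def mtbf(window_start: datetime, window_end: datetime,
         outages: Sequence[Outage]) -> timedelta:
    """Mean time between failures: uptime in the window divided by failure count."""
    if not outages:
        raise ValueError("MTBF is undefined without at least one failure")
    # Total time lost to outages during the observation window.
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)
```

For example, two 12-hour outages in a 30-day window leave 29 days of uptime, giving an MTBF of 14.5 days.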
Real Life Experience Enterprise Security and Reliability • 20th Century Fox Motion Pictures entered the lucrative home entertainment business in 1995. • IBM AS/400 computers provided security and high availability for the highly critical applications. • The only significant outage occurred when a power transformer exploded.
4. Repairability • Duration of Outages • Common Measurement is the Mean Time To Repair (MTTR) • Other Factors to Consider – root cause analysis – repeatability of causes – incorrect diagnosis – use of rolling averages – analysis of trends over time
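The MTTR measurement and the rolling averages mentioned above can be sketched as follows. This is a minimal illustration assuming repair durations are already available in hours, in chronological order; the names `mttr` and `rolling_mttr` and the window size are hypothetical choices.

```python
from collections import deque
from statistics import mean
from typing import List, Sequence


def mttr(repair_hours: Sequence[float]) -> float:
    """Mean time to repair over all recorded outages."""
    if not repair_hours:
        raise ValueError("MTTR is undefined without at least one repair")
    return mean(repair_hours)


def rolling_mttr(repair_hours: Sequence[float], window: int) -> List[float]:
    """Average of the most recent `window` repairs after each new repair.

    Early entries average over fewer repairs until the window fills.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # automatically drops the oldest repair
    averages = []
    for hours in repair_hours:
        recent.append(hours)
        averages.append(sum(recent) / len(recent))
    return averages
```

A rolling average that climbs over time, as in `rolling_mttr([2.0, 4.0, 6.0, 8.0], 2)`, points to a worsening trend that a single overall mean would hide.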
Real Life Experience Bugged by Telephone Companies • Critical network link between two key divisions of an aerospace company kept failing intermittently. • Problem was not solved until all seven hardware and software vendors were brought in together to brainstorm solutions. • Analysis of data that showed patterns and trends finally solved the problem.
5. Recoverability • Degree of Fault Tolerance • Functional Operations – single and double-bit memory errors – disk and tape read/write retries – network transmission retries • Hardware and Software Components – operating systems – servers, disk drives and tape drives – network lines and equipment • Data Center Facility – power systems – air conditioning systems – fire suppression – computer rooms
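The read/write and transmission retries listed above follow a general pattern: repeat a failed operation a bounded number of times before reporting the failure. The sketch below shows one way to express that pattern in software, assuming transient faults surface as `OSError` and that a doubling delay between attempts is acceptable; the helper name `with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], attempts: int = 3,
                 delay_seconds: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on OSError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # retries exhausted: surface the fault to the caller
            # Wait longer after each failure: delay, 2x delay, 4x delay, ...
            sleep(delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter lets the waiting behavior be replaced in tests, so the retry logic can be exercised without real delays.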
Real Life Experience Accidental Testing in Production • A marketing representative from a major server manufacturer got more than he bargained for while demonstrating his product’s failover capability. • Fortunately for him and his company, the product performed as advertised.
6. Responsiveness • Urgency of Support • Manual Response – help desk resolution – dispatching to client support groups – escalation to suppliers or specialists • Automated Response – self-detection and correction of errors – remote monitoring and circumvention of failing equipment – automated dispatching of service personnel
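The manual response path above (help desk, then client support groups, then suppliers or specialists) can be modeled as time-based escalation. The sketch below is purely illustrative: the 30-minute and 120-minute thresholds and the name `escalation_tier` are assumptions, not values from the slides.

```python
# (minutes a problem has been open, tier that should own it from then on)
# Thresholds are hypothetical and would be set by local service agreements.
ESCALATION_PATH = [
    (0, "help desk"),
    (30, "client support group"),
    (120, "supplier or specialist"),
]


def escalation_tier(minutes_open: float) -> str:
    """Return the support tier responsible for a problem open this long."""
    if minutes_open < 0:
        raise ValueError("minutes_open cannot be negative")
    tier = ESCALATION_PATH[0][1]
    for threshold, name in ESCALATION_PATH:
        # Keep the highest tier whose threshold has been reached.
        if minutes_open >= threshold:
            tier = name
    return tier
```

An automated dispatcher could call a function like this on each open ticket at regular intervals and notify the next tier when the result changes.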
Real Life Experience IBM Supplies Air Support • A major aerospace firm invested heavily in a critical IBM database system that began having software security problems. • The DBA and IBM managers escalated to the highest levels of their respective companies. • The vendor used a unique method to ensure its technical specialists arrived onsite on time.
7. Robustness • Overall Quality of the System • Able to Withstand a Variety of Disruptive Forces: – internal and external to the company – natural and man-made disasters • Places a High Premium on: – documentation – training – analysis – continuous improvement
Real Life Experience Politically Charged Security Decisions • California recently passed a law requiring, in some instances, disclosure of security breaches involving customer data to affected residents of the state. • A mortgage company recently encountered theft of some desktop computers one month prior to enactment of the law. • The company stepped up efforts to train employees on the impact of this new law, and methods to mitigate its effects.
Summary 1. Redundancy: Elimination of Single Points of Failure 2. Reputation: Credibility of Track Record 3. Reliability: Frequency of Outages 4. Repairability: Duration of Outages 5. Recoverability: Degree of Fault Tolerance 6. Responsiveness: Urgency of Support 7. Robustness: Overall Quality of the System
Thank You for Your Participation