Modern Distributed Systems Design Security and High Availability
1. Measuring Availability 2. Highly Available Data Management 3. Redundant System Design
Measuring Availability • How are resiliency and high availability interconnected? • Define downtime and what causes it. • How to measure availability?
Measuring Availability
Define Downtime • Downtime can be defined as follows: “If a user cannot get his job done on time, the system is down”
What causes downtime? • Planned: the easiest to reduce; includes scheduled system maintenance, hot-swappable hard drives, cluster upgrades and even failovers. Usually 30% of all downtime; • People, or the human factor: dumb mistakes, plus complex innovation in IT equipment, software and protocols that requires greater knowledge from engineers. Usually 15% of all downtime; • Software failures: due to software bugs and viruses (40% of all downtime).
How to measure availability?
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
where MTBF is the “mean time between failures” and MTTR is the “maximum time to repair”.
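As a worked illustration of the formula above, the following minimal Python sketch computes availability and the downtime per year it implies. The function names and the sample figures (an MTBF of 1,000 hours and an MTTR of 1 hour) are assumptions chosen for illustration, not values from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Return the hours of downtime per year implied by an availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: one failure per 1,000 hours, one hour to repair.
    a = availability(mtbf_hours=1000.0, mttr_hours=1.0)
    print(f"availability: {a:.4%}")
    print(f"downtime per year: {downtime_hours_per_year(a):.2f} hours")
```

Because MTTR sits only in the denominator, shortening repair time raises availability even when the failure rate (MTBF) stays the same.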
What can go wrong? • Hardware • Environmental and Physical Failures • Network Failures • Database System Failures • Web Server Failures • File and Print Server Failures
The Cost of Downtime.
Levels of Availability: 1. Regular Availability 2. Increased Availability 3. High Availability 4. Disaster recovery 5. Fault-Tolerant System
Highly Available Data Management • Data management is the most sensitive area of modern distributed systems. • Quick overview of existing data topologies
Redundant System Design • Redundant storage (RAID, Multi-hosting, Multi-Pathing, Disk Array, JBOD, etc.) • Failover Configurations and Management • Introduction to SAN and Fibre Channel protocol • Security aspects of data management in Storage Area Networks
Redundant storage
Redundant Storage (RAID 5)
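RAID 5 protects each stripe with a parity block computed as the bytewise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. The sketch below shows only that parity arithmetic. The helper names and the tiny three-block stripe are illustrative assumptions, and the rotation of parity across disks that real arrays perform is omitted.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Rebuild the single missing block from the surviving data and parity blocks."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # hypothetical data blocks on three disks
    parity = xor_blocks(data)           # parity block stored on a fourth disk
    # Simulate losing the disk that held data[1], then rebuild its block.
    recovered = rebuild_missing([data[0], data[2], parity])
    print(recovered == data[1])
```

The same XOR that produces the parity also performs the rebuild, which is why a RAID 5 set tolerates exactly one failed disk per stripe.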
Failover Configurations and Management Failover must meet the following requirements: • Transparent to the client; • Quick (no more than 5 min, ideally 0-2 min); • Minimal manual intervention, guaranteed data access.
Failover components: • Two servers, one primary and the other takeover; • Two network connections, a third is highly recommended; • All disks on a failover pair should have some sort of redundancy; • Application portability; • No single point of failure.
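To make the "quick" and "minimal manual intervention" requirements concrete, here is a minimal sketch of how the takeover server in a primary/takeover pair might decide to promote itself. It assumes the primary sends periodic heartbeats over one of the pair's network connections. The class name, the 10-second timeout and the use of plain timestamps instead of real network traffic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Takeover-side monitor for a primary/takeover server pair (hypothetical)."""

    heartbeat_timeout_s: float = 10.0  # assumed: longer silence means the primary failed
    last_heartbeat_s: float = 0.0
    active: str = "primary"

    def record_heartbeat(self, now_s: float) -> None:
        """Note that the primary was heard from at time now_s."""
        self.last_heartbeat_s = now_s

    def check(self, now_s: float) -> str:
        """Promote the takeover server if the primary has been silent too long."""
        silent_for = now_s - self.last_heartbeat_s
        if self.active == "primary" and silent_for > self.heartbeat_timeout_s:
            self.active = "takeover"  # automatic, so no manual intervention
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(heartbeat_timeout_s=10.0)
    monitor.record_heartbeat(now_s=100.0)
    print(monitor.check(now_s=105.0))  # primary is still within the timeout
    print(monitor.check(now_s=120.0))  # primary has been silent for 20 s
```

The timeout is the main design trade-off in a sketch like this: a short value keeps failover quick, while a value that is too short risks a takeover when the primary is merely slow.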
Symmetric Failover
Asymmetric Failover
Fibre Channel, SAN, IP Storage
Security in IP Storage Networks • Security in Fibre Channel SANs • Security Options for IP Storage Networks
Fibre Channel SAN Security • Port or hard zoning • WWN Zoning • LUN Masking
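WWN zoning and LUN masking can both be pictured as lookups keyed by a device's World Wide Name: zoning decides which devices may talk to each other at all, and masking decides which LUNs an initiator may see on a target. The sketch below models the two checks with in-memory tables. The zone names, WWNs and LUN numbers are hypothetical, and on a real SAN these rules are enforced by the switches and storage arrays rather than by application code.

```python
# Hypothetical zoning table: zone name -> WWNs that are members of the zone.
ZONES: dict[str, set[str]] = {
    "zone_db": {"10:00:00:00:c9:00:00:01", "50:06:01:60:00:00:00:aa"},
    "zone_web": {"10:00:00:00:c9:00:00:02", "50:06:01:60:00:00:00:aa"},
}

# Hypothetical LUN masks: initiator WWN -> LUNs that initiator may see.
LUN_MASKS: dict[str, set[int]] = {
    "10:00:00:00:c9:00:00:01": {0, 1},
    "10:00:00:00:c9:00:00:02": {2},
}


def same_zone(wwn_a: str, wwn_b: str) -> bool:
    """WWN zoning: two devices may communicate only if they share a zone."""
    return any(wwn_a in members and wwn_b in members for members in ZONES.values())


def can_access(initiator_wwn: str, target_wwn: str, lun: int) -> bool:
    """Allow access only if zoning permits the path and the LUN is unmasked."""
    if not same_zone(initiator_wwn, target_wwn):
        return False
    return lun in LUN_MASKS.get(initiator_wwn, set())


if __name__ == "__main__":
    array = "50:06:01:60:00:00:00:aa"
    print(can_access("10:00:00:00:c9:00:00:01", array, 0))  # zoned and unmasked
    print(can_access("10:00:00:00:c9:00:00:02", array, 0))  # zoned, but LUN 0 is masked
```

An initiator missing from the mask table sees no LUNs at all, so the sketch defaults to denying access.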
Security Options for IP Storage Networks • iSNS • LUN Masking as in Fibre Channel and VLAN tagging • IP Security or IPSec • ACL