Fault-Tolerant General Purpose Servers: Express 5800/ft Series Servers

Fault-Tolerant General Purpose Servers Express 5800/ft series servers Product Information

Express 5800/ft Series Servers High Availability Technologies

Approaches to Reliability and Availability

▐ Select and combine hardware and software technologies for availability

Single server (typical servers)
  • Partially redundant hardware (e.g. HDD, PSU)

Fault tolerant server (enhances fault tolerance of the hardware: higher availability of a single server)
  • Redundant hardware (dual modular architecture)
  • Continuous operation despite hardware failures
  • Simplified installation and operation

Cluster (enhances availability of the system: higher availability of the system)
  • Cluster software
  • Failover across multiple servers
  • Enhanced HW/SW failure resilience
  • For large-scale systems with scalable nodes etc.

FT server + cluster (FT server cluster)

Select the best availability solution according to system requirements.

Page 3 © NEC Corporation 2013

FT Server and Cluster Solution Comparison

Aim
  Fault tolerant server: High availability of a single server
  Cluster system: Achieve availability / scalability / load balancing

Technology
  Fault tolerant server: Lockstep (CPU & MEM) and failover (I/O), synchronized in normal conditions
  Cluster system: Failover, load balancing

Failover process
  Fault tolerant server: Isolate the faulty component (CPU, memory, HDD)
  Cluster system: Failover to other servers (EXPRESSCLUSTER)

Service during failure
  Fault tolerant server: Continuous operation (no interruption)
  Cluster system: Operation is interrupted for the failover process (several minutes to 10 minutes)

Resilience
  Fault tolerant server: Hardware failures
  Cluster system: Hardware/software failures

Performance enhancement
  Cluster system: Add CPU or node. Supports servers with 4 or more sockets

Supported apps
  Fault tolerant server: General applications; no modifications needed
  Cluster system: Failover settings are required for each app (creation of script batch files)

Fault tolerant server
  • System configuration requires no app modifications
  • Continuous operation without interruption
  • Ideal for 24-7 systems, email and Web servers

Cluster system
  • Features load balancing as well as availability
  • Software failure-resilient
  • Suitable for large-scale systems (scalable nodes)

ft servers provide hardware availability and can be installed quickly and easily.
The ft servers + EXPRESSCLUSTER solution takes advantage of both solutions.
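
The difference between lockstep with isolation and failover can be illustrated with a small software toy model. This is only a sketch: the real lockstep is implemented in the ft server hardware, and the class names, the `healthy` flag, and the doubling workload are all hypothetical stand-ins. It shows two modules processing the same request, with a faulty module isolated while the other keeps answering.

```python
class Module:
    """Hypothetical stand-in for one CPU/memory module of an ft server."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def process(self, x):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return x * 2  # placeholder workload, not a real server task


class LockstepPair:
    """Toy model: both modules process every request; a faulty one is isolated."""

    def __init__(self):
        self.active = [Module("module#0"), Module("module#1")]
        self.isolated = []

    def process(self, x):
        results = []
        for module in list(self.active):  # copy, since we may remove entries
            try:
                results.append(module.process(x))
            except RuntimeError:
                # Isolate the faulty module; the request is not retried elsewhere.
                self.active.remove(module)
                self.isolated.append(module)
        if not results:
            raise RuntimeError("both modules failed")
        return results[0]


if __name__ == "__main__":
    pair = LockstepPair()
    print(pair.process(3))             # both modules in service
    pair.active[1].healthy = False     # simulate a hardware fault
    print(pair.process(4))             # still answered by the healthy module
    print([m.name for m in pair.isolated])
```

In this model the caller never sees the fault, which mirrors the "continuous operation (no interruption)" entry for the fault tolerant server. A cluster would instead stop, pick another server, and restart the application there.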

Recovery Process from HW Failures

Express 5800/ft series server: continuous operation (non-stop service)
  Module #0 and Module #1 process in lockstep.
  In service → Failure → In service → Recovery complete
  1. Instantaneous isolation of the faulty module (processing continues on the other module)
  2. Resynchronization after replacement of the faulty module

Cluster system: service is interrupted (system down for a few mins to 10 mins)
  In service → Failure → Start failover process → Failover complete → Restart service
  1. Interruption (a few secs)
  2. Determine failover host (a few secs to 1-2 mins)
  3. Takeover of cluster resources (e.g. NW settings and disks) (a few secs to 1 min)
  4. Restart apps (a few secs to a few mins)
  Repair / replace
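
The total interruption of a cluster failover is the sum of its four steps. The sketch below adds up the ranges quoted on the slide. The slide gives only vague durations, so `FEW_SECS` and `FEW_MINS` are assumed stand-in values chosen for illustration, not NEC figures.

```python
# Assumed numeric stand-ins for the slide's vague durations (not NEC figures).
FEW_SECS = 5        # "a few secs"
FEW_MINS = 5 * 60   # "a few mins"

# (step, shortest, longest) in seconds, following the four failover steps.
FAILOVER_STAGES = [
    ("interruption", FEW_SECS, FEW_SECS),
    ("determine failover host", FEW_SECS, 2 * 60),
    ("takeover of cluster resources", FEW_SECS, 60),
    ("restart apps", FEW_SECS, FEW_MINS),
]


def downtime_range(stages):
    """Return (shortest, longest) total downtime in seconds."""
    shortest = sum(low for _, low, _ in stages)
    longest = sum(high for _, _, high in stages)
    return shortest, longest


if __name__ == "__main__":
    low, high = downtime_range(FAILOVER_STAGES)
    print(f"estimated failover downtime: {low} s to {high / 60:.1f} min")
```

With these assumed values the upper bound stays under the "10 mins" quoted for the cluster system, and the application restart step dominates it.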

Express 5800/ft Series Servers Optional Features to Increase Fault Tolerance

Express Report Service Support

• Isolate the failed components to continue operation.
• Monitor hardware status at the service center.
• Support the system proactively to ensure continuous availability.

[Diagram: Express Report Service flow, steps ① to ④]
  Client (CPU, Mem, HDD): Failure → Isolation → Continuous operation
  Express Report Service: hardware monitoring & detection; only the alert information is sent out, with dedicated software (secure environment)
  Alert notification: via the internet (mail server) or a public line (modem connection) to NEC (monitoring center)
  NEC Service Center: notification → Replace (CPU, Mem, HDD) → Recovery

Support for Redundant Peripheral Devices

▐ Selection of LTO or DAT and support for redundant backup*
◆ Double backup configuration is supported to provide for failures during backup
◆ LTO or DAT drives are offered for selection

[Diagram: ft series; Module #1 and Module #2, each with a SAS controller and a backup device]
  • Data is output from each module to achieve backup redundancy
  • Both backups are created almost simultaneously
* Configuration of standalone backup is also supported.

▐ A two-UPS configuration provides tolerance against UPS defects*

[Diagram: ft series; Module #1 and Module #2, each PSU connected to its own UPS (uninterruptible power supply)]
  • Connecting each UPS to a separate power source helps avoid being affected by failures of the power sources
* Single UPS configuration is also supported. The UPS is controlled through the network.

ft series + EXPRESSCLUSTER for Higher Availability (Enhancement SW)

▐ Clusters with ft servers enhance both HW and SW availability

[Diagram: ft server (primary) and ft server (secondary), each running Apps and OS on Module #0 and Module #1]
  • Software failure: EXPRESSCLUSTER monitors SW; failover to secondary server
  • Hardware failure: handled by the ft series server

Highest level of availability suitable for critical systems

Benefits of ft Series + EXPRESSCLUSTER (Enhancement SW)

▐ Clusters using ft servers deliver the benefits of both solutions

Configurations compared:
  (A) Express 5800/ft server: lockstep and failover (within a server)
  (B) Cluster system (configured by normal servers): failover (between multiple servers)
  (C) Cluster system (configured by ft servers)

HW failure tolerance
  (A) ★★★ Isolate faulty module (within the server)
  (B) ★★☆ Failover from the primary server to the secondary server
  (C) ★★★ Isolate faulty module within the primary server (no failover between nodes)
  Treatment time: (A) instantaneous; (B) a few minutes (depends on the time necessary to start up apps)

SW failure tolerance
  (A) - (app-level failures can be resolved by SingleServerSafe software)
  (B), (C) ★★☆ Failover from the primary server to the secondary server
  Treatment time: (A) -; (B), (C) several minutes (depends on the time necessary to start up apps)

Periodical maintenance (SW update)
  (A) ★★☆ Active Upgrade enables OS patches to be applied with only short interruption
  (B), (C) ★★★ Each node can be separated for upgrade

Performance enhancement
  (A) ★★☆ Add CPU
  (B) ★★★ Add CPU or nodes
  (C) ★★☆

Apps settings
  (A) ★★★ General apps can be used without special modifications
  (B), (C) ★☆☆ Takeover process is required for each app

Legend: ★★★: Excellent, ★★☆: Good, ★☆☆: Fair

ft server + Hyper-V + EXPRESSCLUSTER (Enhancement SW)

▐ Clusters configured on Hyper-V on an ft server

[Diagram: Apps and guest OSs on Hyper-V™ 2.0, running on an ft server with Module #0 and Module #1]
  • Software failure: EXPRESSCLUSTER monitors SW. In the event of a SW failure, the operation fails over to another guest OS.
  • Hardware failure: handled by the ft series server

High HW and SW availability for virtualized environments

EXPRESSCLUSTER X SingleServerSafe (Enhancement SW)

▐ SW is monitored on the ft server to automatically restart the SW in the event of a failure.
◆ SingleServerSafe (SSS) monitors the server and SW status at all times.
◆ In the event of a failure, SSS restarts the service, process, OS etc. to resume operation.
◆ The ft server and SSS in tandem can handle both HW and SW failures.

[Diagram: SingleServerSafe monitors Apps; recovery actions are service restart, process restart, and OS reboot]
  • By enabling failure detection and restart/reboot, SSS helps handle a wide range of failures with a single server.
  • By using the optional monitoring function of EXPRESSCLUSTER, SSS is capable of more detailed monitoring, including the detection of stalled databases.

SW availability can be improved even for a single ft server.
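
The restart behaviour described for SSS can be pictured as a list of recovery actions tried in turn. The sketch below is an illustration only: it does not use any EXPRESSCLUSTER or SingleServerSafe API, the `recover` function and the action callables are hypothetical, and the escalation order (service, then process, then OS) is an assumption drawn from the order in which the slide lists them.

```python
from typing import Callable, List, Tuple

# Hypothetical recovery action: a name plus a callable that returns True
# when the application is healthy again after the action has run.
Action = Tuple[str, Callable[[], bool]]


def recover(actions: List[Action]) -> List[str]:
    """Try actions in order and stop at the first one that restores health.

    Returns the names of the actions that were attempted.
    """
    attempted = []
    for name, run in actions:
        attempted.append(name)
        if run():
            break
    return attempted


if __name__ == "__main__":
    # Simulated outcome: the service restart does not help, the process restart does.
    actions = [
        ("restart service", lambda: False),
        ("restart process", lambda: True),
        ("reboot OS", lambda: True),
    ]
    print(recover(actions))
```

Ordering the actions from least to most disruptive keeps an OS reboot as the last resort, so a single ft server only pays the full reboot time when the lighter restarts have not restored the application.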
