Progress towards a GGUS fail-safe system
O. Dulov / oleg.dulov@kit.edu
Steinbuch Centre for Computing (SCC)
KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
www.kit.edu
Agenda
• GGUS Service structure
• Service Availability and Service Value
• Availability and other ITIL processes
• Availability and Incident lifecycle
• Some High Availability (HA) topics
• Migration: GGUS to HA GGUS
  – Objective
  – Plan
  – Phase 0
  – Phase 1
  – Phase 2
• Conclusion
10/2/2020 O. Dulov - Progress towards a GGUS fail-safe system Steinbuch Centre for Computing
GGUS Service structure
• GGUS – Global Grid User Support
• Service based on the Remedy Action Request System (ARS)
Availability and Service Value
• Fit for purpose: Performance, Constraints
• Fit for use: Available, Capacity, Continuous, Secure
Availability is not only about IT technology, but also about the organization.
Availability and other ITIL processes
• Continual improvement
  – Service measurement & reporting
  – SLM
• Strategy
  – Service Portfolio
  – Demand
  – Financial
• Design
  – Supplier
  – Information security
  – Service Catalogue
  – Service level mgmt
  – IT service continuity
  – Availability
  – Capacity
• Transition
  – Knowledge
  – Service validation & testing
  – Release & deployment
  – Service asset & configuration
  – Change mgmt
• Operation
  – Event mgmt
  – Request fulfilment
  – Access mgmt
  – Problem mgmt
  – Incident mgmt
Availability & Incident lifecycle
Purpose: minimum Downtime (or maximum Uptime)
[Figure: incident lifecycle timeline. Labels: Incident N, Alarm Point, detection, diagnosis, repair, recovery, Restore Point, Downtime, Uptime, Incident N+1, Time Between System Incidents]
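The quantities on this timeline can be turned into numbers. The sketch below is illustrative only: the `Incident` type, the function name, and the sample figures are hypothetical and do not come from the GGUS service. It assumes the common ITIL-style definitions, where mean time between system incidents (MTBSI) is the sum of mean uptime (MTBF) and mean time to restore service (MTRS).

```python
from dataclasses import dataclass


@dataclass
class Incident:
    start_h: float     # hour at which the incident began (service went down)
    restored_h: float  # hour at which normal service was restored


def availability_metrics(incidents, period_h):
    """Derive availability figures for one reporting period (in hours)."""
    if not incidents:
        raise ValueError("at least one incident is needed for mean values")
    downtime = sum(i.restored_h - i.start_h for i in incidents)
    uptime = period_h - downtime
    n = len(incidents)
    return {
        "availability": uptime / period_h,  # fraction of the period in service
        "mtrs_h": downtime / n,             # mean time to restore service
        "mtbf_h": uptime / n,               # mean uptime between failures
        "mtbsi_h": period_h / n,            # mean time between system incidents
    }


# Hypothetical 30-day month (720 h) with three incidents.
incidents = [
    Incident(100.0, 102.0),
    Incident(400.0, 401.0),
    Incident(650.0, 653.0),
]
metrics = availability_metrics(incidents, period_h=720.0)
print(metrics)
```

Shortening any of the downtime phases (detection, diagnosis, repair, recovery) lowers MTRS and therefore raises availability, even if the number of incidents stays the same.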
Some High Availability (HA) topics
• Single Point of Failure & Redundancy
• Automatic & manual switching
• Cluster: "shared" vs "shared nothing"
• Dependability attributes
  – Availability: readiness for correct service
  – Reliability: continuity of correct service
  – Safety: absence of catastrophic consequences on the user(s) and the environment
  – Integrity: absence of improper system alteration
  – Maintainability: ability of a process to undergo modifications and repairs
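Why redundancy matters for a single point of failure can be shown with a short calculation. This is a minimal sketch under two assumptions that are not stated in the slides: components fail independently, and each has the hypothetical availability given below. Components that all have to work are combined in series; redundant copies of one component are combined in parallel.

```python
from math import prod


def series(avails):
    """Availability of a chain in which every component must be up."""
    return prod(avails)


def parallel(avail, n):
    """Availability of n redundant copies where one working copy is enough."""
    return 1.0 - (1.0 - avail) ** n


# Hypothetical per-component availabilities, not measured GGUS values.
stack = {"web_frontend": 0.99, "remedy_ars": 0.99, "database": 0.99}

single = series(stack.values())                             # no redundancy
redundant = series(parallel(a, 2) for a in stack.values())  # two copies each

print(f"single instances: {single:.6f}")
print(f"duplicated:       {redundant:.6f}")
```

In a chain, every additional component lowers the overall figure, which is why each layer of a multi-tier service needs its own redundancy. Real clusters rarely fail independently (shared storage, network, or power), so the parallel figure is an upper bound.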
Migration: GGUS to HA GGUS
• Software space
  – Httpd, Tomcat
  – Remedy ARS
  – MySQL DBMS
  – connectors
  – Oracle DBMS
• Hardware space
  – Machine
  – Cluster
• Purpose
  – Web Frontend HA Cluster
  – Remedy ARS HA Cluster
  – MySQL HA Cluster
  – Oracle HA Cluster
  – HA Platform
Migration: GGUS to HA GGUS
• Phase 0 (Q1-Q2 2011)
  – Purpose: renew the GGUS Development Environment
  – Tasks
    – Move to the VMware platform with HA support
    – Improve the server install/configuration procedure
    – Improve release & deployment mgmt
    – Separate Web frontend and ARS logically/physically
• Phase 1 (Q2 2011)
  – Purpose: renew the GGUS Production Environment
  – Tasks
    – Move to the VMware platform with HA support
    – Separate Web frontend and ARS logically/physically
    – Improve & configure networking redundancy
    – Integrate into the On Call Service
Migration: GGUS to HA GGUS, Phase 0/1 (cont.)
[Figure: architecture diagram. Recoverable label: Real Application Cluster]
Migration: GGUS to HA GGUS
• Phase 2 (Q3 2011 – Q2 2012)
  – Purpose
    – Implement auto-switching
    – Focus on Service Continuity, Disaster Recovery
  – Tasks
    – Decide (what-if analysis)
      – DBMS: Oracle as VM?
      – Platform: KVM vs VMware?
      – Disaster Recovery Plan
      – Manage VMs?
      – …
    – Choose and test the technology, and design the system
      – Open source vs commercial
      – Adapt to the GGUS structure
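The idea behind auto-switching can be sketched in a few lines. This is not the mechanism chosen for GGUS (the slides leave the technology open); it is a minimal illustration in which the class name, the node names `ggus-a` and `ggus-b`, and the `max_missed` threshold are all hypothetical. The active node is replaced only after it misses several consecutive health checks, so a single lost heartbeat does not trigger a switch.

```python
class FailoverMonitor:
    """Tracks which node is active and switches after repeated failed checks."""

    def __init__(self, nodes, max_missed=3):
        self.nodes = list(nodes)      # ordered by preference
        self.active = self.nodes[0]
        self.max_missed = max_missed  # consecutive failures before switching
        self.missed = 0

    def observe(self, healthy):
        """Process one round of health checks and return the active node.

        `healthy` maps node name -> bool (result of that node's check).
        """
        if healthy.get(self.active, False):
            self.missed = 0
            return self.active
        self.missed += 1
        if self.missed >= self.max_missed:
            # Switch to the first other node that currently passes its check.
            for node in self.nodes:
                if node != self.active and healthy.get(node, False):
                    self.active = node
                    self.missed = 0
                    break
        return self.active


# Hypothetical two-node setup; each dict is one round of health checks.
monitor = FailoverMonitor(["ggus-a", "ggus-b"], max_missed=2)
rounds = [
    {"ggus-a": True, "ggus-b": True},
    {"ggus-a": False, "ggus-b": True},
    {"ggus-a": False, "ggus-b": True},
    {"ggus-a": True, "ggus-b": True},
]
history = [monitor.observe(r) for r in rounds]
print(history)
```

The sketch deliberately does not fail back when the first node recovers. Whether to fail back automatically or manually is one of the decisions a what-if analysis would have to settle.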
Migration: GGUS to HA GGUS, Phase 2 (cont.)
[Figure: architecture diagram. Recoverable labels: Real Application Cluster, HA Platform]
Conclusion
• Availability is not only a technology or an activity within one ITIL process, but also a matter of organizational service improvement
• The GGUS software stack is relatively complex and has some dependencies (mostly on commercial components)
• There is a set of alternatives
  – Different software packages for the switching mechanism & clustering
  – With/without SAN connection for data
  – With/without VM
• GGUS improvement path: decompose the Service into a set of Subservices and put them into an HA Environment with automatic switching
Thank you for your attention!