Introduction to the new mainframe: Large-Scale Commercial Computing

  • Slides: 32

Introduction to the new mainframe: Large-Scale Commercial Computing
Chapter 5: Availability
© Copyright IBM Corp., 2006. All rights reserved.

Objectives
The ability to:
  • Understand what availability means to a commercial enterprise
  • Describe the inhibitors to availability
  • Describe operating system facilities that improve availability
  • Describe the major components of Parallel Sysplex

A real customer requirement: Royal Bank Boosts Availability - Online Banking
IBM System z Parallel Sysplex System
12 million customers, 2.5 million online, 60,000 employees
Challenge: Maximize Availability
Front End - Internet
  • WebSphere MQ for z/OS, V5.3
Back End - Data/Applications
  • DB2 Database
  • IMS Database
  • CICS Applications
Benefits
  • Reliable integration with internet
  • Supports ~40 web-based applications
  • Efficient use of Parallel Sysplex
  • Improved customer availability

Client Server Architecture
  • 2-tier and 3-tier architecture
  • Thin vs. thick client
  • Maintenance and change issues
  • Microsoft vs. IBM

What is availability?
Availability is the state of an application being accessible to the end user.
  • e.g. 13 years without a visible customer outage

Definitions:
  • High availability: The infrastructure (or applications) cannot undergo an unplanned outage for more than a few seconds or minutes without serious impact to the business. It is acceptable to bring down the application for a few hours for scheduled maintenance.
  • Continuous availability: The infrastructure and applications cannot be interrupted at all. There is no allowance for any outage, either unplanned or planned. 99.999% availability means just over 5 minutes per year of all outages in total.

Introduction to availability
  • High Availability: Fault-tolerant, failure-resistant infrastructure supporting continuous application processing
  • Continuous Operations: Non-disruptive backups and system maintenance coupled with continuous availability of applications
  • Disaster Recovery: Protection against unplanned outages such as disasters through reliable, predictable recovery
Resulting in:
  • Protection of critical business data
  • Recovery is predictable and reliable
  • Operations continue after a disaster
  • Costs are predictable and manageable

Outage Definition
An outage (unavailability) is the time a system is not available to an end user. Outages may be planned or unexpected (unplanned).
  • Planned outages include causes like database reorganisation, release changes, and network reconfiguration.
  • Unplanned outages are caused by a hardware, software, or data problem.
While planned outages can be scheduled, they are still disruptive. The modern trend is to try to avoid planned outages altogether. This requires extensive hardware and software facilities.

Cost of outages (1)
[Figure: Financial Impact of Downtime Per Hour (by various industries). Source: Contingency Planning Research & Strategic Research Corp.]

Cost of outages (2)

Types of Outages
[Figure: Common Causes for “Application Downtime”. Source: Standish Group Research]

Inhibitors to availability
Number of 9s, or the Myth of the nines
Class of 9s   Outage          Example
99.999%       5 min/year      Continuous Availability: z/OS Parallel Sysplex
99.99%        53 min/year     Fault Tolerant: S/390 Parallel Sysplex
99.9%         8.8 hrs/year    High Availability: single IBM System z CPC
99%           88 hrs/year     General Purpose: highly available UNIX cluster
90%           876 hrs/year    Campus LAN
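
The outage figures in the table of nines follow from simple arithmetic: the unavailable fraction of a year. The sketch below reproduces them, assuming a 365-day year (8760 hours); the function names are illustrative only and do not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly outage budget implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


def describe(availability_percent: float) -> str:
    """Format the outage budget in minutes (below one hour) or hours."""
    hours = downtime_hours_per_year(availability_percent)
    if hours < 1.0:
        return f"{hours * 60:.1f} min/year"
    return f"{hours:.1f} hrs/year"


if __name__ == "__main__":
    for pct in (99.999, 99.99, 99.9, 99.0, 90.0):
        print(f"{pct}% -> {describe(pct)}")
```

The slide rounds the results (for example, 52.6 minutes is shown as 53 min and 87.6 hours as 88 hrs).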

Redundancy
Hardware – IBM Mainframe
  • Power
      - 2 x power supply
      - 2 x power feed
  • Internal Battery Feature
      - Optional internal battery (in case of loss of external power)
  • Cooling
  • Dynamic oscillator switchover
  • Processors
      - Multiprocessors
      - Spare processing units
  • Memory
      - Chip sparing
      - Error correction and checking
      - Distance concept and codes

Concurrent Maintenance and Upgrades – fewer outages
  • Duplex units
      - Power supplies
  • Concurrent microcode (firmware) updates
  • Hot-pluggable I/O (e.g. Stratus Comp co.)
  • PU conversion
  • Permanent and temporary capacity upgrades
      - Capacity Upgrade on Demand (CUoD)
      - Customer Initiated Upgrade (CIU)
      - On/Off Capacity on Demand (On/Off CoD)
  • Capacity BackUp (CBU)

Capacity BackUp (CBU)
Who Needs It?
  • Any business with a requirement for increased availability or Disaster Recovery
What Is It?
  • The ability to nondisruptively increment capacity temporarily
  • Dual Microcode Loads
      - Provide two machine configurations in one box
  • Take advantage of "spare" PUs
  • Configure memory and channels to support production workload
  • Significant cost savings possible
      - Standby MIPS cost can be eliminated
      - IBM Software license charges on standby MIPS can be eliminated
How Can I Use It?
  • Adjacent machines in the same location
  • Multiple images in the same Parallel Sysplex® cluster
  • Backup/Recovery site
[Figure labels: CBU Server; Production Se]

EBR (E... Backup Restore) - Dynamic Memory Move
  • The Dynamic Memory Move operation concurrently changes the physical memory backing of an absolute storage increment
  • Performed transparently to the operating system
  • Utilizes the zSeries Copy/Reassign hardware
  • Used during EBA (Enhanced Book Availability) to:
      - Move physical memory usage from the targeted book to books that will remain in the system
      - Optimize memory allocation after EBA completion
Example: Absolute storage increment “123” is concurrently moved from physical memory increment 1 to physical memory increment 2.
[Figure: absolute storage space mapped onto physical memory increments 1 and 2]
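
The copy-then-reassign idea behind a Dynamic Memory Move can be illustrated with a toy model. This is only a sketch of the concept under simplifying assumptions (no concurrent writers, whole increments copied at once); the class and method names are hypothetical and say nothing about how the Copy/Reassign hardware is actually implemented.

```python
class MemoryModel:
    """Toy model: absolute storage increments backed by physical increments."""

    def __init__(self, physical_count: int):
        # Contents of each physical memory increment (None means unused).
        self.physical = {i: None for i in range(1, physical_count + 1)}
        # Absolute increment name -> physical increment number.
        self.backing = {}

    def store(self, absolute: str, physical: int, data: bytes) -> None:
        """Place data in a physical increment and map the absolute increment to it."""
        self.physical[physical] = data
        self.backing[absolute] = physical

    def read(self, absolute: str) -> bytes:
        """Read through the mapping, as the operating system would."""
        return self.physical[self.backing[absolute]]

    def move(self, absolute: str, target: int) -> None:
        """Change the physical backing of an absolute increment."""
        if self.physical[target] is not None:
            raise ValueError("target physical increment is in use")
        source = self.backing[absolute]
        self.physical[target] = self.physical[source]  # copy the contents
        self.backing[absolute] = target                # reassign the mapping
        self.physical[source] = None                   # free the source


if __name__ == "__main__":
    memory = MemoryModel(physical_count=2)
    memory.store("123", 1, b"payload")
    memory.move("123", 2)
    print(memory.read("123"), memory.backing)
```

Because readers always go through the mapping, the contents of increment "123" look the same before and after the move, which is the sense in which the operation is transparent.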

EBR - Redundant I/O Interconnect (RII)
STI Multipath Module (STI-MP)
  • A multiplexer that supports attachment to four I/O features in an I/O domain and has an alternate path to a second STI-MP for a redundant I/O infrastructure.
Key Usage
  • Memory upgrade
  • Dynamic MBA fanout error recovery
  • Reduction of UIRA outage
  • Book repair
  • STI cable repair
  • MBA fanout card repair
  • On book add, MBA fanouts used for I/O are concurrently rebalanced to the new book
[Figure: Processor Book 0 and Processor Book 1 (memory cards, L2 cache, PUs, 8 MBA fanouts, 16 STIs each) attached over STI (2.7 GB/sec) and ICB-4 (2 GB/sec) to an I/O cage with STI-MP and STI-A8 cards and FICON Express2 and OSA-Express2 I/O features]

EBR - Concurrent Physical Processor Reassignment
  • This operation is used for concurrently changing the physical backing of one or more logical processors
  • The state of the source operating physical processor is captured and transplanted into the target physical processor
  • Expected to be transparent to the operating system
  • Utilizes the PU sparing function
  • Used during EBA to:
      - Move processors from the targeted book to spare processors on a book remaining in the system
      - Rebalance processors after EBA completion
[Figure labels: Logical PU 6; Physical PUx, PUy]

Create a redundant I/O configuration
[Figure: LPARs (LPAR 1, LPAR 2, ... LPARn) connected through CSS / CHPID and a Director (Switch) to the DASD CU]
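
Why a redundant I/O configuration helps can be shown with the standard series/parallel availability formulas. The sketch below assumes independent failures, and the component availabilities are made-up numbers for illustration, not figures from the slides.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components must work: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one component must work: 1 minus the product of failure probabilities."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical component availabilities, for illustration only.
CHANNEL, SWITCH, CONTROL_UNIT_PORT = 0.999, 0.999, 0.999

# One path from an LPAR to the disk control unit crosses all three components.
single_path = series(CHANNEL, SWITCH, CONTROL_UNIT_PORT)
# Two fully separate paths: I/O succeeds if either path is up.
two_paths = parallel(single_path, single_path)

if __name__ == "__main__":
    print(f"single path: {single_path:.6f}")
    print(f"two paths:   {two_paths:.6f}")
```

The benefit only materializes if the paths share no component; a switch common to both paths would remain a single point of failure and would have to be modeled in series.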

RAS Features of a Storage Subsystem
  • Independent dual power feeds
  • N+1 power supply technology/hot-swappable power supplies, fans
  • N+1 cooling
  • Battery backup
  • Non-volatile subsystem cache, to protect writes that have not yet been hardened to DASD
  • Nondisruptive maintenance
  • Concurrent Licensed Internal Code (LIC) activation
  • Concurrent repair and replace actions
  • RAID architecture
  • Redundant microprocessors and data paths
  • Concurrent upgrade support (that is, the ability to add disks while the subsystem is online)
  • Redundant shared memory
  • Spare disk drives
  • Remote Copy to a second storage subsystem
      - Synchronous (Peer to Peer Remote Copy, PPRC)
      - Asynchronous (Extended Remote Copy, XRC)

Disk Mirroring using PPRC and XRC
Peer to Peer Remote Copy (PPRC) - Metro Mirror
  • Synchronous remote data mirroring
      - Application receives “I/O complete” when both primary and secondary disks are updated
  • Typically supports metropolitan distance
  • Performance impact must be considered
      - Latency of 10 km
Extended Remote Copy (XRC) - z/OS Global Mirror
  • Asynchronous remote data mirroring
      - Application receives “I/O complete” as soon as primary disk is updated
  • Unlimited distance support
  • Performance impact negligible
  • System Data Mover (SDM) provides
      - Data consistency of secondary data
      - Central point of control
[Figure: System z running z/OS with SDM; numbered I/O steps for PPRC and XRC]
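
The difference between the two mirroring styles is when the application sees “I/O complete”. The following toy model contrasts them; it is a sketch under strong simplifications (in-memory dictionaries stand in for disks, and a FIFO queue stands in for the data mover), and the class names are hypothetical rather than any PPRC or XRC interface.

```python
from collections import deque


class SyncMirror:
    """PPRC-style: the write completes only after both copies are updated."""

    def __init__(self):
        self.primary, self.secondary = {}, {}

    def write(self, track: int, data: str) -> str:
        self.primary[track] = data
        self.secondary[track] = data  # remote update happens before completion
        return "I/O complete"


class AsyncMirror:
    """XRC-style: the write completes after the primary update; a mover copies later."""

    def __init__(self):
        self.primary, self.secondary = {}, {}
        self.pending = deque()  # updates not yet applied remotely, in write order

    def write(self, track: int, data: str) -> str:
        self.primary[track] = data
        self.pending.append((track, data))
        return "I/O complete"

    def drain(self) -> int:
        """Apply queued updates to the secondary in original order; return the count."""
        applied = 0
        while self.pending:
            track, data = self.pending.popleft()
            self.secondary[track] = data
            applied += 1
        return applied


if __name__ == "__main__":
    mirror = AsyncMirror()
    mirror.write(1, "a")
    print("secondary before drain:", mirror.secondary)
    mirror.drain()
    print("secondary after drain: ", mirror.secondary)
```

In the asynchronous model the secondary lags the primary until the queue is drained, which is why a failure can lose the most recent updates, while the synchronous model pays for its zero lag with the remote round trip on every write.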

PPRC Failover / Failback (FO/FB)
  • The new primary volumes (at the remote site) record changes while in failover mode.
  • The original mode of the volumes at the local site is preserved as it was when the failover was initiated.
  • Only the changes made since the failover need to be resynchronized, not the entire data set.
[Figure: volumes A and B under Sync PPRC in the Normal, Failover, Failback Start, and Failback Finish states; the PPRC relationship is shown as full duplex or suspended]
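
The saving from failover/failback comes from recording which tracks changed while the remote volume was primary. A minimal sketch of that bookkeeping follows; the class, its fields, and the track granularity are assumptions made for illustration, not the actual PPRC implementation.

```python
class MirroredPair:
    """Toy model of volumes A (local) and B (remote) with change recording."""

    def __init__(self, tracks: int):
        self.a = [""] * tracks
        self.b = [""] * tracks
        self.failed_over = False
        self.changed = set()  # tracks written on B since the failover

    def write(self, track: int, data: str) -> None:
        if self.failed_over:
            self.b[track] = data
            self.changed.add(track)  # record the change for a later resync
        else:
            self.a[track] = data
            self.b[track] = data     # synchronous mirroring in normal mode

    def failover(self) -> None:
        """B becomes the primary; A is left exactly as it was."""
        self.failed_over = True
        self.changed.clear()

    def failback(self) -> int:
        """Copy only the changed tracks from B back to A; return how many were copied."""
        copied = len(self.changed)
        for track in self.changed:
            self.a[track] = self.b[track]
        self.changed.clear()
        self.failed_over = False
        return copied


if __name__ == "__main__":
    pair = MirroredPair(tracks=100)
    pair.write(0, "x")
    pair.failover()
    pair.write(5, "y")
    pair.write(7, "w")
    print("tracks resynchronized:", pair.failback(), "of", len(pair.a))
```

In this example only two of the hundred tracks are copied on failback, which mirrors the point that resynchronization starts from the time of failover.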

Parallel Sysplex
  • Removes single points of failure
      - Server
      - LPAR
      - Subsystems
  • Planned and unplanned outages
  • Single system image
  • Dynamic session balancing
  • Dynamic transaction routing
Highlights
  • Data sharing
  • Locking
  • Cross-system workload dispatching
  • Synchronization of time for logging, etc.
  • Coupling Facility
  • Sysplex Timer – TOD clock synchronization
  • Workload Manager in z/OS
  • Compatibility and exploitation in software subsystems, such as data sharing in database systems
  • Hardware/software combination on IBM System z

z/OS factors for availability
  • Workload balancing using Workload Manager (WLM)
  • Capability to restart applications using the Automatic Restart Manager (ARM) without operator intervention
  • Assists two-phase commits using Resource Recovery Services (RRS)
  • Make dynamic changes to your system configuration using the System Modification Program Extended (SMP/E)
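
The two-phase commit that RRS assists with is a general protocol, and its shape can be sketched independently of z/OS. The code below is a generic, in-memory illustration; it does not use or imitate the RRS programming interface, and the participant names are hypothetical.

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        """Phase 1: vote yes (and become prepared) or vote no."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit everywhere only if every participant votes yes; otherwise roll back."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: apply the same outcome to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision


if __name__ == "__main__":
    resources = [Participant("database"), Participant("message queue", can_commit=False)]
    print("committed:", two_phase_commit(resources))
    print([(p.name, p.state) for p in resources])
```

A production coordinator also has to log its decision and handle participants that fail between the two phases; that recovery logic is omitted here.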

Error recording and error recovery routines

z/OS Recovery features
  • Recovery Termination Manager (RTM)
  • Extended Specify Task Abnormal Exit (ESTAE)
  • Functional Recovery Routine (FRR)

The Human Factor
Automation: critical for successful rapid recovery and continuity
The more people involved, the higher the odds of human errors.
The benefits of automation:
  • Allows business processes to be built on a reliable, consistent recovery time
  • Recovery times can remain consistent as the system scales to provide a flexible solution designed to meet changing business needs
  • Reduces infrastructure management cost and staffing skills
  • Reduces or eliminates human error during the recovery process at time of disaster
  • Facilitates regular testing to help ensure repeatable, reliable, scalable business continuity
  • Helps maintain recovery readiness by managing and monitoring the server, data replication, workload, and the network, along with the notification of events that occur within the environment

Today’s Businesses Require Rapid Database Availability
Achieve Application and Database Restart
  • Consistent, repeatable, fast
  • Database Restart: To start a database application following an outage without having to restore the database
  • This is a process measured in minutes
Avoid Application and Database Recovery
  • Unpredictable recovery time, usually very long and very labor intensive
  • Database Recovery: Restore the last set of Image Copy tapes and apply log changes to bring the database up to the point of failure
  • This is a process measured in hours or even days
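
Database recovery as described here has two steps: restore the image copy, then apply the logged changes up to the point of failure. The sketch below shows that sequence on an in-memory key/value store; the record layout and function name are assumptions for illustration and do not reflect any particular database product.

```python
def recover(image_copy: dict, log: list, up_to: int) -> dict:
    """Rebuild a database from a backup plus logged changes.

    image_copy: key -> value snapshot taken at some earlier time.
    log: (sequence_number, key, value) records written after the snapshot;
         a value of None marks a delete.
    up_to: highest sequence number to apply (the point of failure).
    """
    database = dict(image_copy)  # step 1: restore the image copy
    # Step 2: apply log changes in sequence order, stopping at the failure point.
    for seq, key, value in sorted(log, key=lambda record: record[0]):
        if seq > up_to:
            break
        if value is None:
            database.pop(key, None)
        else:
            database[key] = value
    return database


if __name__ == "__main__":
    backup = {"acct1": 100}
    changes = [(1, "acct1", 80), (2, "acct2", 50), (3, "acct1", None)]
    print(recover(backup, changes, up_to=2))
```

The work grows with the size of the backup and the length of the log, which is one reason recovery takes so much longer than a restart that needs neither.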

What is GDPS/PPRC? (Metro Mirror)
[Figure: SITE 1 and SITE 2, each attached to a NETWORK; distance label: 100 km]
  • Multi-site base or Parallel Sysplex environment
  • Remote data mirroring using PPRC
  • Manages unplanned reconfigurations
      - z/OS, CF, disk, tape, site
      - Designed to maintain data consistency and integrity across all volumes
      - Supports fast, automated site failover
      - No or limited data loss (customer business policies)
  • Single point of control for
      - Standard actions: Stop, Remove, IPL system(s)
      - Parallel Sysplex configuration management
      - User-defined script (e.g. Planned Site Switch)
      - PPRC configuration management

Multiple Site Workload - Cross-site Sysplex Continuous Availability Configuration
[Figure: SITE 1 and SITE 2 with labels CF1, CF2, P1 to P4 (PROD), CBU, K1, K2, and disks marked P, S, and K/L]

Continuous Availability and Disaster Recovery at unlimited distance (GDPS/PPRC & GDPS/XRC)
IBM System z Solution
[Figure: Production Site 1 and Site 2 at metropolitan distance (Parallel Sysplex with CFs, FICON™ or ESCON, GDPS/PPRC, PPRC primary and PPRC secondary P'); Site 3 at unlimited distance (GDPS/XRC, XRC primary and XRC secondary X')]
Continuous Availability
  • GDPS/PPRC or GDPS/PPRC HM
  • Designed to provide continuous availability and no data loss between sites 1 and 2
  • Sites 1 and 2 can be in the same building or at campus distance to minimize performance impact
Disaster/Recovery
  • Production site 1 failure
      - Site 3 can recover with no data loss in most instances
  • Site 2 failure
      - Production can continue with site 1 data (P')
  • Site 1 and 2 failure
      - Site 3 can recover with minimal loss of data

SUMMARY