Fair Availability Calculation
James Casey, Rajesh Kalmady
CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
With contributions from RSVSAM and GridView mailing list members
Current Issues
• Complaints about availability when a large share of the period is UNKNOWN
– e.g. February VO tests for T1
• Differences between raw numbers on the portal and report numbers
– 'Hack' for UNKNOWN < AVAIL only applied on reports
• Proposal
– Assume that the uptime during the "unknown" time has the same behavior as during the known time
– Apply the same logic in reports and portal
Availability
• Current Availability
– Avail = Uptime / (total time)
• Proposed Availability
– Avail = Uptime / (Uptime + Downtime + Scheduled_downtime)
… or (equivalently, with all quantities expressed as fractions of the total time) …
– Avail = Uptime / (1 - unknown)
• Similarly
– Down = Downtime / (1 - unknown)
– Scheduled_down = Scheduled_downtime / (1 - unknown)
"Availability, a.k.a. the site uptime over the whole month, assuming that the uptime during the measured period is representative" (Jeff)
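The equivalence of the two proposed forms can be spelled out in one step. The symbols below are introduced here only for illustration and do not appear on the slides: U, D, S and K are the fractions of the total period spent up, down, scheduled down and unknown.

```latex
% U, D, S, K: fractions of the total period (illustrative notation, not from the slides)
\begin{align}
  U + D + S + K &= 1
    && \text{(the four states cover the whole period)} \\
  U + D + S &= 1 - K
    && \text{(the known, i.e. measured, fraction)} \\
  \mathrm{Avail} = \frac{U}{U + D + S} &= \frac{U}{1 - K}
    && \text{(defined only when } K < 1\text{)}
\end{align}
```

The same rescaling by 1/(1 - K) applies to Down and Scheduled_down, so the three rescaled fractions again sum to 1.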
Example
• In a day, if the site is UP for 12 hours, down for 3 hours, under maintenance for 4 hours and unknown for 5 hours
– Known period = 24 - 5 = 19 hours = 0.79 of the day
– Availability is 63% over the measured period (19 hours)
– The measured period is 79% of the reporting period (24 hours)

                  OLD              NEW
Availability      12/24   0.500    12/19   0.632
Down               3/24   0.125     3/19   0.158
Scheduled Down     4/24   0.167     4/19   0.210
Unknown            5/24   0.208
Total                     1.000
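The arithmetic of this example can be reproduced with a minimal sketch. The function name `fair_fractions` and the dictionary keys are hypothetical, chosen here for illustration; the sketch assumes a non-zero known period and does not handle the case where everything is unknown.

```python
def fair_fractions(up_h, down_h, sched_h, unknown_h):
    """Return old and proposed fractions for one reporting period (inputs in hours).

    Hypothetical helper for illustration; assumes up_h + down_h + sched_h > 0.
    """
    known_h = up_h + down_h + sched_h      # measured period
    total_h = known_h + unknown_h          # reporting period
    return {
        "known": known_h / total_h,            # share of the period with results
        "old_availability": up_h / total_h,    # current definition
        "availability": up_h / known_h,        # proposed definition
        "down": down_h / known_h,
        "scheduled_down": sched_h / known_h,
    }


# The day from the example: 12 h up, 3 h down, 4 h maintenance, 5 h unknown.
result = fair_fractions(12, 3, 4, 5)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Dividing by the known hours is the same as dividing each fraction of the day by (1 - unknown), since 19/24 = 1 - 5/24.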
Notes - 1
• There is now a possibility of having an undefined availability: the denominator in the proposed availability equation could be zero
• Example 1:
– If the status is 'Unknown' for a whole hour, then 'unknown' will be 1
– So availability will be 0/0, which is undefined
– Earlier it used to be zero
• Example 2:
– If the site is scheduled down for the entire known period, the availability for that period is 0
– No change from the earlier algorithm
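One way a tool could represent the undefined case is sketched below. Returning `None` for an undefined availability is an assumption made for this sketch, not something the slides specify, and the function name `availability` is hypothetical.

```python
from typing import Optional


def availability(uptime: float, unknown: float) -> Optional[float]:
    """Proposed availability, with inputs given as fractions of the period.

    Returns None when nothing was measured (unknown == 1), where the
    formula would be 0/0. Using None here is an illustrative choice.
    """
    known = 1.0 - unknown
    if known <= 0.0:
        return None            # Example 1: whole period unknown, undefined
    return uptime / known      # Example 2: uptime 0 gives availability 0


print(availability(0.0, 1.0))    # whole hour unknown
print(availability(0.0, 0.25))   # scheduled down for the entire known period
```

Keeping "undefined" distinct from 0 matters here because the earlier algorithm reported 0 in the first case, which is exactly the behavior the proposal changes.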
Notes - 2
• If we have no results during a scheduled downtime, the period is marked as SCHEDULED_DOWN: we knew what it should be even if we got no metrics
• The Known Interval (Measured Period, the quantity 1 - UNKNOWN) should be quoted with the Availability and Reliability numbers to indicate how good the numbers are
– "99% available with results known for 80% of the period."
• Reliability numbers remain unchanged
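Quoting the known interval alongside the availability could look like the following sketch. The function name `describe` and the wording used for the undefined case are hypothetical; only the sentence pattern for the defined case comes from the slide.

```python
from typing import Optional


def describe(availability: Optional[float], known: float) -> str:
    """Format availability together with the known interval (both as fractions)."""
    if availability is None:
        # Wording for the undefined case is an assumption of this sketch.
        return "Availability undefined: no results known for the period."
    return (f"{availability:.0%} available with results known "
            f"for {known:.0%} of the period.")


print(describe(0.99, 0.80))
print(describe(12 / 19, 19 / 24))
```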
Conclusions
• Benefits:
– Simpler
– Fairer
  • your total availability relies on how you've done during measurable periods
– Consistent
• Drawbacks
– Different; all corner cases handled? …, ??
• If the general approach is OK, we'll:
– Produce a new draft of the availability calculation procedure for comments
– Modify tools, starting with the reporting tool