TRACKING OF FAULTS AND FOLLOWUP Accelerator Fault Tracking

TRACKING OF FAULTS AND FOLLOW-UP
Accelerator Fault Tracking project
Jakub Janczyk (TE-MPE-PE / BE-CO-DS)
with input from: Andrea Apollonio, Chris Roderick, Rudiger Schmidt, Benjamin Todd, Daniel Wollmann

10/14/2014 R2E/Availability Workshop

Agenda
• Purpose of fault tracking
• What has been done in the past
• Accelerator Fault Tracking project – plans & status
• Summary

Purpose of fault tracking
Complete and consistent tracking makes it possible to:
• Identify problems as early as possible to allow for timely mitigation
• Identify key issues which will limit performance of accelerators or equipment in the future (Run 2, Run 3, HL-LHC)
• Increase availability, in both the short and the long term, by dealing with issues ASAP
Track faults in two areas:
1. Faults directly affecting accelerator operation – identify root causes (e.g. R2E effects, glitches in the electrical network, etc.)
2. Equipment (electronic) faults, independently of their immediate impact on accelerator operation

What has been done in the past
• A lot of different tools for logging of faults, used by different teams:
  • eLogbook, Post-Mortem, RadWG page, tools in equipment groups (JIRA, Excel, OneNote, eLogbook)
• A lot of effort was required from individual teams/working groups to gather and exploit fault data
• Nevertheless, difficult to get a consistent picture

[Figure. Credit: M. Brugger]

Cardiogram - "life" of LHC from operational point of view
• Graphical analytic tool for combining data from different sources
• Initially created by members of Availability WG: B. Todd, L. Ponce, A. Apollonio
• Gathering and preparing all the necessary data is tedious work: several months for the 2010-2012 cardiogram

Cardiogram - example
[Figure. Labelled rows: Accelerator Mode (Proton Physics, Ion Physics, etc.); Access; Fill Number; Particle Momentum; Beam Intensities; Stable Beams; PM Beam Dump Classification; Fault Lines (Systems / Fault Classifications); Fault. Credit: AWG]

Cardiogram – data preparation
[Figure. Credit: Benjamin Todd]

Accelerator Fault Tracking project
Project launched February 2014 (BE/CO, BE/OP, TE/MPE collaboration)
Based on initial inputs from:
• Evian Workshops
• Availability Working Group
• Workshop on Machine Availability & Dependability for Post-LS1 LHC
• BE/OP
Goals:
• Capture consistent and complete fault data
• Facilitate fault tracking from the perspective of all interested parties (OP, equipment groups, working groups)
• Single source of data – easier to complete, clean and analyse
• Provide consistent / standardized statistics, analyses and reports for different users (8:30 meetings, weekly reports / summaries)
• Interactive overview of faults (cardiogram on demand)
• Proactively identify incomplete data
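The goal of proactively identifying incomplete data can be pictured with a small check over fault records. This is only a minimal sketch, not the AFT implementation: the record layout, the field names (`fault_id`, `system`, `start`, `end`, `root_cause`) and the sample values are assumptions made for illustration.

```python
# Hypothetical list of fields a fault record must carry to count as complete.
REQUIRED_FIELDS = ("system", "start", "end", "root_cause")


def missing_fields(record):
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]


def incomplete_records(records):
    """Map fault_id -> list of missing fields, for incomplete records only."""
    report = {}
    for record in records:
        missing = missing_fields(record)
        if missing:
            report[record["fault_id"]] = missing
    return report


# Illustrative records only; not real LHC fault data.
example_records = [
    {"fault_id": "F1", "system": "QPS", "start": "2012-05-01T10:00",
     "end": "2012-05-01T12:00", "root_cause": "R2E"},
    {"fault_id": "F2", "system": "Cryogenics", "start": "2012-05-02T08:00",
     "end": None, "root_cause": ""},
]

if __name__ == "__main__":
    print(incomplete_records(example_records))
```

A report of this kind could be sent back to whoever entered the fault, so that gaps are filled while the event is still fresh.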

Plans (as presented by Chris Roderick @ LMC 30-04-2014)
Provide infrastructure to consistently & coherently capture, persist and make available accelerator fault data for further analysis.
Foreseen project stages (in order of time):
1. Put in place a fault tracking infrastructure to capture LHC fault data from an operational perspective (we are here...)
  • Enable data exploitation by others (e.g. AWG and OP) to identify areas to improve accelerator availability for physics
  • Ready before LHC beam commissioning
  • Infrastructure should already support capture of equipment group fault data, but this is not the primary focus
2. Focus on equipment group fault data capture
3. Explore integration with other CERN data management systems (e.g. Infor EAM)
  • potential to perform deeper analyses of system and equipment availability
  • in turn - start predicting and improving dependability
To support data analysis, the AFT data extraction infrastructure should also provide data complementary to the actual fault data, such as accelerator operational modes and states.
Scope: Initial focus on LHC, but aim to provide a generic infrastructure capable of handling fault data of any CERN accelerator.

Status
• AFT is under development – Web application, available for different users, and integration with eLogbook for LHC operators
• Functionalities available from day 1 will be as planned for the first stage of the project
• AFT test version available
• We're open to start discussion with equipment groups: acc-fault-tracking-team@cern.ch

Turnaround Time
[Figure]

Summary
• Consistent and complete tracking of faults is the key to identifying and efficiently mitigating issues
• The AFT will ease the recording of faults and their root causes in a complete and consistent way
• Run 2 data will be essential to identify future performance/availability limitations towards HL-LHC
• Quality and completeness of the data require effort from all involved parties
• Open to discussing integration of equipment group data

Questions

Extra Slides

Roles and simplified workflow
[Figure]

[Figure. Panels: 2010, 2011, 2012]

Multiple failures
• It is easy to see if there are multiple failures at the same time, but it's not obvious if they are related.
• One of the goals of the AFT project is to capture data that will show the relations between faults.
[Figure labels: Water leak; Problems caused by water leak; Faults not related – QPS failed and the rest of them are accesses in shadow; Faults related]
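The difference between faults that merely coincide in time and faults that are actually related can be sketched as follows. This is an illustration under stated assumptions, not the AFT data model: the `Fault` class, its `parent_id` link and the sample faults are all hypothetical. Time overlap can be computed from the timestamps alone, whereas a relation has to be recorded explicitly.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Fault:
    fault_id: str
    system: str
    start: datetime
    end: datetime
    # Hypothetical explicit link to the fault that caused this one.
    parent_id: Optional[str] = None


def overlaps(a, b):
    """True if the two fault intervals share any time."""
    return a.start < b.end and b.start < a.end


def simultaneous_pairs(faults):
    """Pairs of faults that overlap in time (says nothing about causality)."""
    return [(a.fault_id, b.fault_id)
            for a, b in combinations(faults, 2) if overlaps(a, b)]


def related_pairs(faults):
    """(cause, consequence) pairs taken from the recorded parent links."""
    known = {f.fault_id for f in faults}
    return [(f.parent_id, f.fault_id) for f in faults if f.parent_id in known]


# Illustrative faults only; times and systems are made up.
day = lambda h, m=0: datetime(2012, 6, 1, h, m)
example_faults = [
    Fault("W", "Cooling", day(10), day(14)),                    # water leak
    Fault("P", "Power converter", day(10, 30), day(12), "W"),   # caused by W
    Fault("Q", "QPS", day(11), day(13)),                        # unrelated
]

if __name__ == "__main__":
    print(simultaneous_pairs(example_faults))
    print(related_pairs(example_faults))
```

In this sample all three faults overlap, yet only one pair is related, which is the information a purely time-based view cannot provide.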

Access without faults
• In 2012, there were around 40 accesses without any fault
• The reasons for these accesses are not classified, but often something is repaired
• Inconsistent data – the cardiogram makes this easy to spot
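A consistency check of this kind can also be automated once access and fault periods sit in one place. The sketch below is an assumption-laden illustration: the `Interval` type and the sample periods are hypothetical, and it simply lists accesses that do not overlap any fault.

```python
from datetime import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    label: str
    start: datetime
    end: datetime


def accesses_without_fault(accesses, faults):
    """Return accesses whose time span overlaps no fault interval."""
    return [
        access for access in accesses
        if not any(access.start < fault.end and fault.start < access.end
                   for fault in faults)
    ]


# Illustrative periods only; not real LHC data.
example_fault_periods = [
    Interval("QPS", datetime(2012, 7, 1, 9), datetime(2012, 7, 1, 12)),
]
example_accesses = [
    Interval("access in shadow of QPS fault",
             datetime(2012, 7, 1, 10), datetime(2012, 7, 1, 11)),
    Interval("access with no fault",
             datetime(2012, 7, 2, 14), datetime(2012, 7, 2, 15)),
]

if __name__ == "__main__":
    for access in accesses_without_fault(example_accesses, example_fault_periods):
        print(access.label)
```

Each access returned by such a check is a candidate for follow-up: either a fault was not recorded, or the reason for the access needs its own classification.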

Access without faults - examples
• Few accesses: ATLAS, change of PC, repair of QPS, intervention on the crates of the BPMD
• LHCb – fixing muon detectors
• ATLAS access
• Accesses in shadow of QPS fail: QPS – reset cards; ALICE and CMS; Cryogenics – valve regulation; RF – replacing broken attenuator