Availability Studies and Operating Cycle Odei Rey Orozco
Goal of the study
Demonstrate that the CLIC availability requirements can be reached:
• Identify the key factors that influence failure effects
• Analyse possible operational scenarios and machine designs
• Luminosity production as a function of availability
Methodology
• Bottom-up approach: availability models
• Top-down approach: availability allocation
06/11/2020
CLIC Availability models
Concept of the common input format: generates properly structured input files for the target simulation software.
Simulation software: Isograph Availability Workbench®, AvailSim. AvailSim 3.0 is being developed at CERN in collaboration with ESS.
+ Accelerator-operation-driven functionalities
+ Modelling and running simulations in various software packages at the same time
+ Model defined once, which avoids repetition
+ Model and results validation
+ Easy versioning of models
CLIC Availability models
Output -> Availability estimations
• How often do we expect CLIC to fail?
• How much time would we need to clear the system faults?
• Which systems contribute most to the failures? And to the fault time?
• Do we need to implement more redundancies?
• …
Ongoing studies
• Main LINAC and Drive Beam LINAC RF powering systems
• RTML and transfer lines
• Technical infrastructure, cooling and ventilation
CLIC Availability models
Main Linac: Klystron based RF Powering System
RF POWERING SYSTEM: Restart time = 8 h, Access time = 8 h
RF MODULE: Restart time = 8 h, Access time = 8 h
Total of 1500 units (per linac), 150 hot spares
• Each element can fail with a given MTTF and can then be fixed with a given MTTR.
• The consequence of each failure can be defined.
• Fix offline: failed units need to be exchanged or repaired in the next shutdown.
CLIC Availability models
Main Linac: Klystron based RF Powering System
Assumptions
• Simulation period: 1 year (operation 24/7)
• Component failure behaviour follows an exponential distribution
• 150 hot standby spares available every time operation (re)starts
• Maintenance/repairs:
  o Repairs are only carried out when the system is down due to component failures *
  o Repairs can be done simultaneously
  o All repairs must be finished before restarting the system
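The assumptions above can be turned into a very small Monte Carlo sketch. This is not the Availability Workbench or AvailSim model used in the study. It assumes that every failed unit is immediately covered by a hot spare (so the pooled failure rate stays at n_units / MTTF), that the system stops at failure number n_spares + 1, and that one repair duration covers all simultaneous repairs. The function name, the per-unit MTTF of 1000 h and the 12 h repair time are hypothetical values chosen only for illustration; the 8 h access and restart times and the 1500 units with 150 spares are taken from the slides.

```python
import random

def simulate_year(n_units, n_spares, mttf_h, repair_h, access_h, restart_h,
                  horizon_h=8760.0, seed=0):
    """Return (availability, times_down) for one simulated operating period."""
    rng = random.Random(seed)
    # Exponential lifetimes are memoryless, so failures of the whole
    # population form a Poisson process with this pooled rate (assumption:
    # hot spares keep n_units in operation until they run out).
    pooled_rate = n_units / mttf_h
    t = 0.0
    uptime = 0.0
    times_down = 0
    while t < horizon_h:
        # Running time until the spares are exhausted: failure n_spares + 1.
        run = sum(rng.expovariate(pooled_rate) for _ in range(n_spares + 1))
        remaining = horizon_h - t
        if run >= remaining:
            uptime += remaining
            break
        uptime += run
        times_down += 1
        # Repairs are simultaneous and must all finish before the restart.
        t += run + access_h + repair_h + restart_h
    return uptime / horizon_h, times_down

# Hypothetical inputs: mttf_h and repair_h are illustrative, not study data.
availability, stops = simulate_year(n_units=1500, n_spares=150, mttf_h=1000.0,
                                    repair_h=12.0, access_h=8.0, restart_h=8.0)
```

A real model would give each component type its own MTTF and MTTR and would average over many seeds; the sketch only shows how the listed assumptions combine into an operating cycle of running, stopping, repairing and restarting.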
CLIC Availability models
Main Linac: Klystron based RF Powering System
Results
  Availability:        72.5%
  Times down:          87.2
  Uptime (days):       263.7
  Downtime (days):     101.3
  Standard deviation:  0.002
  MTTR (h):            27.87
  MTBF (h):            100.5
Downtime contribution
  RF Powering System:  56%
  LINAC Module:        44%
CLIC Availability models
Main Linac: Drive Beam based RF Powering System
RF POWERING SYSTEM: Restart time = 1 h, Access time = 8 h
DB LINAC MODULE (does not include the Drive Beam accelerator, TBD): Restart time = 1 h, Access time = 8 h
Total of 500 units (per linac), 50 hot spares
• Each element can fail with a given MTTF and can then be fixed with a given MTTR.
• The consequence of each failure can be defined.
• Fix offline: failed units need to be exchanged or repaired in the next shutdown.
CLIC Availability models
Main Linac: Drive Beam based RF Powering System
Assumptions
• Simulation period: 1 year (operation 24/7)
• Component failure behaviour follows an exponential distribution
• 50 hot standby spares available every time operation (re)starts
• Maintenance/repairs:
  o Repairs are only carried out when the system is down due to component failures *
  o Repairs can be done simultaneously
  o All repairs must be finished before restarting the system
CLIC Availability models
Main Linac: Drive Beam based RF Powering System
Results
  Availability:        97.6%
  Times down:          14.5
  Uptime (days):       355.9
  Downtime (days):     9.1
  Standard deviation:  0.001
  MTTR (h):            15
  MTBF (h):            604.1
Downtime contribution
  RF Powering System:  39%
  DB LINAC module:     61%
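The quoted MTTR and MTBF values can be cross-checked from the other entries of the two result tables. The sketch below assumes that MTTR is the total downtime divided by the number of stops and that MTBF is the full one-year period (8760 h) divided by the number of stops; these definitions are an inference that reproduces the quoted numbers, not something stated on the slides. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # simulation period: one year of 24/7 operation

def derived_metrics(times_down, uptime_days, downtime_days):
    """Return (availability, MTTR in h, MTBF in h) from the table entries."""
    availability = uptime_days / (uptime_days + downtime_days)
    mttr_h = downtime_days * 24 / times_down
    # Assumed definition: calendar time between successive stops.
    mtbf_h = HOURS_PER_YEAR / times_down
    return availability, mttr_h, mtbf_h

# Inputs copied from the klystron based and Drive Beam based result tables.
klystron = derived_metrics(times_down=87.2, uptime_days=263.7, downtime_days=101.3)
drive_beam = derived_metrics(times_down=14.5, uptime_days=355.9, downtime_days=9.1)
```

With these definitions the klystron case gives roughly 27.9 h and 100.5 h, and the Drive Beam case roughly 15.1 h and 604.1 h, in line with the tables. The availability recomputed from the day counts comes out a few tenths of a percentage point below the quoted 72.5% and 97.6%, which may simply reflect rounding or averaging over simulation runs.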
CLIC Availability models
Main Linac RF Powering Systems: Conclusions
• The accuracy of the output results strongly depends on the quality of the input.
• Only the RF powering schemes are compared; other components are not included.
Klystron based powering
• The components present in the greatest numbers (klystrons, loads) govern the system availability.
• The powering system could operate for around 100 hours before running out of spares.
Drive Beam based powering
• Components with a high failure frequency and long repair time (cooling system, loads) govern the system availability.
• The powering system could operate for around 600 hours before running out of spares.
CLIC Availability models
Infrastructure and technical services
Status
• Hardware description completed
• Failure modes analysis ongoing
Next steps…
• Organize meetings with system experts to gather data on the failure modes
• Run first simulations
• Gradually increase the complexity of the model by adding access times by location, dependencies among systems, etc.
CLIC Availability models
RTML and Damping Rings
Hardware description exported from the RTML and Damping Rings optics files. RTML: more than 4000 lines.
[Table excerpt: the first 28 elements of the exported list. Columns: NAME, sub region (guessed), KEYWORD, S [m], L [m], E0 [GeV], K1L, K1SL, K2L, K2SL, KS, ANGLE [rad], E1, E2, proto strength parameter. Rows 1 to 15 belong to sub region SRmatch.IN and rows 16 to 28 to SRdiag, in a repeating BPM, QUADRUPOLE, KICKER sequence from S = 0.0 m to S = 29.6 m at E0 = 2.86 GeV. Quadrupole K1L values are 0.101521, -0.247039 and 0.203042, with proto strength labels such as Q_0.101521; kickers are named DIQ-SRmatch.IN-1, DIQ-SRmatch.IN-2, and so on.]
CLIC Availability models
RTML and Damping Rings
Post processing: count by region and component type. Magnets of the same strength powered together?
[Table: element counts per RTML sub region. Columns: sub region (guessed), sub region (guessed and completed), S-start, number of SBEND, QUADRUPOLE, SEXTUPOLE, KICKER, BPM, CAVITY and SOLENOID elements, total, comments, and the SBEND, QUADRUPOLE, SEXTUPOLE and SOL magnet families. Regions listed: 0010_match_dr_to_rtml, 0030_dump_and_match_diag_to_sr, 0040_spin_rotator, 0050_match_sr_to_bc1_rf, 0060_bc1_rf, 0070_match_bc1_rf_to_chicane, 0080_bc1_chicane, 0090_match_bc1_to_diag, 0100_diagnostics_2, 0110_dump_and_match_diag_to_booster, 0120_booster_linac, 0130_dump_and_match_booster_to_ca, 0140_central_arc, 0150_vertical_transfer, 0160_match_vt_to_ltl, 0170_long_transfer_line, 0180_dump_and_match_ltl_to_tal, 0190_turn_around_loop (left, middle, right), 0200_match_tal_to_bc2_rf, 0210_bc2_rf, 0220_match_bc2_rf_to_chicane_1, 0230_bc2_chicane_1, 0240_match_bc2_chicanes, 0250_bc2_chicane_2, 0260_match_bc2_to_diag, 0270_diagnostics_3, 0280_dump_and_match_rtml_to_main_linac, main linac. Comments in the table note families that share a strength with another region, families that all need trimmers, possible trimmers for energy loss, and a region mixing half and whole quadrupoles that needs to be split.]
Next steps…
• Organize meetings with magnet experts to see which type of magnets should be used (based on magnet strength), and then with power-converter experts for the powering schemes.
• Define how we can determine redundancies in BPMs and correctors.
• Failure modes analysis
Availability allocation by complexity criteria
Output -> Availability requirements
How to measure complexity?
Inputs to the availability allocation methods:
• Availability target
• Experts' evaluation: determine the scales of the factors and the effects between systems
• Factors: number of components, repair time, criticality, state of the art, performance time, environment
Output: availability requirements. Are they in line with the estimations from the availability models?
Availability allocation by complexity criteria
Unavailability requirements per subsystem (target unavailability = 20%*)
[Bar chart: unavailability [%] (axis from 0.00% to 8.00%) per subsystem for the allocation methods FOO Technique + DEMATEL, Average-Weight-Geometric and Average Weight Geometric-DEMATEL, together with their AVERAGE. Subsystems on the horizontal axis: Two Beam Modules, Main Beam Injectors, Damping Ring Complex, Recombination Complex, Beam delivery system, Drive Beam Injectors, Beam Transport - RTML, Long Transfer Lines and TA…, Post-collision line, Technical Network, Machine Interlocks, Post decelerators, Technical Alarm system, Access Safety and Controls system. Data labels visible on the chart, in descending order: 5.66%, 3.22%, 2.96%, 2.60%, 2.28%, 1.75%, 1.11%, 0.95%, 0.57%, 0.37%, 0.18%, 0.12%, 0.10%, 0.09%.]
Availability allocation by complexity criteria
Maximum allowed downtime per system in 221 days of operation to reach 80% total availability
• Production = 221 days (*data from CLIC CDR vol. 3)
• Fault-induced downtime = 44 days
• Total days in production = 177
[Horizontal bar chart: downtime budget [h] (axis from 0 to 14) for, from top to bottom: Access Safety and Controls…, Technical Alarm system, Post decelerators, Machine Interlocks, Technical Network, Post-collision line, Long Transfer Lines and TA…, Beam Transport - RTML, Drive Beam Injectors, Beam delivery system, Recombination Complex, Damping Ring Complex, Main Beam Injectors, Two Beam Modules.]
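The arithmetic behind the budget is 221 days x (1 - 0.80) = 44.2 days of fault-induced downtime, leaving about 177 days of production. As a rough sketch of how such a budget could be split by complexity, the code below weights each system by the geometric mean of its factor scores and allocates downtime in proportion. This is a simplified stand-in, not the FOO or DEMATEL procedures used in the study, and the function name, the 1 to 10 scale and all scores are hypothetical.

```python
from math import prod

def allocate_downtime(scores, production_days=221.0, target_availability=0.80):
    """Split the downtime budget in proportion to complexity weights."""
    budget_days = production_days * (1.0 - target_availability)
    # Weight of a system = geometric mean of its factor scores (assumption).
    weights = {name: prod(vals) ** (1.0 / len(vals))
               for name, vals in scores.items()}
    total = sum(weights.values())
    return {name: budget_days * w / total for name, w in weights.items()}

# Hypothetical expert scores (1 = low, 10 = high) for three of the factors
# named in the talk: number of components, repair time, criticality.
example_scores = {
    "Two Beam Modules": [9, 7, 8],
    "Damping Ring Complex": [6, 6, 5],
    "Machine Interlocks": [2, 2, 3],
}
budgets = allocate_downtime(example_scores)
```

By construction the individual budgets add up to the total of 44.2 days, and systems scored as more complex receive a larger share of the allowed downtime.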
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phase NOMINAL OP. Successive subsystems in the CLIC complex (A, B, C, D, E) with the beam passing through them.]
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phase NOMINAL OP. Subsystems A, B, C, D, E. Legend: equipment in fault / switched off.]
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phase NOMINAL OP. Subsystems A, B, C, D, E. Legend: equipment in fault / switched off; running equipment with (partial) beam; running equipment without beam.]
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phases NOMINAL OP and FAULT TIME. Subsystems A, B, C, D, E. Legend: equipment repaired; running equipment with (partial) beam; running equipment without beam.]
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phases NOMINAL OP, FAULT TIME and RECOVERY TIME. Subsystems A, B, C, D, E.]
Contributions to the recovery time:
• Time needed to deliver beam to the following system
• Time needed to run/switch on before the beam arrives
• Time from degradation until the equipment can take beam (depends on the fault time)
Understanding CLIC operating cycle
Integrated luminosity as a function of availability
[Figure: Luminosity [t] versus Time [t], phases NOMINAL OP, FAULT TIME, RECOVERY TIME, LUMI OPTIMIZATION, NOMINAL OP. Subsystems A, B, C, D, E.]
• Luminosity optimization: time to reach nominal operating conditions/luminosity after the first collisions.
• Nominal operation: luminosity production.
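The phase sequence above suggests a simple way to estimate integrated luminosity over one cycle. The sketch below assumes that no luminosity is produced during fault and recovery time and that luminosity ramps linearly from zero to nominal during the optimization phase; both are simplifying assumptions, and the phase durations are hypothetical placeholders rather than CLIC estimates.

```python
def integrated_luminosity(phases, nominal_lumi=1.0):
    """Integrate luminosity over a list of (phase_name, duration_h) pairs."""
    total = 0.0
    for name, duration_h in phases:
        if name == "nominal":
            total += nominal_lumi * duration_h
        elif name == "lumi_optimization":
            # Assumed linear ramp from zero to nominal: average is one half.
            total += 0.5 * nominal_lumi * duration_h
        elif name in ("fault", "recovery"):
            pass  # no luminosity produced in these phases
        else:
            raise ValueError(f"unknown phase: {name}")
    return total

# Hypothetical cycle, durations in hours.
cycle = [("nominal", 100.0), ("fault", 4.0), ("recovery", 2.0),
         ("lumi_optimization", 2.0), ("nominal", 100.0)]
delivered = integrated_luminosity(cycle)
ideal = integrated_luminosity([("nominal", sum(d for _, d in cycle))])
efficiency = delivered / ideal  # fraction of the fault-free luminosity
```

The point of the sketch is that the luminosity lost to a fault is larger than the fault time alone, because recovery and optimization time are added on top of it.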
CLIC failure scenarios
Failure effects relations
Systems considered: MB source, MB injector, MB PDR, MB DR, MB Booster Linac, MB RTML, Main Linac, BDS, Collision/Interaction Point, Dump, DB source, DB linac, DB CBR, DB transport, DB Post decelerators.
Failure effect states: Out, Partial Out, Standby no beam, Standby with beam, Standby with reduced beam, Standby with partial beam.
Effect on the other systems when the system at the start of the row is in fault (cells in the order they appear on the slide, where several columns are merged):
• MB source: Out | Standby no beam | Standby no beam | Standby no beam | Standby with beam | Standby with reduced beam
• MB injector: Standby with beam | Out | Standby no beam | Standby no beam | Standby with beam | Standby with reduced beam
• MB PDR: Standby with beam | Out | Standby no beam | Standby no beam | Standby with beam | Standby with reduced beam
• MB DR: Standby with beam | Out | Standby no beam | Standby no beam | Standby with beam | Standby with reduced beam
• MB Booster Linac: Standby with beam | Out | Standby no beam | Standby no beam | Standby with beam | Standby with reduced beam
• MB RTML: Standby with beam | Standby with beam | Out | Standby no beam | Standby with beam | Standby with reduced beam
• Main Linac: Standby with beam | Standby with beam | Out | Standby no beam | Standby with beam | Standby with reduced beam
• BDS: Standby with beam | Standby with beam | Out | Standby no beam | Standby with beam | Standby with beam
• Collision: Standby with beam | Standby with beam | Out | Standby no beam | Standby with beam | Standby with beam
• Dump: Standby with beam | Standby with beam | Standby with beam | Out | Standby with beam | Standby with beam
• DB source: Standby with beam | Standby with beam | Standby no beam | Out | Standby no beam
• DB linac: Standby with beam | Standby with beam | Standby no beam | Standby with beam | Out | Standby no beam
• DB CBR: Standby with beam | Standby with beam | Standby no beam | Standby with beam | Out | Standby no beam
• DB transport: Standby with beam | Standby with beam | Standby no beam | Standby with beam | Out | Standby no beam
• DB decelerators (cell boundaries uncertain): Standby with beam | Standby with beam | Standby with partial beam | Standby with beam | Standby no beam | Standby with beam | Partial Out
Summary & Outlook
Availability models: Summary & Outlook
Main Beam RF Powering System
• Analyse the impact on the results of failures that are repairable at the moment of occurrence * -> higher availability expected
• Review failure data
• Estimate the number of spares needed to survive until a certain point in time
• Sensitivity analysis of failure rates
• Further extension of models
Infrastructure and technical services
• Organize meetings with system experts to gather data on the failure modes
• Run first simulations
• Gradually increase the complexity of the model by adding access times by location, dependencies among systems, etc.
RTML and Damping Rings
• Organize meetings with magnet experts to see which type of magnets should be used (based on magnet strength), and then with power-converter experts for the powering schemes
• Define how we can determine redundancies in BPMs and correctors
• Failure modes analysis
• Include other components
CLIC Availability allocation by complexity criteria: Summary & Outlook
• Exercise done at a high level, with intuitive results
• Next step: allocation at a lower level
• Complexity assessment by more than one expert
Luminosity production model: Summary & Outlook
• Agreement on the definition of the phases and failure scenarios
• Estimate recovery and tuning times
• Monte Carlo model implementation
Thank you! Special thanks to: A. Apollonio, S. Doebert, M. Jonker, A. Latina, G. Mcmonagle, M. Motyka, D. Schulte and S. Stapnes
CLIC failure scenarios
Operational impact of faults, tuning and recovery
Failure scenarios (beam kept? | beam-off time / repair time | consequence in luminosity | example)
• Short trips: yes | no | Minimal loss | RF breakdown
• Short trips: yes | short | Short loss | Spurious machine protection interlocks
• Repair without access to the accelerator housing: Partial beam | short (~30 min) | No production | Equipment breakdown and swap with hot spare
• Repair without access to the accelerator housing: Partial beam | long (< 4 h) | No production | Equipment breakdown requiring an expert to come and change hardware (outside the accelerator housing)
• Repair with access to the accelerator housing: Partial beam | short (< ??) | No production
• Repair with access to the accelerator housing: No beam | long (>> ??) | No production
Recovery times by equipment state (description | recovery = time needed to…)
• Not affected / Standby with beam: machine performance not affected, running equipment with beam | none or minor: Luminosity Optimization or Machine Validation
• Standby with reduced beam / Standby with partial beam: running equipment with (partial) beam | send beam to the following system
• Standby no beam: running equipment without beam | recover from degradation + send beam to the following system
• Out: faulty / off equipment | run before the beam arrives + send beam to the following system
CLIC failure scenarios I
Operational impact of faults, tuning and recovery
• Short trips without beam interruption -> minimal luminosity loss
  Example: RF breakdown
  Expected rate: every 100 pulses, i.e. 100 x 20 ms = 2 seconds.
  Recovery: none; occasionally a minor Luminosity Optimization could be needed.
• Short trips with short beam interruption -> short luminosity loss
  Example: spurious machine protection interlocks, possibly due to glitches in the BLM, possibly caused by some of the RF breakdowns
  Expected rate: every ~100^2 pulses, 5 minutes.
  Recovery: short (2 seconds?) Machine Validation with luminosity interruption, but not affecting machine performance.
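The expected rates follow directly from the 20 ms pulse spacing. The short check below reads the interlock rate as one event per 100 squared pulses, which is an interpretation of the slide's notation rather than a stated value; the function name is hypothetical.

```python
PULSE_PERIOD_S = 0.020  # 20 ms between pulses, as quoted on the slide

def mean_interval_s(pulses_between_events):
    """Mean time between events that occur once every N pulses."""
    return pulses_between_events * PULSE_PERIOD_S

rf_breakdown_interval_s = mean_interval_s(100)     # about 2 s
interlock_interval_s = mean_interval_s(100 ** 2)   # about 200 s
interlock_interval_min = interlock_interval_s / 60.0
```

Under this reading an interlock would occur every 200 s, a little over three minutes, which is of the same order as the 5 minutes quoted for short trips with beam interruption.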
CLIC failure scenarios II
Operational impact of faults, tuning and recovery
Beam off for a repair without access to the accelerator housing
• If the beam-off time is short (fault time ~30 min) -> equipment keeps running with (partial) beam -> recovery only requires short tuning -> Recovery ≈ 0.5 h (re-steering to the golden orbits with some final IP tuning, machine validation)
  Example: equipment breakdown and swap with a hot spare (either remotely controlled in the accelerator housing or on the surface by the operator on duty).
• If the beam-off time is long (30 min < fault time < 4 h)
  o Running equipment with (partial) beam: unaffected systems -> recovery only requires short tuning -> Recovery = time needed to deliver beam to the following system
  o Running equipment without beam: affected systems -> equipment performance will be degraded -> Recovery = time from degradation until it can take beam (depends on the fault time) + time needed to deliver beam to the following system
  Example: equipment breakdown requiring an expert to come and change hardware (outside the accelerator housing).
CLIC failure scenarios III
Operational impact of faults, tuning and recovery
Beam off for a repair with access to the accelerator housing (fault time > 4 h)
• If the fault time is short (< ?? h) -> partial beam kept
  o Running equipment with (partial) beam: unaffected systems -> recovery only requires short tuning -> Recovery = time needed to deliver beam to the following system
  o Running equipment without beam: affected systems -> equipment performance will be degraded -> Recovery = time to recover from degradation until it can take beam (depends on the fault time) + time needed to deliver beam to the following system
  o Faulty or off systems: equipment switched off, without beam -> will be switched on when the fault is cleared -> Recovery = time needed to run/switch on before the beam arrives + time needed to deliver beam to the following system
• If the fault time is long (>> ?? h) -> beam is switched off to save power
  o Faulty or off systems: equipment switched off, without beam -> will be switched on when the fault is cleared -> Recovery = time needed to run/switch on before the beam arrives + time needed to deliver beam to the following system
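The recovery rules for the different equipment states can be collected in one function, as a possible starting point for a luminosity production model. The state names follow the slides, but the function name, its parameters and the linear dependence of the degradation recovery on the fault time are assumptions made for illustration only.

```python
from enum import Enum

class State(Enum):
    WITH_BEAM = "standby with beam"
    PARTIAL_BEAM = "standby with partial or reduced beam"
    NO_BEAM = "standby no beam"
    OUT = "out"

def recovery_time_h(state, fault_h, deliver_h, switch_on_h, degradation_per_fault_h):
    """Recovery time in hours for one system, given its state during the fault."""
    if state is State.WITH_BEAM:
        return 0.0  # not affected; minor optimization neglected here
    if state is State.PARTIAL_BEAM:
        return deliver_h  # only needs to send beam to the following system
    if state is State.NO_BEAM:
        # Assumed model: degradation recovery grows linearly with fault time.
        return degradation_per_fault_h * fault_h + deliver_h
    if state is State.OUT:
        return switch_on_h + deliver_h  # run before beam arrives, then deliver
    raise ValueError(f"unknown state: {state}")

# Hypothetical example: a 2 h fault seen by a system left running without beam.
example_h = recovery_time_h(State.NO_BEAM, fault_h=2.0, deliver_h=0.5,
                            switch_on_h=1.0, degradation_per_fault_h=0.25)
```

The total recovery time of the complex would then depend on how these per-system times chain along the beam path, which is one of the points listed as still to be agreed.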