A. Antoine: LBDS TSU & AS-i failure report 2016
LBDS: TSU & AS-i Status, 27 September 2016
Contents
• TSU
  • Operation History
  • Failure Impact
  • Failure Analysis
• AS-i
  • Specifications & Framework
  • LBDS Configuration
  • Operation History
  • Failure Impact
  • Failure Analysis
• Conclusion
TSU
TSU Operation History
• Version 1: prototype, never in operation
• Version 2: in operation from LHC start-up to LS1
  • First operational experience
  • No critical hardware failure
  • Poor diagnostic capability
  • SPS compatibility required (new request)
  • Potential major failure detected (internal review)
• Version 3: in operation since LS1
  • Critical hardware failure on 1 July 2016
  • Synchronous dump done
  • LBDS B1: TSU-B replaced
TSU Failure Impact
• LBDS worst-case failure!
• Thanks to the redundant, fail-safe design:
  • Synchronous dump done
• Operation:
  • Expert investigation needed
  • MTTR: ~1 hour
  • 5 hours of downtime (LHC access required!)
• Cost:
  • Materials: ~2500 CHF / intervention
  • Expert & on-call service: ~500 CHF
TSU Failure Analysis (1 July)
• FPGA fatal error (not recoverable)
• Power supplies suspected
• 3 dependent + 2 independent power supplies on a TSU board:
  • +1.2 V -> FPGA core
  • +1.8 V -> EEPROM (Flash ROM for the FPGA)
  • +2.5 V -> FPGA & CPLD
  • +3.3 V -> most components, including the FPGA interface
  • +5 V -> CIBO powering
TSU Failure Analysis: abnormal startup
[Figure: startup waveforms of the +1.2 V, +1.8 V, +2.5 V and +3.3 V rails; levels of ~+3 V and ~+1.8 V annotated]
TSU Failure Analysis: normal startup (FPGA removed)
[Figure: startup waveforms of the +1.2 V, +1.8 V, +2.5 V and +3.3 V rails]
TSU Failure Diagnosis
• An internal FPGA failure induced a short circuit on the +1.2 V power supply
• Design review with N. Magnin:
  • +1.2 V power supply very noisy
  • Noise transients exceed the FPGA specifications
  • Some decoupling capacitors missing on the +5 V supply used to generate the +1.2 V
• Still not clear why the FPGA created a short circuit!
TSU Failure Diagnosis: Power Supply Noise
[Figure: noise on the +1.2 V, +1.8 V, +2.5 V and +3.3 V rails; ~250 mV annotated]
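To put the noise figure in perspective, the short Python sketch below expresses a ripple amplitude as a fraction of the nominal rail voltage and compares it with a tolerance band. It assumes the ~250 mV annotation belongs to the +1.2 V rail (the one flagged as very noisy in the design review) and treats it as the peak deviation from nominal; the ±5 % tolerance is a hypothetical placeholder, not a value from the FPGA datasheet.

```python
def relative_ripple(ripple_v: float, nominal_v: float) -> float:
    """Return the ripple amplitude as a fraction of the nominal rail voltage."""
    if nominal_v <= 0:
        raise ValueError("nominal voltage must be positive")
    return ripple_v / nominal_v


def within_tolerance(ripple_v: float, nominal_v: float, tolerance: float) -> bool:
    """True if the ripple stays inside a +/- tolerance band (given as a fraction)."""
    return relative_ripple(ripple_v, nominal_v) <= tolerance


# Assumption: the ~250 mV annotation applies to the +1.2 V core rail.
ratio = relative_ripple(0.250, 1.2)
print(f"ripple = {ratio:.1%} of nominal")  # ripple = 20.8% of nominal

# HYPOTHETICAL tolerance of +/-5 %; substitute the value from the FPGA datasheet.
print(within_tolerance(0.250, 1.2, 0.05))  # False
```

Under these assumptions the noise is roughly a fifth of the nominal core voltage, far outside a few-percent band.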
TSU Failure Diagnosis: Power Supply Noise
• The +5 V from VME is the source of all power supplies…
[Figure: +5 V rail]
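The slide's point is that every TSU rail is derived from the VME +5 V, so noise on that input can reach every rail. A minimal Python sketch of that reasoning as a supply-dependency tree follows. Because the report does not give the regulator topology, all five on-board rails are assumed to be direct children of the VME +5 V; the `SUPPLY_TREE` layout and the `downstream_rails` helper are illustrative, not taken from the TSU design.

```python
from collections import deque

# Assumed topology: each on-board rail is regulated directly from the VME +5 V.
# The real TSU regulator chain is not given in the report.
SUPPLY_TREE: dict[str, list[str]] = {
    "VME +5 V": ["+1.2 V", "+1.8 V", "+2.5 V", "+3.3 V", "+5 V (CIBO)"],
}


def downstream_rails(tree: dict[str, list[str]], source: str) -> list[str]:
    """Breadth-first walk listing every rail fed, directly or not, by `source`."""
    seen: list[str] = []
    queue = deque(tree.get(source, []))
    while queue:
        rail = queue.popleft()
        if rail not in seen:
            seen.append(rail)
            queue.extend(tree.get(rail, []))
    return seen


# Noise injected at the VME +5 V can propagate to every rail below it.
print(downstream_rails(SUPPLY_TREE, "VME +5 V"))
```

If the real board has intermediate regulators (one rail derived from another), only the dictionary changes; the walk stays the same.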
Conclusion (TSU)
• 1 critical failure in 10 years of operation
• MTTR ~1 hour, 5 hours of downtime
• Redundant TSU strategy worked fine:
  • Failure detected
  • Synchronous dump done
• Corrective action to be validated and deployed: remove the noise on the +5 V and +1.2 V power supplies
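For a rough feel of what one failure in ten years means, the sketch below computes a point estimate of MTBF and the resulting availability. It assumes calendar time (10 × 8760 h) rather than actual operating hours, and takes the 5 h of downtime per failure from the failure-impact slide; with a single event the estimate carries very large statistical uncertainty.

```python
HOURS_PER_YEAR = 8760  # 365-day year; leap days ignored


def point_mtbf(observed_hours: float, failures: int) -> float:
    """Naive MTBF estimate: observed time divided by the number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed for a point estimate")
    return observed_hours / failures


def availability(observed_hours: float, failures: int, downtime_per_failure_h: float) -> float:
    """Fraction of the observed time during which the system was not down."""
    return 1.0 - failures * downtime_per_failure_h / observed_hours


hours = 10 * HOURS_PER_YEAR  # assumption: calendar time, not operating time
print(point_mtbf(hours, 1))                  # 87600.0
print(f"{availability(hours, 1, 5.0):.5f}")  # 0.99994
```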
AS-i
AS-i (Actuator-Sensor Interface)
• Specifications:
  • IEC 62026-2 and EN 50295 standards
  • Data on the power line (decoupling filter)
  • 8-bit data serial bus with safety capability (SIL 3)
  • Up to 62 standard nodes or 31 safety nodes
  • Reaction time < 10 ms
  • Up to 100 m length (300 m with repeaters)
• Framework:
  • 1 x AS-i master controller
  • 1 x dedicated power supply
  • Unshielded 2-wire cable, wrapped in an electrical insulator, for data and power
  • Actuators & sensors
  • Safety monitor (when needed)
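The node and length limits above can be checked mechanically when planning a segment. The sketch below is a hypothetical planning helper, not part of any AS-i tool. It assumes the 62/31 figures come from 31 usable addresses, with standard nodes using A/B addressing (two per address) and safety nodes occupying a full address each, and it applies the 100 m / 300 m cable limits quoted above.

```python
import math

MAX_ADDRESSES = 31           # assumption: 31 usable slave addresses per master
MAX_LENGTH_M = 100           # without repeater
MAX_LENGTH_REPEATER_M = 300  # with repeaters


def addresses_used(standard_nodes: int, safety_nodes: int) -> int:
    """Address slots consumed: A/B standard nodes share a slot, safety nodes take one each."""
    return math.ceil(standard_nodes / 2) + safety_nodes


def check_segment(standard_nodes: int, safety_nodes: int,
                  length_m: float, repeater: bool = False) -> list[str]:
    """Return the list of violated limits (an empty list means the segment fits)."""
    problems: list[str] = []
    if addresses_used(standard_nodes, safety_nodes) > MAX_ADDRESSES:
        problems.append("too many nodes for one master")
    limit = MAX_LENGTH_REPEATER_M if repeater else MAX_LENGTH_M
    if length_m > limit:
        problems.append(f"cable length {length_m} m exceeds {limit} m")
    return problems


print(check_segment(standard_nodes=62, safety_nodes=0, length_m=90))  # []
print(check_segment(standard_nodes=4, safety_nodes=30, length_m=150))
```

The second call reports both problems: 32 address slots are needed, and 150 m exceeds the limit without a repeater.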
AS-i LBDS Configuration
[Figure: AS-i network configuration in the LBDS]
AS-i Operation History
• 2 hardware failures in 10 years of operation
  • Same failure signature… but one was the AS-i F Link module
  • All 4 systems impacted (beams 1 & 2)
• First occurrence shortly before LS1 (after 6 years of operation)
  • Curative maintenance (on-call service)
  • Early in LS1, preventive maintenance done: AS-i F Link & power supply components replaced
• Second occurrence some weeks ago, on 3 systems
  • Curative maintenance (on-call service)
  • Preventive maintenance during TS3 2016: all AS-i power supplies replaced
AS-i Failure Impact
• LBDS abruptly stopped (as an AUE)
• AS-i worst-case failure (power and discharging switches switched off)
• Synchronous dump (thanks to the fail-safe design)
• Operation:
  • Short MTTR: 45 min
  • 4 h of downtime per intervention (LHC access needed!)
• Cost:
  • Materials: ~1000 CHF / intervention
  • On-call service: ~300 CHF
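The per-intervention figures for the two systems can be compared side by side. A small sketch using the approximate numbers quoted on the TSU and AS-i failure-impact slides; the `Intervention` class is illustrative only, and the totals cover only the materials and service items listed there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    """Approximate cost and downtime of one corrective intervention."""
    system: str
    materials_chf: float
    service_chf: float  # expert and/or on-call service
    downtime_h: float

    @property
    def total_chf(self) -> float:
        return self.materials_chf + self.service_chf


TSU = Intervention("TSU", materials_chf=2500, service_chf=500, downtime_h=5)
ASI = Intervention("AS-i", materials_chf=1000, service_chf=300, downtime_h=4)

for item in (TSU, ASI):
    print(f"{item.system}: ~{item.total_chf:.0f} CHF, {item.downtime_h} h downtime")
```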
AS-i Failure Diagnosis
• 2 components identified as potentially responsible for the AS-i failure:
  • AS-i master controller (AS-i F Link)
  • AS-i power supply
• Master controller:
  • Controller down and not resettable!
  • No software diagnosis available
• Power supply:
  • Output filter showed degradation (capacitors)
  • Out-of-specification connection of the AS-i bus (spring terminals: no pod on the wire allowed!)
AS-i Failure Diagnosis
• Scenario 1:
  • Data on the AS-i bus are altered by the degraded capacitors of the power supply output filter
  • The AS-i master controller gets wrong reply messages from the safety sensors (data corruption)
  • The AS-i master controller goes to a safe state with a failure (not resettable)
• Scenario 2:
  • Bad connections (pods used on spring terminals)
  • Data corruption
  • The AS-i master controller goes to a safe state with a failure (not resettable)
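Both scenarios end the same way: a corrupted reply drives the master controller into a latched safe state that cannot be reset. The Python sketch below models that behaviour as a two-state machine. It is a hypothetical illustration of the failure path described above, not the AS-i F Link firmware, and the names `MasterState` and `AsiMasterModel` are invented for the example.

```python
from enum import Enum, auto


class MasterState(Enum):
    RUNNING = auto()
    SAFE_STATE_FAULT = auto()  # latched: matches the "not resettable" observation


class AsiMasterModel:
    """Toy model of the failure path; not the real controller logic."""

    def __init__(self) -> None:
        self.state = MasterState.RUNNING

    def on_reply(self, reply_valid: bool) -> MasterState:
        """Process one safety-sensor reply; a corrupted reply trips the latch."""
        if self.state is MasterState.RUNNING and not reply_valid:
            self.state = MasterState.SAFE_STATE_FAULT
        return self.state

    def reset(self) -> bool:
        """Attempt a reset; the fault is latched, so this only reports whether it is running."""
        return self.state is MasterState.RUNNING


master = AsiMasterModel()
master.on_reply(True)   # valid reply: keep running
master.on_reply(False)  # corrupted reply (scenario 1 or 2): safe state
print(master.state.name, master.reset())  # SAFE_STATE_FAULT False
```

In the model, later valid replies do not clear the fault either, which mirrors the need for an intervention once the controller has tripped.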
AS-i Corrective Action during TS3
• Done on all systems (4 x):
  • New AS-i power supply
  • All pods removed from wires connected with spring terminals
Conclusion (AS-i)
• 2 periods of failures in 10 years
• MTTR short, but failures cluster after a first occurrence (burst behavior)
• Fail-safe design: synchronous dump done
• Corrective action during TS3:
  • Replacement of all AS-i power supplies
  • Removal of wire pods on spring terminals