Operations/Failure Analysis: Status of Equipment/Production Readiness Plans in Case of Part/Systems Failure for Each Stand Type
US CMS Silicon Meeting, Nov. 16, 2004
Failure Analysis-Anthony Affolder

Inventory
• The US testing group has finished a complete inventory of all components used in hybrid/module/rod testing
  § The inventory is available on the UCSB CMS website: http://hep.ucsb.edu/cms.html
• In the process, we identified many more potential failure modes for our stands than we had previously anticipated
  § DAQ equipment, cables, Vienna box interlock spares, chillers, CAEN power supplies, etc.
  § We have contacted everyone responsible for these components and hope to receive all the spares we need before production fully starts

DAQ Components
• The request for spare components has been out for over 5 months now. We still need:
  § 2 TSC
  § 1 FED (replacement for broken UCSB board)
  § 3 TPO
  § 2 CCU25
  § 4 VUTRI
  § 6 PAACB
  § 11 hybrid-to-UTRI boards
• All requests have been acknowledged and accepted. No ETA has been given for any of the parts.

Cables
• We have 48 different cable types in the systems
  § 29 types have no spares
  § 8 more have an inadequate number of spares
• Most can cause a complete system failure
• We have requested spares for all cables that we cannot make ourselves
  § Duccio and Wim have already responded
• We are making a number of the cables ourselves
  § Ribbon cables for the TRHX box for the Module LT stands
  § TPO-to-VUTRI control cables
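
The cable counts on this slide imply how many cable types are adequately covered. The following is a minimal sketch of that tally; the function name is hypothetical, and it assumes the "no spares" and "inadequate spares" groups do not overlap (the slide's "8 more" suggests they do not).

```python
def spare_coverage(total_types: int, no_spares: int, inadequate_spares: int) -> dict:
    """Tally cable types by spare status (assumes the two at-risk groups are disjoint)."""
    at_risk = no_spares + inadequate_spares
    if at_risk > total_types:
        raise ValueError("at-risk types exceed total types")
    return {
        "adequate": total_types - at_risk,
        "at_risk": at_risk,
        "at_risk_fraction": at_risk / total_types,
    }

# Figures from the slide: 48 cable types, 29 without spares, 8 more with too few.
coverage = spare_coverage(48, 29, 8)
print(f"{coverage['adequate']} types adequately spared, "
      f"{coverage['at_risk']} at risk ({coverage['at_risk_fraction']:.0%})")
```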

CAEN Power Supplies
• With the new crate rented from CERN, we have exactly the number we need
  § CERN has recently ordered more CAEN crates (SYS127) and controller cards (A128HS and A1303)
• We have requested one set as a spare for the US
• With the controller modules ordered by UCR:
  § We will have spares of all modules at both UCSB and FNAL

Assorted Equipment Issues (I)
• UCSB needs to order/manufacture a spare of every component used in the 4 hybrid test box
• Both FNAL and UCSB have found a spare NIM and VME crate
  § A spare set of LeCroy modules needs to be located for the pulser logic of the 4 hybrid thermal cycler
• Spare computers with hybrid/module software need to be assembled and tested for UCSB/FNAL
• Spare computers with module LT/single rod/multi-rod/interlock software need to be assembled and tested for UCSB/FNAL

Assorted Equipment Issues (II)
• Torino has agreed to supply spare Vienna box interlock equipment for each site
• Vienna has agreed to supply spare Vienna box sensors (RH% and T) to both FNAL and UCSB
• We have ordered a spare chiller for both the hybrid thermal cyclers and the Vienna boxes

Test Stand Failure Analysis
• We have a draft of the US testing operations/failure analysis document available at:
  § http://hep.ucsb.edu/cms.html
• The process was really useful; it got us to think through worst-case failure scenarios and how we could operate under such conditions

4 Hybrid Thermal Cycler
• Biggest strength is the large over-capacity we have in the group
  § We can test ~90 hybrids per day, against an expected peak rate of 45 hybrids per day (assuming 400 per week from the company)
  § In case of a global failure, move production to other sites
    • As long as we have a stockpile of bonded hybrids, production will not be affected. Otherwise, production will be slowed by shipping times
• The stands have three primary weaknesses: software, the Peltier element, and the NESLAB chiller
  § A backup computer and spare Peltier elements at each site reduce these risks
  § We have ordered a spare chiller, which can be shipped in case of a chiller failure
• UCSB is obtaining spares of all other components, which could be shipped overnight in case of failure
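
The over-capacity claim on this slide can be checked with a short calculation. This is a minimal sketch, not part of the original analysis: the function name is hypothetical, and the only inputs are the ~90 hybrids/day capacity and the 45 hybrids/day expected peak quoted on the slide.

```python
def headroom(capacity_per_day: float, peak_rate_per_day: float) -> float:
    """Ratio of test capacity to peak demand; a value above 1 means over-capacity."""
    if peak_rate_per_day <= 0:
        raise ValueError("peak rate must be positive")
    return capacity_per_day / peak_rate_per_day

# Figures from the slide: ~90 hybrids/day capacity, 45 hybrids/day expected peak.
hybrid_headroom = headroom(90, 45)

# Fraction of the group's capacity that could be lost while still matching the peak.
tolerable_loss = 1 - 1 / hybrid_headroom

print(f"headroom: {hybrid_headroom:.1f}x, tolerable capacity loss: {tolerable_loss:.0%}")
```

With these figures the group could lose about half of its hybrid test capacity and still keep up with the expected peak.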

ARCS Module Testing
• We are in the best shape with these stands
• We can test ~17 modules/stand/day, against an expected peak production rate of 30
  § Rate made possible by improvements to the ARCS software and automation of data handling
  § FNAL has 4 stands, UCSB has 3 stands, UCR has 1 stand
    • Both sites have a complete live spare
• Obtained spare cables to reduce stand down-time to a minimum
  § Repairs from Aachen have taken only 2-4 weeks in the past
• Biggest headache would be a complete failure of the CMS database
  § No wire bonding data, sensor data, or hybrid data
    • Will have to find all faults during testing. May require each part to be tested twice.
    • Would have to check testing results against known failures once the database is working again. Running more than two stands should clear the backlog of parts.
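
The stand counts implied by these rates can be sketched as follows. The function names and the backlog size are hypothetical; the sketch assumes the peak of 30 modules/day is served by a single pool of stands, which the slide does not state explicitly.

```python
import math

def stands_needed(production_per_day: int, per_stand_per_day: int, passes: int = 1) -> int:
    """Smallest number of stands that keeps up with production, testing each part `passes` times."""
    return math.ceil(production_per_day * passes / per_stand_per_day)

def days_to_clear(backlog: int, stands: int, production_per_day: int, per_stand_per_day: int) -> int:
    """Working days needed to clear a backlog while production continues at the same rate."""
    surplus = stands * per_stand_per_day - production_per_day
    if surplus <= 0:
        raise ValueError("capacity does not exceed production; backlog never clears")
    return math.ceil(backlog / surplus)

# Figures from the slide: ~17 modules/stand/day, peak production of 30 modules/day.
normal_stands = stands_needed(30, 17)            # each module tested once
double_stands = stands_needed(30, 17, passes=2)  # each module tested twice (database outage)

# Hypothetical backlog of 100 modules, cleared with 3 stands running.
clear_days = days_to_clear(100, stands=3, production_per_day=30, per_stand_per_day=17)

print(normal_stands, double_stands, clear_days)
```

Two stands only just cover the peak (34 vs 30 modules/day), which is consistent with the slide's remark that more than two stands are needed to work off a backlog.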

Module LT Systems
• Since the expected production rate exceeds testing capacity (30 vs 15 modules a day), any failure would increase the fraction of sampled modules
  § Only way to clear a backlog is weekend testing
  § We are exploring a reduction of tests to increase capacity to 20 per day
• To prevent failures, spares have been acquired or ordered for almost all components:
  § Power supplies, DAQ components, cables, chillers, interlocks, etc.
• We also modified the Vienna box to be more stable/long-lived
  § Brass plates and extender connectors
• If the stand is completely non-operational, production can still continue
  § All production should be TOB. Only produce what can be assembled/tested on rods. Would reduce production capacity to ~20 modules a day

Single Rod System
• Test systems have over-capacity
  § 2-4 rods assembled a day, with a test capacity of ~8 rods/stand/day
• Same issues with DAQ equipment/CAEN HV as module LT
  § Ordered 6 extra MUX to remove the need for recabling
  § Cables ordered from Duccio: electrical and optical
• Two pieces of equipment with no spares in the foreseeable future: OEC & Delphi LV power supply
  § In both cases, take the OEC or PS from the multi-rod stand (with a multi-rod loss of capacity of 7% and 12%)
• If the single rod stand fails completely, production can still continue at a slower rate
  § Test rods as they are loaded into the multi-rod stand. Adds 1 day per test cycle
  § Will have to reduce the rod assembly rate to match the testing rate
  § UCSB should switch to mostly TEC production
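
The margins quoted on this slide can be worked out as below. This is an illustrative sketch only: the names are hypothetical, and it assumes the 7% and 12% multi-rod capacity losses are listed in the same order as the components (OEC first, Delphi LV power supply second), which the slide does not confirm.

```python
def headroom_range(capacity_per_day: float, assembly_min: float, assembly_max: float) -> tuple:
    """Lowest and highest ratio of test capacity to assembly rate."""
    return capacity_per_day / assembly_max, capacity_per_day / assembly_min

# Figures from the slide: ~8 rods/stand/day test capacity, 2-4 rods assembled per day.
low, high = headroom_range(8, 2, 4)

# Assumed mapping of the quoted losses to the borrowed component (order taken from the slide).
MULTI_ROD_LOSS = {"OEC": 0.07, "Delphi LV PS": 0.12}

# Multi-rod capacity left after lending each component to the single rod stand.
remaining = {part: 1 - loss for part, loss in MULTI_ROD_LOSS.items()}

print(f"single rod headroom: {low:.0f}x to {high:.0f}x")
for part, fraction in remaining.items():
    print(f"multi-rod capacity after lending {part}: {fraction:.0%}")
```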

Multi-rod System
• Most complex system, with the least amount of experience
• Same potential problems as module LT or single rod systems, plus:
  § Chiller
  § Interlocks
  § Freezer infrastructure
• UR has thought about different operational scenarios for these components; these will have to be revisited after more experience has accumulated
• Chiller
  § Spares in hand for all components that the company believes could likely fail
  § A plan for finding and removing leaks in the cooling system is needed
    • 1 or 2 spare C6F14 loads needed at both sites

Multi-rod System (2)
• Interlocks
  § Spares of sensors, etc. that can be easily replaced have already been ordered
  § If interlock hardware fails:
    • Company has a 48 hour express repair plan
    • Power supply interlocks would be used
    • Control of system by hand until repair is made
• In case of complete system failure, rod assembly at the site can still proceed at a lower rate
  § All rods will have to be tested with the single rod stand
  § Single rod stand would have to reproduce as many of the multi-rod stand's tests as possible until the multi-rod stand is available
  § Only way to remove the backlog of assembled rods would be to reduce the testing cycle time.