The LHCb Run Control: An Integrated and Homogeneous Control System
Clara Gaspar, March 2009

The Experiment Control System
❚ Is in charge of the Control and Monitoring of all parts of the experiment
[Figure: the Experiment Control System spans DCS Devices (HV, LV, GAS, Cooling, etc.), Detector Channels, L0, TFC, Front End Electronics, Readout Network, HLT Farm, Storage, Monitoring Farm, DAQ, and External Systems (LHC, Technical Services, Safety, etc.)]

Some Requirements
❚ Large number of devices/IO channels
 ➨ Need for Distributed Hierarchical Control
  ❘ Decomposition into Systems, Sub-systems, …, Devices
  ❘ Local decision capabilities in sub-systems
❚ Large number of independent teams and very different operation modes
 ➨ Need for Partitioning Capabilities (concurrent usage)
❚ High Complexity & Non-expert Operators
 ➨ Need for Full Automation of:
  ❘ Standard Procedures
  ❘ Error Recovery Procedures
 ➨ And for Intuitive User Interfaces

Design Steps
❚ In order to achieve an integrated System:
 ❙ Promoted HW Standardization (so that common components could be re-used)
  ❘ Ex.: Mainly two control interfaces to all LHCb electronics
   〡Credit Card sized PCs (CCPC) for non-radiation zones
   〡A serial protocol (SPECS) for electronics in radiation areas
 ❙ Defined an Architecture
  ❘ That could fit all areas and all aspects of the monitoring and control of the full experiment
 ❙ Provided a Framework
  ❘ An integrated collection of guidelines, tools and components that allowed the development of each sub-system coherently in view of its integration in the complete system

Generic SW Architecture
[Figure: control tree with ECS at the top and nodes INFR., DCS, TFC, DAQ, HLT and LHC; per-sub-detector nodes SubDet1 DCS … SubDetN DCS and SubDet1 DAQ … SubDetN DAQ; below them nodes such as SubDet1 LV, SubDet1 TEMP, SubDet1 GAS, SubDet1 FEE and SubDet1 RO; at the bottom devices LV Dev1 … LV DevN and FEE Dev1 … FEE DevN. Arrows are labelled Commands and Status & Alarms. Legend: Control Unit, Device Unit.]

The Control Framework
❚ The JCOP* Framework is based on:
 ❙ SCADA System - PVSSII for:
  ❘ Device Description (Run-time Database)
  ❘ Device Access (OPC, Profibus, drivers)
  ❘ Alarm Handling (Generation, Filtering, Masking, etc.)
  ❘ Archiving, Logging, Scripting, Trending
  ❘ User Interface Builder
  ❘ Alarm Display, Access Control, etc.
 ❙ SMI++ providing:
  ❘ Abstract behavior modeling (Finite State Machines)
  ❘ Automation & Error Recovery (Rule based system)
* The Joint COntrols Project (between the 4 LHC experiments and the CERN Control Group)
[Figure labels: Device Units, Control Units]

Device Units
❚ Provide access to “real” devices:
 ❙ The Framework provides (among others):
  ❘ “Plug and play” modules for commonly used equipment. For example:
   〡CAEN or Wiener power supplies (via OPC)
   〡LHCb CCPC and SPECS based electronics (via DIM)
  ❘ A protocol (DIM) for interfacing “home made” devices. For example:
   〡Hardware devices like a calibration source
   〡Software devices like the Trigger processes (based on LHCb’s offline framework, GAUDI)
  ❘ Each device is modeled as a Finite State Machine

Hierarchical control
❚ Each Control Unit:
 ❙ Is defined as one or more Finite State Machines
 ❙ Can implement rules based on its children’s states
 ❙ In general it is able to:
  ❘ Summarize information (for the levels above)
  ❘ “Expand” actions (to the lower levels)
  ❘ Implement specific behaviour & Take local decisions
   〡Sequence & Automate operations
   〡Recover errors
  ❘ Include/Exclude children (i.e. partitioning)
   〡Excluded nodes can run in stand-alone
  ❘ User Interfacing
   〡Present information and receive commands
[Figure: example DCS sub-tree with nodes including Tracker DCS, Muon LV, …, Muon GAS; legend: Control Unit]
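
The summarize / expand / exclude behaviour listed above can be illustrated with a small Python sketch. It is not SMI++ or PVSS code: the class names, the node names (`Muon_LV`, `Muon_GAS`, `Muon_DCS`), the three-state vocabulary and the "worst child state wins" summary rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# ASSUMPTION: a "worst state wins" ordering; the slides do not give the summary rule.
SEVERITY = {"READY": 0, "NOT_READY": 1, "ERROR": 2}


@dataclass
class DeviceUnit:
    """Leaf node standing in for a real device modeled as a state machine."""
    name: str
    state: str = "NOT_READY"

    def command(self, action: str) -> None:
        # Toy behaviour: a faulty device only reacts to Recover.
        if self.state == "ERROR":
            if action == "Recover":
                self.state = "NOT_READY"
        elif action == "Switch_ON":
            self.state = "READY"
        elif action == "Switch_OFF":
            self.state = "NOT_READY"


@dataclass
class ControlUnit:
    """Inner node: summarizes children upward, expands commands downward."""
    name: str
    children: list = field(default_factory=list)
    excluded: set = field(default_factory=set)

    def included(self) -> list:
        return [c for c in self.children if c.name not in self.excluded]

    @property
    def state(self) -> str:
        # Summarize: report the most severe state among the included children.
        states = [c.state for c in self.included()]
        return max(states, key=SEVERITY.__getitem__) if states else "NOT_READY"

    def command(self, action: str) -> None:
        # Expand: forward the action to every included child.
        for child in self.included():
            child.command(action)

    def exclude(self, name: str) -> None:
        self.excluded.add(name)

    def include(self, name: str) -> None:
        self.excluded.discard(name)


# Hypothetical two-level tree: DCS -> Muon_DCS -> {Muon_LV, Muon_GAS}
lv, gas = DeviceUnit("Muon_LV"), DeviceUnit("Muon_GAS")
muon = ControlUnit("Muon_DCS", [lv, gas])
dcs = ControlUnit("DCS", [muon])

dcs.command("Switch_ON")          # expanded down to both devices
state_after_on = dcs.state
gas.state = "ERROR"               # simulate a fault on one device
state_with_fault = dcs.state
muon.exclude("Muon_GAS")          # partition the faulty child out
state_after_exclude = dcs.state
print(state_after_on, state_with_fault, state_after_exclude)
```

In this sketch the parent never stores a state of its own; it derives it from its included children on demand, which is one simple way to keep the summary consistent after an Include or Exclude.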

Control Unit Run-Time
❚ Dynamically generated operation panels (Uniform look and feel)
❚ Configurable User Panels and Logos
❚ “Embedded” standard partitioning rules:
 ❙ Take
 ❙ Include
 ❙ Exclude
 ❙ Etc.

Operation Domains
❚ Three Domains have been defined:
 ❙ DCS
  ❘ For equipment whose operation and stability are normally related to a complete running period
    Example: GAS, Cooling, Low Voltages, etc.
 ❙ HV
  ❘ For equipment whose operation is normally related to the Machine state
    Example: High Voltages
 ❙ DAQ
  ❘ For equipment whose operation is related to a RUN
    Example: Readout electronics, High Level Trigger processes, etc.

FSM Templates
❚ DCS Domain and HV Domain
 [Figure: two state diagrams. States shown: OFF, NOT_READY, STANDBY1, STANDBY2, READY, ERROR, RAMPING_STANDBY1, RAMPING_STANDBY2, RAMPING_READY. Actions shown: Switch_ON, Switch_OFF, Go_STANDBY1, Go_STANDBY2, Go_READY, Recover.]
❚ DAQ Domain
 [Figure: state diagram. States shown: UNKNOWN, NOT_READY, CONFIGURING, READY, RUNNING, ERROR. Actions shown: Configure, Start, Stop, Reset, Recover.]
❚ All Devices and Sub-Systems have been implemented using one of these templates
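
As an illustration of what such a template amounts to, the DAQ-domain states and actions can be written as a transition table in Python. This is a sketch under stated assumptions: the slide gives the state and action names, but the arrows below are inferred from those names rather than copied from the diagram, and the `_configured` event and the `fire` / `allowed_actions` helpers are hypothetical.

```python
# ASSUMPTION: arrows inferred from the state/action names on the slide.
DAQ_TRANSITIONS = {
    ("NOT_READY", "Configure"): "CONFIGURING",
    ("CONFIGURING", "_configured"): "READY",  # hypothetical internal "done" event
    ("READY", "Start"): "RUNNING",
    ("RUNNING", "Stop"): "READY",
    ("READY", "Reset"): "NOT_READY",
    ("ERROR", "Recover"): "NOT_READY",
}


def allowed_actions(state: str) -> list:
    """Return the actions the table accepts in the given state, sorted by name."""
    return sorted(action for (s, action) in DAQ_TRANSITIONS if s == state)


def fire(state: str, action: str) -> str:
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return DAQ_TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {state!r}") from None


# Walk one full cycle: configure, run, stop, reset.
state = "NOT_READY"
for action in ["Configure", "_configured", "Start", "Stop", "Reset"]:
    state = fire(state, action)
final_state = state
print(final_state)
```

Keeping the template as data rather than as code paths means every device and sub-system built on it accepts exactly the same commands in the same states, which is what makes a uniform operator interface possible.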

ECS: Run Control
❚ Size of the Control Tree:
 ❙ Distributed over ~150 PCs
  ❘ ~100 Linux (50 for the HLT)
  ❘ ~50 Windows
 ❙ >2000 Control Units
 ❙ >30000 Device Units
❚ The Run Control can be seen as:
 ❙ The Root node of the tree
 ➨ If the tree is partitioned there can be several Run Controls.
[Figure: control tree with ECS at the root and nodes HV, DCS, TFC, DAQ, HLT, LHC, SubDet1 DCS … SubDetN DCS and SubDet1 DAQ … SubDetN DAQ; some nodes are marked with an X]

Run Control
❚ Matrix: Domain × Sub-Detector
❚ Activity: used for configuring all Sub-Systems

Partitioning
❚ Creating a Partition
 ❙ Allocate = Get a “slice” of:
  ❘ Timing & Fast Control (TFC)
  ❘ High Level Trigger Farm (HLT)
  ❘ Storage System
  ❘ Monitoring Farm
❚ ECS Domain
 [Figure: state diagram. States shown: NOT_ALLOCATED, ALLOCATING, NOT_READY, CONFIGURING, READY, ACTIVE, RUNNING, ERROR. Actions shown: Allocate, Deallocate, Configure, Reset, StartRun, StopRun, StartTrigger, StopTrigger, Recover.]
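
The idea of allocating a “slice” of a shared resource can be sketched in Python as handing out disjoint subsets of a pool to named partitions. Everything here is hypothetical: the `SlicePool` class, the node names `hlt00` … `hlt05` and the partition names are illustrative and do not describe how the LHCb system actually assigns TFC, HLT, storage or monitoring resources.

```python
class SlicePool:
    """Hands out disjoint subsets ("slices") of a shared resource to partitions."""

    def __init__(self, items):
        self.free = list(items)
        self.owned = {}  # partition name -> list of allocated items

    def allocate(self, partition: str, count: int) -> list:
        if partition in self.owned:
            raise RuntimeError(f"{partition} already holds a slice")
        if count > len(self.free):
            raise RuntimeError(f"only {len(self.free)} items free, {count} requested")
        # Take the first `count` free items; the rest stay available to others.
        taken, self.free = self.free[:count], self.free[count:]
        self.owned[partition] = taken
        return list(taken)

    def deallocate(self, partition: str) -> None:
        # Return the partition's slice to the free list (no-op if it holds none).
        self.free.extend(self.owned.pop(partition, []))


# Hypothetical farm of six nodes shared by two concurrent partitions.
hlt = SlicePool([f"hlt{n:02d}" for n in range(6)])
global_slice = hlt.allocate("LHCb", 4)   # a global run takes four nodes
muon_slice = hlt.allocate("MUON", 2)     # a stand-alone sub-detector run takes two
hlt.deallocate("MUON")                   # its nodes become free again
print(global_slice, muon_slice, hlt.free)
```

Because each slice is removed from the free list when it is handed out, two partitions can never share an item, which is the property that lets independent teams run concurrently.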

Sub-Detector Run Control
❚ “Scan” Run

Conclusions
❚ LHCb has designed and implemented a coherent and homogeneous control system
❚ The Run Control makes it possible to:
 ❙ Configure, Monitor and Operate the Full Experiment
 ❙ Run any combination of sub-detectors in parallel, in stand-alone
 ❙ Completely automate operation (when we understand the machine)
❚ Some of its main features:
 ❙ Sequencing, Automation, Error recovery, Partitioning
 ➨ Come from the usage of SMI++ (integrated with PVSS) (Poster no: 540, Board no: Thursday 103)
❚ It’s being used daily for all sub-detector tests and global activities