CWG 10 Control Configuration and Monitoring Status and
- Slides: 19
CWG 10 Control, Configuration and Monitoring
Status and plans for Control, Configuration and Monitoring
16 December 2014
ALICE O2 Asian Workshop 2014 @ Pusan
Outline
▶ Motivation
▶ A brief overview of data taking operations
▶ Lessons learned from Run 1
▶ CCM Overview
▶ Performance tests
▶ Next steps
ALICE O2 CWG 10 Control, Configuration and Monitoring | ALICE O2 Asian Workshop 2014
Motivation
▶ Why do we need a Control System?
  ▶ Start and stop processes
  ▶ Sequence of operations, synchronization
  ▶ External systems
  ▶ Automation
▶ Why do we need a Configuration System?
  ▶ Configure processes
▶ Why do we need a Monitoring System?
  ▶ Detect abnormal conditions
  ▶ Automation
Team
▶ CERN
▶ KMUTT, Thailand
▶ See next presentation by Khanasin for an update
A brief overview of data taking operations
▶ A typical LHC year
[Timeline figure, January to December: shutdown for maintenance, proton-proton collisions, heavy-ion collisions]
Disclaimer: current system, not O2
A brief overview of data taking operations
▶ A typical LHC fill (up to 30 hours)
[Timeline figure: beam injection, stable beams, beam dump]
• ALICE safe
• Detector calibration
• Prepare trigger configuration
• Partial ALICE READY
Ideally a single run
• Full ALICE READY
• Data taking
• Detector calibration
Disclaimer: current system, not O2
A brief overview of data taking operations
▶ A typical ALICE run
Start-of-Run
• Configure detector electronics
• Start online systems
• Store data taking conditions
Data taking
• Readout
• Event building
• Online data monitoring
• Online calibration data
End-of-Run
• Export data taking conditions and calibration data to Offline
• Stop online systems
Disclaimer: current system, not O2
A brief overview of data taking operations
▶ Run 1 SOR sequence (high level)
Disclaimer: current system, not O2
Lessons learned from Run 1 (2010-2013)
▶ Must be fast when changing run (Run 2: Fast SOR/EOR)
  ▶ More runs than expected
  ▶ Not everything needs to be restarted
▶ Must be flexible (Run 2: Pause and Recover)
  ▶ Not every problem needs to stop a run
▶ Must monitor everything (Run 2: MAD)
  ▶ Data flow monitoring
Control in O2 - Overview
▶ Process Management
  ▶ Start/stop processes
  ▶ Send commands to processes (CONFIGURE, PAUSE/RESUME, etc.)
  ▶ Estimated: O(100k) processes
▶ Task Management
  ▶ Ensure that actions are executed in the correct order
▶ Automation
  ▶ Automatically recover from errors
  ▶ Automatically react to internal events (e.g. need more EPNs) and external events (e.g. start of LHC collisions)
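The process-management and task-management points above can be illustrated with a small state machine. This is only a sketch: the CONFIGURE, PAUSE and RESUME commands come from the slide, while the state names, the START and STOP command names, the transition table and the `ControlledProcess` class are hypothetical and not part of any O2 design.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    CONFIGURED = "configured"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed transitions: (current state, command) -> next state.
# State names and this table are illustrative assumptions.
TRANSITIONS = {
    (State.STANDBY, "CONFIGURE"): State.CONFIGURED,
    (State.CONFIGURED, "CONFIGURE"): State.CONFIGURED,  # reconfigure in place
    (State.CONFIGURED, "START"): State.RUNNING,
    (State.RUNNING, "PAUSE"): State.PAUSED,
    (State.PAUSED, "RESUME"): State.RUNNING,
    (State.RUNNING, "STOP"): State.CONFIGURED,
    (State.PAUSED, "STOP"): State.CONFIGURED,
}


class ControlledProcess:
    """Hypothetical stand-in for one controlled process."""

    def __init__(self, name):
        self.name = name
        self.state = State.STANDBY

    def handle(self, command):
        """Apply a command, rejecting it if the current state forbids it."""
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{self.name}: {command} not allowed in {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        return self.state


def run_sequence(processes, commands):
    """Send each command to every process before moving to the next command,
    so that actions are executed in the requested order."""
    for command in commands:
        for proc in processes:
            proc.handle(command)
    return [proc.state for proc in processes]
```

For example, `run_sequence(procs, ["CONFIGURE", "START", "PAUSE", "RESUME"])` leaves every process in `State.RUNNING`, while sending PAUSE to a process that was never started raises `ValueError`.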
Control in O2 - Notes
▶ Includes processes from online and offline
▶ Must control both synchronous and asynchronous tasks
▶ Cannot be seen as a batch system
  ▶ Bound to external events (e.g. start of collisions)
  ▶ Sequence of operations, synchronization points
  ▶ Low latency very important
Configuration in O2 - Overview
▶ Configuration distribution
  ▶ Provide processes with needed configuration parameters
▶ Dynamic process (re)configuration
  ▶ Essential to achieve fast run transition
▶ O(1 GB) of configuration data
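One way to picture dynamic (re)configuration is to send a running process only the parameters that differ from what it already holds, rather than restarting it with a full configuration. The sketch below assumes a flat key/value configuration; the `config_delta` helper and the parameter names in the usage note are hypothetical, and the slides do not say how O2 would implement this.

```python
_MISSING = object()  # sentinel: distinguishes "key absent" from a stored None


def config_delta(current, target):
    """Compare two flat configurations.

    Returns (changed, removed):
      changed - dict of keys whose value must be set or updated
      removed - sorted list of keys that must be dropped
    """
    changed = {
        key: value
        for key, value in target.items()
        if current.get(key, _MISSING) != value
    }
    removed = sorted(key for key in current if key not in target)
    return changed, removed
```

With `current = {"readout.rate": 100, "debug": True}` and `target = {"readout.rate": 200}`, the delta is `({"readout.rate": 200}, ["debug"])`, so only one value is pushed and one is dropped.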
Monitoring in O2 - Overview
▶ Data collection and archival
  ▶ System monitoring (CPU, memory, I/O, etc.)
  ▶ Application monitoring (data rates, link backpressure, internal buffer status, etc.)
  ▶ O(600 kHz) of monitoring data
▶ Alarms and action triggering
  ▶ Support shift crew, experts
  ▶ Feedback to Control system
Monitoring in O2 - Notes
▶ Includes metrics from online and offline
▶ Includes both low and high frequency metrics
  ▶ Low: every 30 seconds, system metrics
  ▶ High: every second, link status
▶ Permanent storage will be the limiting factor
  ▶ No need to store everything, can filter "interesting" values
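The slide does not define what makes a value "interesting". One common choice, shown here purely as an assumption, is a deadband filter: a sample is stored only when it differs from the last stored sample by more than a threshold. The function name and the threshold value are illustrative.

```python
def deadband_filter(samples, threshold):
    """Keep the first sample and every later sample whose value differs
    from the last kept value by more than `threshold`.

    `samples` is an iterable of (timestamp, value) pairs.
    """
    kept = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > threshold:
            kept.append((timestamp, value))
            last_value = value
    return kept
```

A metric that hovers around a constant value then costs one stored sample instead of one per reporting interval, which eases the load on permanent storage.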
Performance Tests: Control
▶ Tool: SMI (State Machine Interface)
▶ Setup:
  ▶ Level 0 SMI domain: Partition CCM
  ▶ Level 1 SMI domain: Detector CCMs, EPN Cluster CCM
  ▶ Level 2 SMI domain: FLP CCMs, EPN CCMs
  ▶ Level 2 SMI proxy: local process
Performance Tests: Control
▶ Setup:
  ▶ 46 hosts
  ▶ 1 Level 0 domain
  ▶ 20 Level 1 domains
  ▶ 1350 Level 2 domains
  ▶ 67500 proxies
▶ Increase due to initial lookup in DIM DNS
▶ Conclusion: cannot use in current version
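To give a feel for the scale of this test setup, the figures above can be turned into a few derived ratios. The numbers are taken from the slide; the averages assume an even spread across domains and hosts, which the slide does not state, and the `summarize` helper is illustrative.

```python
SETUP = {
    "hosts": 46,
    "level0_domains": 1,
    "level1_domains": 20,
    "level2_domains": 1350,
    "proxies": 67500,
}


def summarize(setup):
    """Derive totals and averages from the test setup figures."""
    total_domains = (
        setup["level0_domains"]
        + setup["level1_domains"]
        + setup["level2_domains"]
    )
    return {
        "total_domains": total_domains,
        # Averages assume an even distribution (an assumption).
        "level2_per_level1": setup["level2_domains"] / setup["level1_domains"],
        "proxies_per_level2": setup["proxies"] / setup["level2_domains"],
        "proxies_per_host": setup["proxies"] / setup["hosts"],
    }


if __name__ == "__main__":
    for name, value in summarize(SETUP).items():
        print(f"{name}: {value:.1f}")
```

Under that even-spread assumption, each Level 2 domain controls 50 proxies and each host runs well over a thousand of them.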
Performance Tests: Monitoring
▶ MonALISA + ApMon
▶ Setup:
  ▶ 10 sender nodes, up to 1000 threads per host (ApMon)
  ▶ 1 MonALISA service, all historical record disabled
▶ Result: 52 kHz without data loss
▶ Conclusion: could use 12+ collectors to reach 600 kHz
By Costin Grigoras
Performance Tests: Monitoring
▶ Zabbix
▶ Setup:
  ▶ 10 sender nodes, up to 10 processes per host
  ▶ 1 Zabbix Server node, 200 threads, permanent storage disabled (in-memory history enabled)
▶ Result: 30 kHz without data loss
▶ Conclusion: could use 20+ collectors to reach 600 kHz
By Andres Gomez Ramirez
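The collector counts quoted in the two monitoring tests follow from dividing the 600 kHz target by the measured single-collector rate and rounding up. The sketch below reproduces that arithmetic; it assumes throughput scales linearly with the number of collectors, which the tests did not measure, and the function name is illustrative.

```python
import math

TARGET_RATE_KHZ = 600  # estimated monitoring data rate from the slides


def collectors_needed(measured_rate_khz, target_rate_khz=TARGET_RATE_KHZ):
    """Minimum number of collectors, assuming linear scaling."""
    return math.ceil(target_rate_khz / measured_rate_khz)


if __name__ == "__main__":
    # Measured single-collector rates without data loss, from the slides.
    for tool, rate in [("MonALISA", 52), ("Zabbix", 30)]:
        print(f"{tool}: at least {collectors_needed(rate)} collectors")
```

This gives 12 for MonALISA (600 / 52 is about 11.5) and 20 for Zabbix (600 / 30), matching the "12+" and "20+" conclusions; the "+" leaves room for the overhead of permanent storage, which was disabled in both tests.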
Next steps
▶ Finalise TDR
▶ Perform more tests:
  ▶ Control: boost library + ZeroMQ
  ▶ Configuration: ZooKeeper
  ▶ Monitoring: MonALISA, Zabbix with permanent storage
▶ Provide CCM systems for ALFA prototype (CWG 13)
▶ Refine design