ACET: Accelerator Controls Exploitation Tools
Progress and plans, December 2012

Outline
  • Controls system overview
  • Motivation and purpose
  • Focus points
  • 2013
  • Conclusions

ACET - TC on 06 December 2012

Controls system overview
[Diagram: the controls system from applications down to hardware. Layer labels: Applications, Middle tier, Front Ends, Hardware, DB. Scale figures: 425 consoles, 400 GUIs, 300 servers, 200 Java servers, 1300 FECs, 600 module types, 85,000 devices. Component labels: Services, “Core”, Boot, Tune, InCA/LSA, Proxies, cmwAdmin, cmwDir, JMS, RBAC, NFS, CMW/FESA, RT, Timing, Drivers, DiaMon, FESA Navigator, SIS, Orbit, Sequencer, Knobs, Diagnostics, Video, Syslog.]

ACET
Motivation:
  • Distributed and complex controls system
  • Knowledge distributed over many experts
  • Move towards a uniform (LHC) exploitation model across machines
Purpose:
  • Allow (non-)experts to carry out more efficient diagnostics
ACET collaborates with CO projects to improve the diagnostic facilities of the control system.

Focus points
  • Diagnostic tools – aggregation and training
  • Process metrics – JMX & CMX
  • DiaMon – GUI and CLIC agent
  • Documentation – wiki/site structure, Portal and Useful links
  • Dynamic/runtime dependencies
  • Feedback (Tracing & Config) – message format, transport, analysis
  • Trace analysis using Splunk
  • Config analysis in CCDB

Diagnostic tools
  • Tools evaluated for criticality
  • Aggregation into CCM diagnostic menu
  • Training given during shutdown lectures

Process metrics – JMX architecture
http://wikis/display/ACET/JMX+client+instrumentation
[Diagram. Viewer side: DiaMon GUI, jConsole, JMX viewer, jVisualVM, C2Mon with JMX-DAQ. Server (SRV) side: JVM containing jar1 and jar2, JMX MBeans, Metrics, mgt, jmx-dir-client. Also shown: JmxDirectory, RMI.]

Process metrics – CMX architecture
http://wikis/display/MW/CMX
[Diagram. Viewer side: DiaMon GUI, C2Mon with CLIC-DAQ, CMX viewer, DB. FEC side: CLIC agent, command line tool, cmx-lib, registry, Metrics, shared memory segments, C process p1 (lib1, lib2, cmx-lib-c), C++ process p2 (lib3, lib4, cmx-lib-c++).]

Process metrics – DiaMon JMX integration

Process metrics – jConsole

Process metrics – Viewers

Process metrics – JMX lookup

Documentation – Structure

Documentation – Portal

Documentation – Useful links

Dependencies – architecture
http://wikis/display/MW/Statistics
  • Data collection before LS1
  • Dependency analysis
  • Visualization
[Diagram labels: FECs (CMW/FESA), cmwAdmin, client connections, cmwadmin-scanner, cmwDirectory, log files, “dot” files.]
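The step from collected client connections to “dot” files can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function name, the `(client, server)` record shape and the sample host names are all assumptions made for the example. It only shows how connection pairs map onto a Graphviz digraph that a standard viewer can render.

```python
from collections import defaultdict


def connections_to_dot(connections, graph_name="dependencies"):
    """Render (client, server) pairs as a Graphviz "dot" digraph string."""
    edges = defaultdict(int)
    for client, server in connections:
        edges[(client, server)] += 1  # count repeated connections per pair

    lines = [f"digraph {graph_name} {{"]
    for (client, server), count in sorted(edges.items()):
        # Edge direction: the client depends on the server it connects to.
        lines.append(f'  "{client}" -> "{server}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample records: (client application, FEC it connects to)
sample = [
    ("app-a", "fec-1"),
    ("app-a", "fec-2"),
    ("app-b", "fec-1"),
    ("app-a", "fec-1"),
]

dot_text = connections_to_dot(sample)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command; the edge label carries the number of times a pair was observed.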

Dependencies – a view

Dependencies – a view
Face FecBook
http://wikis/display/MW/Statistics

Feedback – architecture
http://wikis/display/MW/Log+and+Tracing
[Diagram. Producers on FEC/SRV: C processes (libs, cmw-fb-c, cmw-log, syslog), Java processes (jar1, jar2, cmw-log4j), cmw, FESA3, Impl, logfiles, and scripts (wreboot, make, cmmnbld, deploy). Transport: Syslog tracing, Java tracing, Tracing & Config, syslog@cs-ccr-feop, syslog@cs-ccr-tracing, JMS@cs-ccr-cmw, JMS@cs-ccr-tracing, converters, /var/log/messages. Consumers: Splunk, APEX GUIs, Listeners, GUIs, CCDB.]

Feedback – CCDB tracing GUI

Feedback – Hardware config CCDB GUI

Splunk – architecture
  • Central instance running on a dedicated machine
  • Project accounts set up
  • Training given to projects
  • Contact Steen for Splunk access
  • Project-specific searches created
[Diagram labels: Splunk@cs-ccr-tracing, filters, /var/log/messages, syslog@cs-ccr-tracing, filter&throttle, syslog@cs-ccr-feop, JMS@cs-ccr-cmw, JMS@cs-ccr-tracing, logfiles; sources are FECs (cmw-log) and servers, SRV (cmw-log4j).]

Splunk – Message filter GUI

Splunk – saved searches

Splunk – visualization

Splunk – dashboard

Splunk – Use case: japc-ext-dir
  • Queue overflow messages from CMW proxy
  • Hosts and PIDs reported
  • Client application identified
  • japc-ext-dir suspected, and verified
  • Subscriptions made to “constant” properties
  • Data never consumed => queue overflow in proxy
  • Problem fixed by Eric
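The first steps of this use case (collect the overflow messages, then report hosts and PIDs) amount to a group-and-count over trace lines. The sketch below shows that idea in plain Python rather than in Splunk's own search language. The line layout `<host> <process>[<pid>]: <message>`, the sample lines and the function name are assumptions for illustration; the real proxy messages may look different.

```python
import re
from collections import Counter

# Hypothetical line layout: "<host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<host>\S+) (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def count_overflows(lines, needle="queue overflow"):
    """Count messages containing `needle`, keyed by (host, pid)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Lines that do not fit the assumed layout are skipped silently.
        if match and needle in match.group("msg").lower():
            counts[(match.group("host"), int(match.group("pid")))] += 1
    return counts


sample_lines = [
    "host-a proxy[101]: Queue overflow for client 7",
    "host-a proxy[101]: Queue overflow for client 7",
    "host-b proxy[202]: Queue overflow for client 9",
    "host-b proxy[202]: subscription created",
    "not a well-formed line",
]

overflows = count_overflows(sample_lines)
for (host, pid), n in overflows.most_common():
    print(f"{host} pid={pid}: {n} overflow message(s)")
```

Once the noisiest host and PID pairs are known, the client application behind them can be looked up, which is the point at which japc-ext-dir came under suspicion.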

Splunk – Use cases
  • Leap second
  • RBAC tokens missing/malformed/expired
  • CMW slow clients
  • Telegram layout and configuration
  • JAPC applying wrong token in certain cases
  • FESA handling of Timlib error
  • Separating test environment from operational

Splunk – Comments (1)
  • “Proper usage requires very good configuration”
  • “We need to rework our way to log information…”
  • “Log files are a bit of a mess now, and only contain a subset of necessary data…it is necessary to clean up and extend logging…”
  • “…it must be possible for others to access the data…”

Splunk – Comments (2)
Positive comments:
  • “Powerful tool for detecting and reporting anomalies”
  • “Very useful for proactive actions”
  • “Powerful tool to make statistics”
  • “It avoids spending time creating tools for decoding traces”
  • “It is an agile way to gather analytics, to inform design decisions”
  • “It is a very powerful auditing tool”
  • “Trends over time allow spotting new types of problems”
  • “It was useful for me several times for seeing if a problem is on one or multiple machines”
  • “It gives an easy, reusable way of looking at logfiles”
  • “It could become a valuable tool to spot errors, where currently we feel blind whenever there is a problem”

Splunk – vision
  • Active, daily use by component providers
      - Dashboards
  • Exploit tracing for
      - Pro-active operation
      - Informed evolution
      - Preventive maintenance
  • 10 user-friendly message types per project
      - ERROR or WARNING
      - Contact information
      - Link to documentation
      - Message body meaningful to non-expert
      - No Java stack trace
  • Continuous improvement of messages
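The attributes listed for a user-friendly message type can be made concrete with a small record plus a check. The sketch below is only an illustration of those guidelines: the class name, field names, the example addresses and the stack-trace heuristic are assumptions, not part of the feedback API described in this talk.

```python
from dataclasses import dataclass

ALLOWED_LEVELS = {"ERROR", "WARNING"}


@dataclass(frozen=True)
class FeedbackMessage:
    """Hypothetical user-facing trace message; all field names are assumptions."""
    level: str     # ERROR or WARNING
    project: str   # component provider that owns the message
    contact: str   # who to call or mail
    doc_url: str   # link to documentation
    body: str      # text meaningful to a non-expert

    def problems(self):
        """Return the list of guidelines this message breaks (empty if none)."""
        found = []
        if self.level not in ALLOWED_LEVELS:
            found.append("level must be ERROR or WARNING")
        if not self.contact.strip():
            found.append("contact information missing")
        if not self.doc_url.strip():
            found.append("documentation link missing")
        if not self.body.strip():
            found.append("message body empty")
        # Crude heuristic: Java stack-trace frames start with "at ".
        if any(line.strip().startswith("at ") for line in self.body.splitlines()):
            found.append("body looks like a Java stack trace")
        return found


good = FeedbackMessage(
    level="WARNING",
    project="demo-project",
    contact="demo-support@example.org",
    doc_url="http://example.org/docs/demo",
    body="Device did not answer within 5 s; check that the front end is running.",
)
bad = FeedbackMessage(
    level="INFO",
    project="demo-project",
    contact="",
    doc_url="http://example.org/docs/demo",
    body="java.lang.IllegalStateException\n    at demo.Main.run(Main.java:42)",
)

print(good.problems())
print(bad.problems())
```

A check of this kind could run when a project declares its message types, so that missing contact details or raw stack traces are caught before the messages reach operators.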

Plans for 2013 (a)
DiaMon
  • Interactive service-oriented dependency view
  • Declare and monitor process metrics
  • Integrate metrics viewers
  • Launching of external tools
  • Make contact information accessible
Splunk
  • Improve current setup and configurations
  • Increase support and project uptake
  • Investigate integration of ITAT

Plans for 2013 (b)
Documentation
  • Agree/implement CO-wide website/wiki structure
  • Agree on maintenance responsibilities
  • Portal – review, add and extend pages
  • Content – all projects provide ½-page description
Databases
  • Finalize Hardware Configuration Feedback mechanisms
  • Capturing version information, detecting time bombs
  • Update contact information

Plans for 2013 (c)
Feedback (Tracing and Configuration)
  • Improve message quality (structure, content, level)
  • Increase project usage of feedback API
  • All projects review configuration/version feedback
Process metrics
  • Work with projects to expose metrics
  • Extend CMX (commands, …)?
  • MW team take over jmxDirectory

Plans for 2013 (d)
Runtime dependency data
  • Analysis and visualization of CMW data
  • Collecting network connection information
Drivers
  • Finalize hardware configuration feedback
  • Version feedback implementation

Conclusions
Done
  • Means for provision/transport of tracing, configuration and metrics
  • Centralized tracing and analysis
Todo
  • Data generation by projects
  • Documentation
  • Analysis and presentation
Good support from projects in 2012, but…
  • Too many other priorities for developers – and for me…
2013 is for bringing the pieces together
ACET needs time from all projects in 2013