JINR Tier1 service monitoring system Ideas and Design

JINR Tier-1 service monitoring system: Ideas and Design LIT Igor Pelevanyuk, Ivan Kadochnikov @GRID 2016

1 Introduction Why it is important and complicated

WLCG (Worldwide LHC Computing Grid). Tier-0: 20% of compute capacity. Tier-1: Highly reliable. Serve T2 centers. Tier-2: 160 centers. Serve users’ tasks. Tier-3: Centers serving specific groups.

Purpose: Store, process, and deliver data coming from the Large Hadron Collider, for and to other T1s, T2s, and T3s.

Services: Process, Store, Transfer, Other

Hierarchy: Services, OS/Software, Hardware, Network, Infrastructure (icons designed by Freepik)

Task: Deploy a system that shows the status of services on a single page, with the ability to investigate the causes of problems.

2 Work done What we have already achieved

Sources [diagram: Local; Fast / Slow; Controllable / Uncontrollable; Parsing; Security; Important data]

Web-security [diagram: Local network; HTTP request; ProxyAgent; <HTML>; Monitoring host; <HTML>; HTTPS request]

SSH-security: The monitoring host runs ssh monitor@monitored test1 against the monitored host and receives stdout and stderr. The monitored host is configured to run ProxyCommand.sh for a particular SSH key. ProxyCommand.sh contains a list of allowed commands. Passing test1 as a parameter could lead to executing /opt/adm/qsub -q
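
The forced-command idea on this slide can be sketched as follows. This is a minimal illustration, not the actual ProxyCommand.sh: it assumes the key is restricted with OpenSSH's `command="..."` option in `authorized_keys`, so that the string sent by the client arrives in the `SSH_ORIGINAL_COMMAND` environment variable. The allow-list entries and the `resolve` and `main` names are hypothetical.

```python
import os
import shlex
import subprocess
import sys

# Hypothetical allow-list: test name -> fixed command line.
# Real entries would be site-specific; these are placeholders.
ALLOWED_COMMANDS = {
    "test1": ["/bin/echo", "test1 ok"],
    "uptime": ["/usr/bin/uptime"],
}


def resolve(requested):
    """Return the fixed argv for a requested test name, or None if not allowed."""
    try:
        parts = shlex.split(requested or "")
    except ValueError:
        # Unbalanced quotes and similar malformed input are rejected outright.
        return None
    # Exactly one token is accepted, so no extra arguments can be passed along.
    if len(parts) != 1:
        return None
    return ALLOWED_COMMANDS.get(parts[0])


def main():
    """Run the allowed command named by the SSH client, or refuse."""
    argv = resolve(os.environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        print("command not allowed", file=sys.stderr)
        return 1
    # A list argv runs without a shell, so the client string is never
    # interpreted as shell syntax.
    return subprocess.run(argv).returncode
```

Installed as the forced command, such a script would finish with `sys.exit(main())`. The point of the design is that the client only selects a name from the table; the command line that actually runs is fixed on the monitored host.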

First iteration [diagram: Visualization; Collection; Executors; JSONs; Retrievers; HTTP requests; HTML response; HTTP request]
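
One way to picture the collection step: a retriever receives an HTML response and reduces it to a JSON document for the visualization layer. The sketch below is an illustration under assumptions, not the project's code: the sample page, the `TableParser` class, and the `html_to_json` function are hypothetical, and the HTTP request itself is left out so the example runs offline.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags, captions) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_json(html):
    """Use the first row as keys and turn each later row into a JSON object."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body])


# Hypothetical HTML response standing in for a real service page.
SAMPLE = """
<table>
  <tr><th>service</th><th>status</th></tr>
  <tr><td>transfer</td><td>ok</td></tr>
  <tr><td>storage</td><td>warning</td></tr>
</table>
"""

print(html_to_json(SAMPLE))
```

Keeping parsing separate from fetching means the same parser can be exercised against saved pages, which helps when the monitored page layout changes.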

Dashboard

Transfer Service - PhEDEx

PhEDEx - Quality

HappyFace: Simple, Aggregation, Alarms, Detailing
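
The aggregation and alarm ideas on this slide can be illustrated with a worst-status rollup: a category is only as healthy as its least healthy module. This is a sketch under assumptions, not HappyFace's actual rating logic; the status names, their ordering, the module names, and the `aggregate` and `alarms` functions are all hypothetical.

```python
# Hypothetical status scale, ordered from best to worst.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def aggregate(module_statuses):
    """Return the worst status among modules, or "unknown" for empty input."""
    if not module_statuses:
        return "unknown"
    return max(module_statuses.values(), key=SEVERITY.__getitem__)


def alarms(module_statuses, threshold="critical"):
    """List, in sorted order, modules at or above the alarm threshold."""
    limit = SEVERITY[threshold]
    return sorted(
        name
        for name, status in module_statuses.items()
        if SEVERITY[status] >= limit
    )


# Made-up module names and states, for illustration only.
transfer = {"quality": "ok", "links": "warning", "agents": "critical"}

print(aggregate(transfer))
print(alarms(transfer))
print(alarms(transfer, threshold="warning"))
```

The per-module dictionary is kept alongside the aggregate, so a single-page overview can still be expanded into the detail behind each status.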

Task 2.0: Deploy a system that shows the status of services at different scales, with the ability to react automatically to events as they occur and to forecast based on past data.
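
As one illustration of forecasting from past data and reacting to the result, a linear trend fitted to past samples can estimate when a metric will reach a limit, and a simple rule can act on that estimate. Everything here is an assumption made for the example: the slides do not name a forecasting method, and the function names, sample data, limit, and 30-day rule are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_to_limit(xs, ys, limit):
    """Estimate the x at which the fitted line reaches `limit`.

    Assumes the limit lies above the observed values. Returns None when
    the trend is flat or decreasing, since the line then never reaches it.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Made-up history: day number vs. disk usage in percent.
days = [0, 1, 2, 3, 4]
usage = [50.0, 52.0, 54.0, 56.0, 58.0]

eta = time_to_limit(days, usage, limit=90.0)
# Hypothetical reaction rule: warn when the limit is under 30 days away.
if eta is not None and eta - days[-1] < 30:
    print(f"react: limit expected around day {eta:.0f}")
```

A straight-line fit is the simplest possible choice; real service metrics are often periodic or bursty, so this should be read as a placeholder for whatever model the stored history supports.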

3 Next Steps To the Future

Architecture [diagram: Module; REST; Collect; HTML, CSS, JS; Analyze; DB; Forecast; React]

JavaScript

Want big impact?

New Techs: https://github.com/tier-one-monitoring; Landscape.io – coding style check; Travis CI – continuous integration (installation, regression, compatibility, functionality, …); Coveralls.io – code coverage

4 Conclusion Wrap up!

Conclusion: Alarm system required; model of service interaction; Complex Event Processing techniques; more, more, and even more modules

THANKS! Any questions? You can write to me at Pelevanyuk (at) jinr.ru