Alarm monitoring activities Alarm systems Tier 1 INFN

Alarm & monitoring activities
Alarm systems @ Tier 1
INFN CNAF, 22-12-2004

Alarms & Monitoring
• Alarms management
  – D. Degirolamo
  – Control of main services and servers
• Monitoring
  – F. Rosso
  – Collection of data from servers and WNs
  – “Slight” overlap with alarm system and other monitoring tools (e.g. Gridice)

Alarm system: status
2 independent servers
nagios.cnaf.infn.it
• LAN
• Internet connectivity
• INFN Services (dns, afslib, web)
• Management servers (bastion, ldap/krb)
• Farms (PBS)
• BABAR farm
castor-1.cnaf.infn.it
• Castor: server, stager & tapeserver
• Disk server and FAStT900
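As an illustration of the kind of active check such a server runs, here is a minimal Python sketch of a TCP reachability test. It is not taken from the CNAF setup: the function name `check_tcp` is hypothetical, and the only borrowed element is the standard Nagios plugin convention for status codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).

```python
import socket

# Status codes following the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection and return (status, message).

    Hypothetical helper for illustration only; a real deployment would
    normally rely on the check plugins shipped with the alarm software.
    """
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again as soon as the check succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution errors.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would print the message and exit with the returned status, which is how a plugin reports its result to the alarm server.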

Alarm system: developments
Distributed alarm system
• Several servers to actively control different services
• A central collector server
  – Alarm management
  – e-mail / SMS notification
    • SMS tested, but not yet deployed
  – reporting/logging
• Implementation: 1 month full-time
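The collector idea can be sketched in a few lines of Python. This is only an assumption about how such a component might be structured, not the planned implementation: the names `Alarm` and `Collector` are hypothetical, notification channels (e-mail, SMS) are modelled as plain callables, and notifying only on a status change is a design choice made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    source: str   # checker server that reported the result
    service: str  # service being controlled
    status: str   # "OK" or "CRITICAL" in this sketch
    message: str


# A notifier is any callable taking an Alarm (e.g. an e-mail or SMS sender).
Notifier = Callable[[Alarm], None]


class Collector:
    """Central collector: logs every report, notifies on status changes."""

    def __init__(self, notifiers: List[Notifier]):
        self.notifiers = notifiers
        self.log: List[Alarm] = []                      # reporting/logging
        self._last: Dict[Tuple[str, str], str] = {}     # last known status

    def report(self, alarm: Alarm) -> bool:
        """Record a report; return True if notifications were sent."""
        self.log.append(alarm)
        key = (alarm.source, alarm.service)
        previous = self._last.get(key, "OK")  # unseen services assumed OK
        self._last[key] = alarm.status
        if alarm.status == previous:
            return False                      # no change, stay quiet
        for notify in self.notifiers:
            notify(alarm)
        return True
```

Each checking server would call `report` with its results; repeated identical reports are logged but do not trigger a new e-mail or SMS.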

Monitoring system
• At present ~300 systems monitored
  – package developed at CNAF
• Needed to have a tool to collect a plethora of data from systems
• Implemented to be efficient in collecting data
  – ~100 variables collected from each WN
  – ~140 variables from servers
  – Daily, weekly, monthly & yearly plots
• Need standard interface to exchange data with other monitoring tools (e.g. Gridice) to avoid multiple collection of same data
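One possible shape for such an exchange interface is a simple, tool-neutral record per collected value. The Python sketch below is purely illustrative: the `Sample` fields, the JSON encoding and the example host and variable names are assumptions, not the format used by the CNAF package or by any other monitoring tool.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Sample:
    host: str       # monitored system, e.g. a WN or a server
    variable: str   # name of the collected variable
    timestamp: int  # seconds since the Unix epoch
    value: float


def encode(sample: Sample) -> str:
    """Serialize one sample as a single JSON line."""
    return json.dumps(asdict(sample), sort_keys=True)


def decode(line: str) -> Sample:
    """Rebuild a sample from a line produced by encode()."""
    return Sample(**json.loads(line))


# Hypothetical host and variable names, for illustration only.
example = Sample(host="wn001", variable="load1",
                 timestamp=1103700000, value=0.42)
line = encode(example)
```

With a shared record like this, a value collected once could be forwarded to other tools instead of being collected again by each of them.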