- Slides: 17
CERN / IT / GT / DMS
DPM Nagios & Puppet: Making life easy(er)
Ricardo Rocha (ricardo.rocha@cern.ch), on behalf of the IT/GT/DMS team
Monitoring with Nagios
Nagios in DPM
• Why?
  – Widely used in WLCG / EMI
  – Community support
  – Integration with other tools
• How?
  – Support for Nagios v2 or v3, pnp4nagios 0.4.x
  – One additional rpm
  – Multiple configuration options: manual, NCG, Puppet
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Monitoring
Available Probes
• Initial wishlist
  – Discussed with some of the site admins
  – Some were generic and already available
    • Mostly in Python
• Probes
  – Compliant with Nagios guidelines (status, output)
  – Database activity, filesystem activity, free and used disk space, service status, …
  – List is constantly growing
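The slides only state that the probes follow the Nagios guidelines for status and output. Under the standard Nagios plugin conventions, that means a return code of 0, 1, 2 or 3 (OK, WARNING, CRITICAL, UNKNOWN) and one line of text, optionally followed by `|` and performance data. A minimal Python sketch of those conventions; the helper names (`Status`, `format_result`, `classify`, `finish`) are hypothetical and not taken from the DPM probes.

```python
import sys
from enum import IntEnum


class Status(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def format_result(status: Status, message: str, perfdata: str = "") -> str:
    """Build the single output line: 'STATE - message[ | perfdata]'."""
    line = f"{status.name} - {message}"
    if perfdata:
        line += f" | {perfdata}"
    return line


def classify(value: float, warning: float, critical: float) -> Status:
    """Map a 'higher is worse' value (e.g. % of space used) onto a state."""
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.OK


def finish(status: Status, message: str, perfdata: str = "") -> None:
    """Print the result line and exit with the matching return code."""
    print(format_result(status, message, perfdata))
    sys.exit(int(status))
```

A probe would compute its value, pick a state with something like `classify(used_pct, 80, 90)` and end with `finish(...)`. Keeping `format_result` separate from `sys.exit` makes the output formatting easy to test on its own.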
Available Probes
• check_hostcert
• check_oracle_expiration
• check_partition_activity
• check_dpm_pool
  – unresponsive disks, filesystems, space usage
• check_dpns
• check_gridftp
  – read/write ops
• check_cpu
  – get and put
  – no i/o wait (coming soon)
• check_rfio
• check_network
  – get and put
  – multiple interfaces
• check_process
  – memory, CPU, number of descriptors, number of instances and threads
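`check_dpm_pool` reports, among other things, space usage. The slides do not describe how it gathers that data, so as a generic stand-in here is a sketch of a space check on a local path using `shutil.disk_usage`. The function names and the 80% / 90% default thresholds are assumptions, not DPM defaults.

```python
import shutil
from typing import Tuple

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_space(used: int, total: int, warn_pct: float, crit_pct: float) -> Tuple[int, str]:
    """Turn used/total byte counts into (return code, output line with perfdata)."""
    if total <= 0:
        return UNKNOWN, "UNKNOWN - filesystem reports zero size"
    pct = 100.0 * used / total
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    perf = f"used_pct={pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    return code, f"{STATE_NAMES[code]} - {pct:.1f}% used | {perf}"


def check_space(path: str, warn_pct: float = 80.0, crit_pct: float = 90.0) -> Tuple[int, str]:
    """Hypothetical local-filesystem check (not the DPM check_dpm_pool probe)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Missing path, permission problem, etc.: the state is unknown, not critical.
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    return evaluate_space(usage.used, usage.total, warn_pct, crit_pct)
```

In a probe script the pair would be used as `code, line = check_space("/")`, followed by `print(line)` and `sys.exit(code)`. Splitting the pure `evaluate_space` from the system call keeps the threshold logic testable without a real filesystem.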
Available Probes
Performance Visualization
• Enabled for all probes
  – Status for some, detailed performance for many
• pnp4nagios (http://www.pnp4nagios.org/)
  – There are other tools doing the same; this is the one used today in WLCG / EMI
  – Specific templates provided with our rpm
http://puppet.cern.ch/nagios/pnp4nagios/index.php
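pnp4nagios graphs the performance data a probe prints after the `|` separator. In the Nagios plugin format each entry is `'label'=value[UOM];[warn];[crit];[min];[max]`, entries are separated by spaces, and the guidelines list units such as `s`, `%`, `B`, `KB`, `MB`, `TB` and `c` (counter). A small formatter sketch; the name `perf_value` is hypothetical, and the labels expected by the DPM pnp4nagios templates are not reproduced here.

```python
from typing import Optional


def _num(x: float) -> str:
    """Plain decimal formatting (no exponent), trimming trailing zeros."""
    x = float(x)
    if x.is_integer():
        return str(int(x))
    return f"{x:.6f}".rstrip("0").rstrip(".")


def perf_value(label: str, value: float, uom: str = "",
               warn: Optional[float] = None, crit: Optional[float] = None,
               minimum: Optional[float] = None, maximum: Optional[float] = None) -> str:
    """Format one 'label'=value[UOM];[warn];[crit];[min];[max] entry."""
    # Labels containing spaces, '=' or quotes must be single-quoted;
    # an embedded single quote is written twice.
    if any(c in label for c in " ='"):
        label = "'" + label.replace("'", "''") + "'"
    fields = ["" if f is None else _num(f) for f in (warn, crit, minimum, maximum)]
    # Trailing empty fields may be omitted.
    while fields and fields[-1] == "":
        fields.pop()
    return ";".join([f"{label}={_num(value)}{uom}"] + fields)
```

Several entries are joined with spaces, e.g. `" ".join([perf_value("used", 50, "%", 80, 90, 0, 100), perf_value("read ops", 12.5)])` gives `used=50%;80;90;0;100 'read ops'=12.5`.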
Performance Visualization
Future Work
• Release and rollout to sites
  – Collect feedback and suggestions for new probes…
  – RPM available, not yet in any official repository
  – We're happy to fulfill your wishes
• Current requests
  – VO-specific usage information (space token might be enough and is much easier)
  – SRM success / fail rate
• Other coming probes
  – Lots of real-time performance data
    • Avg access time per operation, avg transfer time, etc.
• Extending to other DMS components
  – LFC, FTS
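The SRM success / fail rate and average-time probes are listed as requests, with no design given. Purely as an illustration, assuming per-request records (success flag and duration) were already available from some source the slides do not name, a probe could summarise them as below. The `Request` record, the thresholds and the perfdata labels are hypothetical. Because a lower success rate is worse, the perfdata thresholds use the Nagios range form `95:` (alert when below 95).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one SRM request."""
    succeeded: bool
    duration_s: float


def check_success_rate(requests: Iterable[Request],
                       warn_rate: float = 95.0,
                       crit_rate: float = 90.0) -> Tuple[int, str]:
    """Summarise a sample of requests as (return code, output line with perfdata)."""
    reqs = list(requests)
    if not reqs:
        return UNKNOWN, "UNKNOWN - no requests in sample"
    ok_count = sum(1 for r in reqs if r.succeeded)
    rate = 100.0 * ok_count / len(reqs)
    avg = sum(r.duration_s for r in reqs) / len(reqs)
    # Lower is worse: alert when the rate drops below the thresholds.
    if rate < crit_rate:
        code = CRITICAL
    elif rate < warn_rate:
        code = WARNING
    else:
        code = OK
    msg = f"{NAMES[code]} - {ok_count}/{len(reqs)} requests succeeded ({rate:.1f}%)"
    perf = (f"success_rate={rate:.1f}%;{warn_rate:g}:;{crit_rate:g}:;0;100 "
            f"avg_time={avg:.3f}s")
    return code, f"{msg} | {perf}"
```

With 19 successful one-second requests and one failed three-second request, the rate is exactly 95%, so the result is OK with an average time of 1.1 s.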
Configuration with Puppet
Puppet in DPM
• Why?
  – Looked for a strong configuration management solution
    • For our internal testbed(s), but reusable at DPM sites
  – Strong momentum, community and commercial support
    • Twitter, Sun / Oracle, Rackspace, …
  – (Very) extensive documentation
• How?
  – Set of specific modules + generic ones
  – Available via Puppet Forge or GitHub
  – An alternative… not a replacement for YAIM
  – Current support level is « best effort »
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Puppet
Reusable Modules
• All DPM required areas are covered
• As generic and reusable as possible
  – Go ahead and try them for your service
Modules: CERN, DPM, DMS, VOMS, LCGUTIL, GLITE, MYSQL, …
https://forge.puppetlabs.com/users/rocha
Cluster Configuration (diagram: head node, disk nodes, client nodes)
Cluster Configuration (diagram: generic node, head node, disk nodes, client nodes)
Puppet Dashboard
Puppet & Nagios Integration
• The « cherry on top »
• Puppet has built-in resource types for Nagios
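Puppet's Nagios resource types (such as `nagios_host`, `nagios_service` and `nagios_command`) describe objects that Puppet then writes out as ordinary Nagios object definitions. To show the shape of that target format, a minimal Python sketch that renders one definition; it is not how Puppet itself generates the files, and the host name, service description, command name and `generic-service` template in the usage below are hypothetical.

```python
from typing import Mapping


def render_nagios_object(obj_type: str, directives: Mapping[str, str]) -> str:
    """Render a Nagios object definition: 'define <type> { key value ... }'."""
    # Align values in one column for readability; Nagios only needs whitespace.
    width = max((len(k) for k in directives), default=0)
    lines = [f"define {obj_type} {{"]
    for key, value in directives.items():
        lines.append(f"    {key.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `render_nagios_object("service", {"host_name": "dpm-head.example.org", "service_description": "DPM pool space", "check_command": "check_dpm_pool", "use": "generic-service"})` produces a `define service { ... }` block with those four directives, the same kind of entry a `nagios_service` resource with matching attributes would end up as.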
Future Work
• Make it available to interested DPM sites
• Improve DPM modules from their experience
  – Larger setups, bigger problems
• Get other product teams to reuse our common modules
  – And to help us with their maintenance