CERN / IT / GT / DMS
DPM Nagios & Puppet: Making life easy(er)
Ricardo Rocha (ricardo.rocha@cern.ch)
On behalf of the IT/GT/DMS team

Monitoring with NAGIOS

Nagios in DPM
• Why?
  – Widely used in WLCG / EMI
  – Community support
  – Integration with other tools
• How?
  – Support for Nagios v2 or v3, pnp4nagios 0.4.x
  – One additional rpm
  – Multiple configuration options: manual, NCG, Puppet
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Monitoring

Available Probes
• Initial wishlist
  – Discussed with some of the site admins
  – Some were generic, already available
• Mostly in Python
• Probes
  – Compliant with Nagios guidelines (status, output)
  – Database activity, filesystem activity, free and used disk space, service status, …
  – List is constantly growing
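
To illustrate what "compliant with Nagios guidelines (status, output)" means in practice, here is a minimal Python sketch of a probe. It is not one of the DPM probes: the function names, the "DISK" service label, and the 80/90 thresholds are assumptions made for this example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the single status line follow the standard Nagios plugin convention.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a Nagios status code (thresholds are illustrative)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def format_output(service, status, detail):
    """Build the one-line plugin output: SERVICE STATUS - detail."""
    return "%s %s - %s" % (service, STATUS_NAMES[status], detail)


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the measurement is reported as UNKNOWN.
        return UNKNOWN, format_output("DISK", UNKNOWN, str(exc))
    if usage.total == 0:
        return UNKNOWN, format_output("DISK", UNKNOWN, "zero-sized filesystem")
    used_percent = 100.0 * usage.used / usage.total
    status = evaluate(used_percent, warn, crit)
    detail = "%.1f%% used on %s" % (used_percent, path)
    return status, format_output("DISK", status, detail)
```

A wrapper script would print the returned line and pass the returned code to `sys.exit()`, which is how Nagios reads the result.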

Available Probes
• check_hostcert
• check_oracle_expiration
• check_dpm_pool
  – unresponsive disks, filesystems, space usage
• check_dpns
• check_gridftp
  – get and put
• check_rfio
  – get and put
• check_partition_activity
  – read/write ops
• check_cpu
  – no i/o wait (coming soon)
• check_network
  – multiple interfaces
• check_process
  – memory, cpu, number of descriptors, number of instances and threads

Performance Visualization
• Enabled for all probes
  – Status for some, detailed performance for many
• pnp4nagios (http://www.pnp4nagios.org/)
  – There are other tools doing the same; this is the one used today in WLCG / EMI
  – Specific templates provided with our rpm
http://puppet.cern.ch/nagios/pnp4nagios/index.php
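
Graphing tools such as pnp4nagios draw from the performance data a probe appends to its output after a "|" character, in the form 'label'=value[UOM];[warn];[crit];[min];[max] defined by the Nagios plugin guidelines. The Python sketch below shows one way to build such a line. The helper names and the sample metric are assumptions for illustration and are not taken from the DPM probes.

```python
def perfdata_item(label, value, uom="", warn=None, crit=None, minimum=None, maximum=None):
    """Format one metric as 'label'=value[UOM];[warn];[crit];[min];[max]."""
    def fmt(x):
        # Unset fields are left empty.
        return "" if x is None else str(x)

    # Labels containing spaces must be wrapped in single quotes.
    if " " in label:
        label = "'%s'" % label
    fields = [
        "%s=%s%s" % (label, fmt(value), uom),
        fmt(warn),
        fmt(crit),
        fmt(minimum),
        fmt(maximum),
    ]
    # Trailing empty fields may be dropped.
    while fields[-1] == "":
        fields.pop()
    return ";".join(fields)


def with_perfdata(text, items):
    """Join the human-readable status text and the metrics with ' | '."""
    return "%s | %s" % (text, " ".join(items))
```

For example, `with_perfdata("DISK OK - 42.5% used", [perfdata_item("used", 42.5, "%", 80, 90, 0, 100)])` yields a status line followed by `used=42.5%;80;90;0;100`.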

Future Work
• Release and rollout to sites
  – And collect feedback, suggestions for new probes…
  – RPM available, not yet in any official repository
  – We're happy to fulfill your wishes
• Current requests
  – VO-specific usage information (space token might be enough and is much easier)
  – SRM success / fail rate
• Other coming probes
  – Lots of real-time performance data
    • Avg access time per operation, avg transfer time, etc.
• Extending to other DMS components
  – LFC, FTS

Configuration with Puppet

Puppet in DPM
• Why?
  – Looked for a strong configuration management solution
    • For our internal testbed(s), but reusable at DPM sites
  – Strong momentum, community and commercial support
    • Twitter, Sun / Oracle, Rackspace, …
  – (Very) extensive documentation
• How?
  – Set of specific modules + generic ones
  – Available via Puppet Forge or GitHub
  – An alternative… not a replacement for YAIM
  – Current support level is « best effort »
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Puppet

Reusable Modules
• All DPM required areas are covered
• As generic and reusable as possible
  – Go ahead and try them for your service
• Modules: CERN, DPM, DMS, VOMS, LCGUTIL, GLITE, MYSQL, …
https://forge.puppetlabs.com/users/rocha

Cluster Configuration
[Diagram: GENERIC NODE, HEAD NODE, DISK NODES, CLIENT NODES]

Puppet Dashboard

Puppet & Nagios Integration
• The « cherry on top »
• Puppet has built-in resource types for Nagios

Future Work
• Make it available to interested DPM sites
• Improve DPM modules from their experience
  – Larger setups, bigger problems
• Get other product teams to reuse our common modules
  – And to help us with their maintenance