Import XRootD monitoring data from MonALISA

Sergey Belov, JINR, Dubna
Sergey.Belov@cern.ch
DNG section meeting, 30.10.2012

Motivation
• Monitoring of XRootD federations is important for ALICE, ATLAS and CMS
• For now, the experiments find it more convenient to collect the initial monitoring data on their side
• Two collector types for MonALISA are in use:
  – Individual transfer statistics, some server statistics (in ALICE, based on ALICE developments)
  – Server statistics (in CMS and ATLAS, fed with information from the UCSD collector)
• Goal: to have this information in Dashboard

What could we get from ML?
• ALICE: transfers
  – Individual transfer summary over 60 seconds
  – Server name, client IP
  – Read/write MB
  – NO transfer ID!
• CMS, ATLAS: servers
  – Incoming and outgoing traffic
  – Current number of connections and total connections ever
  – Counts of authenticated and unauthenticated logins, number of authentication failures
  – Redirection count
• For all these parameters, their rates (Hz) are also provided

How should it be done?
Requirements:
• Standard way: send information via message brokers, in JSON format (ML only as a transport)
• Reliability
  – In message handling along the whole chain
  – No information loss on failures
• Reasonable behavior when sending messages
  – Send only consistent information
  – Respect connection frequency, authorization, timeouts
  – A few big messages instead of hundreds of small ones

Dumping data from MonALISA (1)
Steps to get the data:
• Set up an ML repository
• Subscribe it to the appropriate monitoring groups (alice, xrootd_cms, xrootd_atlas)
• Configure ML to consume only the required parameters, without storing anything
• Set a custom filter (= handler) that passes the data to an external component: the dumper

Dumping data from MonALISA (2)
ML result object structure:
• "farm", "cluster", "node"
  – "node" → xrootd server name
  – "Site name" can be obtained from "farm" or "cluster"
• timestamp
• arrays of parameter names and values
• Most common case: the result object "decays" into objects with just a single parameter name and value in the corresponding arrays
  – a transfer's information has to be gathered piece by piece
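Gathering "decayed" results piece by piece can be sketched as follows. This is only an illustration in Python: the dictionary layout and the field names ("time", "param_name", "param") are assumptions made for the example, not the actual ML result class, and merge_results is a hypothetical helper.

```python
from collections import defaultdict


def merge_results(results):
    """Merge single-parameter results sharing farm, cluster, node and timestamp."""
    merged = defaultdict(dict)
    for res in results:
        key = (res["farm"], res["cluster"], res["node"], res["time"])
        # "param_name" and "param" are parallel arrays of names and values
        for name, value in zip(res["param_name"], res["param"]):
            merged[key][name] = value
    return dict(merged)


# Two "decayed" results for the same server and timestamp (made-up values)
pieces = [
    {"farm": "SiteA", "cluster": "xrootd_cms", "node": "xrd1.example.org",
     "time": 1351584000, "param_name": ["link_in"], "param": [5048475.0]},
    {"farm": "SiteA", "cluster": "xrootd_cms", "node": "xrd1.example.org",
     "time": 1351584000, "param_name": ["link_out"], "param": [10493857.0]},
]
merged = merge_results(pieces)
```

Here both pieces end up under one key, so the server's parameters for that timestamp can be handled together.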

Dumping data from MonALISA (3)
The dumper:
• Is called each time the repository has results from the subscriptions
• Should be fast enough not to slow everything down (it is called consecutively for incoming results)
• If message handling or sending is done here, there is no hope of a reliable or stable solution

Proposed information handling chain
[Diagram] xrootd server → collector → MonALISA collector → DUMPER → local queue → AGGREGATOR → local queue → Messaging Transfer Agent(s) → Message brokers → Dashboard

Technical solutions (1)
• ML filter (Dumper)
  – Java class that catches incoming results from ML
  – Initial data transformation (decoding IPs, etc.)
  – Stores data in local directory queues
• Aggregator
  – Python 2.4 program that aggregates the Dumper's queues and prepares the final messages to be sent by the MTA
  – Reads/writes messages from/to a local directory queue
  – Aggregates and groups messages

Technical solutions (2)
• Directory queue libraries
  – Java implementation: ch.cern.dirq class (by Massimo Paladin)
  – python-dirq (available in the EPEL repository)
• Messaging Transfer Agent
  – stompclt: a flexible tool to consume and dispatch messages between different sources in a configurable and reliable way (by Lionel Cons); available in the CERN SW repository, in EPEL soon
  – For now the STOMP protocol is enough (AMQP protocol support is on the way with the amqpclt tool)
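The idea behind a directory queue (one file per message, written atomically, removed only after successful handling) can be illustrated with the standard library alone. The sketch below is not the python-dirq or ch.cern.dirq API: TinyDirQueue and its methods are hypothetical names, and it omits the locking that real directory queue libraries provide for concurrent consumers.

```python
import json
import os
import tempfile
import uuid


class TinyDirQueue(object):
    """Illustrative stand-in for a directory queue (one file per message)."""

    def __init__(self, path):
        self.path = path
        if not os.path.isdir(path):
            os.makedirs(path)

    def add(self, message):
        """Write to a temporary file first, then rename it into place."""
        name = uuid.uuid4().hex
        tmp = os.path.join(self.path, name + ".tmp")
        with open(tmp, "w") as handle:
            json.dump(message, handle)
        os.rename(tmp, os.path.join(self.path, name + ".msg"))
        return name

    def names(self):
        """Return the names of complete messages, skipping partial files."""
        return sorted(f[:-4] for f in os.listdir(self.path) if f.endswith(".msg"))

    def get(self, name):
        with open(os.path.join(self.path, name + ".msg")) as handle:
            return json.load(handle)

    def remove(self, name):
        os.remove(os.path.join(self.path, name + ".msg"))


queue = TinyDirQueue(tempfile.mkdtemp())
queue.add({"node": "xrd1.example.org", "link_in": 5048475})
for name in queue.names():
    message = queue.get(name)
    # ... aggregate or forward the message here ...
    queue.remove(name)  # delete only after successful handling
```

Because a message becomes visible only after the rename, a consumer never reads a half-written file, and a crash before remove() leaves the message on disk for the next run.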

Adding more reliability with supervision
• Proven concept (Erlang/OTP)
  – Workers do their work
  – Supervisors monitor workers
  – All are defined in a supervision tree
• Flexible implementation available (simplevisor)
  – Non-intrusive
  – Handles service evolution
Source: Messaging Services and Client Software, Lionel Cons and Massimo Paladin, EGI Technical Forum, Prague, 18th September 2012

Aggregator's internals
• Accumulates statistics on xrootd servers (per timestamp) and groups them by hostname
• Reconstructs transfer statistics from subsequent messages and aggregates transfers by server and timestamp
• Passes a batch of messages (by type) to the MTA as one large message
• Removes all the local queue messages involved once the aggregated message has been sent successfully
• All semi-complete information chunks are sent on timeout; all (hopelessly) incomplete ones are wiped out
• Three threads in the process:
  – Main (control)
  – Worker (periodically consumes, aggregates, republishes for the MTA)
  – Cleanup (removes temporary data in the directory queues involved)
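The timeout rule for semi-complete and incomplete chunks might look roughly like the sketch below. It is not the actual aggregator code: the class name, the 120-second timeout, the record key, and the choice of which fields count as "expected" and "required" are all assumptions made for illustration.

```python
EXPECTED = frozenset(["client_ip", "read_mb", "written_mb"])  # assumed full set
REQUIRED = frozenset(["client_ip"])                           # assumed minimum


class TransferAssembler(object):
    """Collects transfer fields piece by piece and decides what to send."""

    def __init__(self, timeout=120):
        self.timeout = timeout
        self.pending = {}  # key -> (first_seen, fields)

    def add(self, key, name, value, now):
        first_seen, fields = self.pending.setdefault(key, (now, {}))
        fields[name] = value

    def flush(self, now):
        """Return (ready, dropped) and forget both kinds of entries."""
        ready, dropped = [], []
        for key in list(self.pending):
            first_seen, fields = self.pending[key]
            expired = now - first_seen >= self.timeout
            if EXPECTED.issubset(fields):
                ready.append((key, fields))      # complete: send right away
            elif expired and REQUIRED.issubset(fields):
                ready.append((key, fields))      # semi-complete: send on timeout
            elif expired:
                dropped.append(key)              # hopelessly incomplete: wipe out
            else:
                continue                         # still waiting for more pieces
            del self.pending[key]
        return ready, dropped


asm = TransferAssembler(timeout=120)
asm.add(("xr.cern.ch", 1), "client_ip", "12.34.56.78", now=0)
asm.add(("xr.cern.ch", 1), "read_mb", 1.234, now=1)
asm.add(("xr.cern.ch", 1), "written_mb", 2.345, now=2)
asm.add(("xr.cern.ch", 2), "client_ip", "12.34.56.79", now=0)
asm.add(("xr.cern.ch", 3), "read_mb", 0.5, now=0)
early = asm.flush(now=10)
late = asm.flush(now=130)
```

In this run the first record is complete and leaves at the first flush, the second is sent as semi-complete once the timeout has passed, and the third is dropped because it never received a client IP.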

Message formats: xrootd transfers

Aggregated message:
  {
    "header": {
      "message_id": "6061d13b….",
      "mon_service_fqdn": "mon.x.ch",
      "timestamp": "1223456789",
      "vo": "alice"
    },
    "body": {
      "transfers": [ transfers messages ]
    }
  }

Single transfers message:
  {
    "message_id": "05b179bb….",
    "server_host": "xr.cern.ch",
    "timestamp": "1123456789",
    "clients": [
      {
        "client_ip": "12.34.56.78",
        "read_mb": "1.234",
        "written_mb": "2.345",
        "transfer_speed_mb": "3.582"
      },
      …..
    ]
  }

* Need VO, or just send to different queues?
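Wrapping several transfers messages into one large message could be done along these lines. The field names follow the format above; make_envelope is a hypothetical helper, and generating message_id as a UUID hex string is an assumption, since the slides only show truncated identifiers.

```python
import json
import time
import uuid


def make_envelope(transfers, vo, mon_service_fqdn, now=None):
    """Wrap a list of transfers messages into one header/body message."""
    if now is None:
        now = time.time()
    return {
        "header": {
            "message_id": uuid.uuid4().hex,   # assumed ID scheme
            "mon_service_fqdn": mon_service_fqdn,
            "timestamp": str(int(now)),
            "vo": vo,
        },
        "body": {"transfers": list(transfers)},
    }


transfer = {
    "message_id": uuid.uuid4().hex,
    "server_host": "xr.cern.ch",
    "timestamp": "1123456789",
    "clients": [
        {"client_ip": "12.34.56.78", "read_mb": "1.234",
         "written_mb": "2.345", "transfer_speed_mb": "3.582"},
    ],
}
envelope = make_envelope([transfer], vo="alice",
                         mon_service_fqdn="mon.x.ch", now=1223456789)
wire = json.dumps(envelope)  # text that would be handed over to the MTA
```

Keeping the VO in the header is one of the two options raised in the question above; sending to different queues per VO would make that field unnecessary.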

Message formats: xrootd servers (1)

Aggregated message:
  {
    "header": {
      "message_id": "0d502ae9….",
      "timestamp": "122356789",
      "mon_host_fqdn": "mon.x.ch",
      "vo": "atlas|cms"
    },
    "body": {
      "servers_stats": [ stats messages are here ]
    }
  }

Single stats message:
  {
    "message_id": "25e3c2f8….",
    "timestamp": "1123456789",
    "server_host": "example.cern.ch",
    "link_in": "5048475",
    "link_in_R": "5.1234",
    "link_out": "10493857",
    "link_out_R": "7.2345",
    "link_tot": "16949274",
    "xrootd_lgn_af_R": "0.123",
    "xrootd_lgn_au_R": "2.345",
    "xrootd_lgn_ua_R": "0.5",
    ….
  }

* Need VO, or just send to different queues?

Message formats: xrootd servers (2)

  Parameter           Description
  link_in[_R]         Incoming traffic [rate, B/s]
  link_out[_R]        Outgoing traffic [rate, B/s]
  link_tot[_R]        Total connections [rate, Hz]
  link_num            Current connections number
  xrootd_lgn_af[_R]   Authentication failures [rate, Hz]
  xrootd_lgn_au[_R]   Authenticated login [rate, Hz]
  xrootd_lgn_ua[_R]   Unauthenticated login [rate, Hz]
  xrootd_rdr[_R]      Redirection count [rate, Hz]
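On the consumer side, the "_R" naming convention makes it easy to pair each counter with its rate. The sketch below assumes the flat stats message layout from the previous slide; split_rates and META_KEYS are hypothetical names introduced for the example.

```python
META_KEYS = ("message_id", "timestamp", "server_host")


def split_rates(message):
    """Group parameters so that each base name holds its value and/or rate."""
    params = {}
    for key, raw in message.items():
        if key in META_KEYS:
            continue
        if key.endswith("_R"):
            # "link_in_R" is the rate belonging to "link_in"
            params.setdefault(key[:-2], {})["rate"] = float(raw)
        else:
            params.setdefault(key, {})["value"] = float(raw)
    return params


stats = {
    "timestamp": "1123456789",
    "server_host": "example.cern.ch",
    "link_in": "5048475",
    "link_in_R": "5.1234",
    "xrootd_lgn_af_R": "0.123",
}
params = split_rates(stats)
```

A parameter that arrives with only its rate, such as xrootd_lgn_af_R in this sample, simply has no "value" entry.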

Current state of developments
• The ML dumper filter is ready and works fine
  – Produces intermediate JSON messages to be consumed by the aggregator; no performance limits observed
• The aggregator is ready and being tested
• The chosen technical solution (directory queue libraries, stompclt) has proven to be appropriate, fast and scalable

Further steps
• Tests of the full message processing chain (including stress tests)
• Consumer on the Dashboard side
• Tuning the settings of the ML dumper, aggregator and stompclt
• Supervision of all components (ML repo, aggregator, MTA) with simplevisor
• Packaging of the dumper, aggregator and all the configurations as RPMs within the Dashboard

Thanks for your attention!