WLCG Transfers Dashboard
WLCG Workshop in conjunction with CHEP 2012, 20.05.2012, New York
Julia Andreeva, David Tuckett, Daniel Dieguez, Danila Oleynik, Artem Petrosyan, Gunnar Roe, Michail Salichos, Alexandr Uzhinskiy
Experiment Support, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
Contents
• Motivation
• Overview of the key concepts of the WLCG transfer monitoring system
• Current status and issues
• Dashboard UI
• Integration of XRootD monitoring
• Summary
Motivation
• Currently no tool provides an overall view of data transfers across the whole WLCG scope (across LHC experiments, across the technologies used, for example FTS and XRootD, across multiple local FTS instances, etc.).
• Every LHC experiment follows its own data transfers through a VO-specific monitoring system.
• The tasks performed by all VO-specific transfer monitoring systems are clearly similar. Operations such as aggregation of FTS transfer statistics are done by every VO separately, though they could be done once, centrally, and then served to all experiments via a well-defined set of APIs.
• To organize data transfers in the most efficient way, experiments need more information than is currently available, for example correlations of data transfers between experiments, latencies related to SRM operations during data transfers, etc.
Concept (1)
• WLCG transfer monitoring is a common solution which provides a cross-VO, cross-technology view and is not coupled with any VO-specific data management system.
[Diagram components: FTS instances (currently the main technology for CMS, ATLAS and LHCb); XRootD (currently the main technology for ALICE); other sources; MSG Broker; Dashboard with API and UI; VO Monitoring]
• VO transfer monitoring integration
  – Transfer events via MSG broker: avoids polling and screen-scraping local FTS instances
  – Transfer statistics via Dashboard API: avoids redundant event storage and statistics generation
  – Transfer plots via Dashboard UI: avoids redundant development of common plots
Concept (2)
• WLCG transfer monitoring is a common solution which provides a cross-VO, cross-technology view and is not coupled with any VO-specific data management system.
[Diagram components: FTS instance, XRootD and other sources; MSG Broker; Dashboard with API and UI; VO Monitoring]
• Implementation started with FTS monitoring.
Current status (1)
• Required deployment of FTS 2.2.8, which was enabled for transfer status reporting via MSG (GT group).
• The prototype has been up and running for more than half a year.
• An example of excellent collaboration between several groups in CERN IT (ES, GT, PES, DB), between IT and PH (active participation of the CMS and ATLAS computing teams), and between CERN and JINR (Dubna).
Current status (2)
• The WLCG Transfer Dashboard was developed using a schema and UI similar to those of the ATLAS DDM Dashboard. This allowed a prototype to be put in place in a short time (~2 months).
• Full production setup is in place:
  – The schema was validated by Oracle experts from CERN IT-DB and deployed in production
  – Production collectors and UIs are running in redundant mode (2 hosts)
  – 2 production message brokers are set up (many thanks to Lionel Cons (CERN IT-GT) and the CERN IT-PES group)
• Testing and integration environment created (integration DB, test message broker, VMs for collectors and UIs).
• Alarms are raised if any of the production FTS instances does not report for longer than 2 hours.
• The first validation cycle was performed by CMS colleagues (special thanks to Josep Flix) and all reported bugs were fixed.
• No problems with UIs or collectors have been detected over the last months.
• Announcing the system as being in production was delayed because of the results of the consistency checks.
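The 2-hour alarm rule above can be illustrated with a minimal sketch. It is not the Dashboard's actual alarm code: the function name, the instance names and the idea of keeping a last-report timestamp per FTS instance are assumptions made for illustration; only the 2-hour threshold comes from the slide.

```python
from datetime import datetime, timedelta

# Threshold taken from the slide: alarm after more than 2 hours of silence.
SILENCE_THRESHOLD = timedelta(hours=2)


def silent_instances(last_report, now, threshold=SILENCE_THRESHOLD):
    """Return the sorted names of FTS instances silent for longer than threshold.

    last_report maps a (hypothetical) instance name to the time of its most
    recent message; now is the reference time of the check.
    """
    return sorted(
        name for name, seen in last_report.items() if now - seen > threshold
    )


# Example with made-up instance names and timestamps.
now = datetime(2012, 5, 20, 12, 0)
last_report = {
    "fts-a.example.org": datetime(2012, 5, 20, 11, 30),  # 30 minutes ago
    "fts-b.example.org": datetime(2012, 5, 20, 9, 0),    # 3 hours ago
}
print(silent_instances(last_report, now))  # ['fts-b.example.org']
```

An instance that reported exactly 2 hours ago is not flagged, matching the "longer than 2 hours" wording.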
Current status (3)
• The most important validation step is the consistency checks, performed to establish how trustworthy the data is. Data in the WLCG Transfer Dashboard is compared with PhEDEx and the ATLAS DDM Dashboard.
• First results of consistency checks with the pilot FTS server were very promising. However, after deployment of FTS 2.2.8 to all T1s, consistency checks showed a big discrepancy, for ATLAS in particular up to 50%.
• The problem was understood thanks to Michail Salichos (CERN IT-GT). It is caused by a bug in the ActiveMQ-CPP client used by the FTS publisher.
• A workaround was found (Michail Salichos). A fixed version of the FTS publisher was deployed to the TRIUMF and ASGC FTS instances 3 weeks ago. Continuous consistency checks show perfect agreement.
• Tentative schedule: service in production 2-3 weeks from now, depending on the patching of all FTS services for the ActiveMQ-CPP client bug.
Consistency checks
[Figure: ATLAS DDM plot for TRIUMF alongside the WLCG Transfers Dashboard plot for TRIUMF]
Dashboard UI: overview
Key features
• Flexible filtering and grouping
• Statistics matrix & error samples
• Customizable plots
• Web API: JSON, XML
Implementation
• Uses the common xbrowse UI framework originally developed for ATLAS DDM Dashboard 2.0
Dashboard UI: filtering & grouping
• Filtering by sliding/fixed interval
• Filtering by VO
• Filtering by FTS server
• Filtering and grouping of sources / destinations by country, site, host, token
• GOCDB naming for cross-VO view
• VO-specific naming for single-VO view
Dashboard UI: matrix & error samples
• Matrix
  – Dimensions: source, destination
  – Metrics: efficiency, throughput, successes, failures
• Error samples
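As a rough illustration of how such a source/destination matrix could be computed from individual transfer records, here is a minimal sketch. The record keys (`src`, `dst`, `ok`, `bytes`) are hypothetical, and the metric definitions are assumptions: efficiency is taken as successes divided by all attempts, and throughput as successfully transferred bytes divided by the length of the time interval. The Dashboard's own definitions may differ.

```python
from collections import defaultdict


def build_matrix(transfers, interval_seconds):
    """Aggregate transfer records into per-(source, destination) statistics.

    Each record is a dict with hypothetical keys: 'src', 'dst',
    'ok' (bool) and 'bytes' (int, counted only for successful transfers).
    """
    cells = defaultdict(lambda: {"successes": 0, "failures": 0, "bytes": 0})
    for t in transfers:
        cell = cells[(t["src"], t["dst"])]
        if t["ok"]:
            cell["successes"] += 1
            cell["bytes"] += t["bytes"]
        else:
            cell["failures"] += 1

    matrix = {}
    for pair, c in cells.items():
        attempts = c["successes"] + c["failures"]  # always >= 1 here
        matrix[pair] = {
            "successes": c["successes"],
            "failures": c["failures"],
            "efficiency": c["successes"] / attempts,          # assumed definition
            "throughput": c["bytes"] / interval_seconds,      # bytes per second
        }
    return matrix


# Example with made-up records over a 100-second interval.
transfers = [
    {"src": "CERN", "dst": "KIT", "ok": True, "bytes": 1000},
    {"src": "CERN", "dst": "KIT", "ok": True, "bytes": 3000},
    {"src": "CERN", "dst": "KIT", "ok": False, "bytes": 0},
    {"src": "CERN", "dst": "RAL", "ok": False, "bytes": 0},
]
matrix = build_matrix(transfers, interval_seconds=100)
print(matrix[("CERN", "KIT")])
```

Grouping by country, host or token instead of site would only change the key used for each cell.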
Dashboard UI: plots
• Plots
  – Dimensions: source, destination, VO
  – Metrics: efficiency, throughput, successes, failures
• Different kinds of plots are available
• Plots can be customized (time bins, number of items shown, etc.)
• See backup slides
Dashboard UI: consistency
• Throughput side-by-side: PhEDEx vs. Dashboard [plot: 12 hours, CERN, KIT]
• Throughput difference: relative & absolute [plot: 24 hours, CERN, RAL]
• In development
  – Automated cross-checking with alarms
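The absolute and relative throughput differences shown in these plots can be sketched as follows. This is an illustration only: the function name is hypothetical, and taking the relative difference with respect to the reference system (for example PhEDEx) is an assumption, since the slide does not say which side is the denominator.

```python
def throughput_difference(reference, dashboard):
    """Compare two throughput series bin by bin.

    Returns a list of (absolute, relative) differences, where relative is
    taken with respect to the reference value and is None when the
    reference is zero.
    """
    if len(reference) != len(dashboard):
        raise ValueError("series must cover the same time bins")
    result = []
    for ref, dash in zip(reference, dashboard):
        absolute = dash - ref
        relative = absolute / ref if ref != 0 else None
        result.append((absolute, relative))
    return result


# Example with made-up values (e.g. MB/s per time bin).
reference = [100.0, 200.0, 0.0]
dashboard = [100.0, 100.0, 5.0]
print(throughput_difference(reference, dashboard))
```

A relative difference of -0.5 in a bin corresponds to the kind of 50% discrepancy that the consistency checks are meant to catch; an automated cross-check could raise an alarm when the relative difference exceeds a chosen tolerance.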
Next steps (FTS monitoring)
• System developers work in close contact with the VOs. Many thanks to CMS and ATLAS for their active participation. Many feature requests were received and will be addressed in future development:
  – Filter by FTS channel
  – FTS channel status: current and evolution
  – Status of the FTS queues; correlations between transfer performance metrics and status of the queue
  – Transfer part statistics: SRM overhead, GridFTP
  – Ranking plots and quality map plots
Integration of XRootD transfers
[Diagram components: FTS instance, XRootD, MSG Broker, Dashboard (API, UI), VO Monitoring]
• The XRootD federation monitoring part is under development, mainly by JINR (Dubna).
XRootD monitoring
• Is being implemented with 3 levels of hierarchy:
  – local site
  – federation
  – global
XRootD monitoring architecture
[Architecture diagram with three user groups:]
• Users: VO computing teams, federation support teams
• Users: site administrators and VO support teams at the site
• Users: VO computing teams, site administrators, VO management, WLCG management
XRootD monitoring architecture [diagram]
XRootD monitoring (local site)
• There are two implementations:
  – based on MonALISA (used by ALICE and, with some extensions, by CMS)
  – developed in the framework of the Tier 3 monitoring project for ATLAS (Ganglia)
• Both approaches use XRootD monitoring data (summary and detailed flow) reported by XRootD redirectors over UDP. The content is not event-like.
• CMS and ATLAS developed readers that reformat these flows into event-like data containing: event time, source and destination domains, path and filename, username, file size, number of bytes read/written.
• There is no knowledge of the federation topology at the site level.
• Event-like data, complemented with the name of the site which hosts the publisher, is published to MSG.
[Diagram label: MonALISA or Ganglia UI]
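The event-like record described above can be pictured with a small sketch. The field list follows the slide, but the attribute names, the JSON encoding and the `site` key are assumptions for illustration; the actual message format used by the CMS and ATLAS readers is not given in the source.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class TransferEvent:
    # Fields listed on the slide; the attribute names are assumptions.
    event_time: str
    source_domain: str
    destination_domain: str
    path: str
    filename: str
    username: str
    file_size: int
    bytes_read: int
    bytes_written: int


def to_message(event, site_name):
    """Serialize an event, adding the name of the site hosting the publisher."""
    payload = asdict(event)
    payload["site"] = site_name  # hypothetical key for the publisher's site
    return json.dumps(payload, sort_keys=True)


# Example with made-up values.
event = TransferEvent(
    event_time="2012-05-20T12:00:00Z",
    source_domain="example.org",
    destination_domain="example.edu",
    path="/store/data",
    filename="file.root",
    username="jdoe",
    file_size=2048,
    bytes_read=1024,
    bytes_written=0,
)
message = to_message(event, "EXAMPLE-SITE")
print(message)
```

The site name is attached at publication time because, as noted above, the site level has no knowledge of the federation topology.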
XRootD monitoring architecture [diagram]
XRootD monitoring (federation)
• At the federation level, data published by the sites will be consumed from MSG.
• Events coming from different sites will be aggregated and complemented with topology information. Data processing at the federation level is currently planned to be implemented with MapReduce (under development).
• Transfers handled by the federation will be exposed through the federation UI.
• The federation UI is implemented similarly to the UI of the global Transfer Dashboard. Adapting the global WLCG Transfer UI is straightforward, since it is a JavaScript client application which expects data in JSON format and is fully decoupled from the data source.
• The first prototype should be ready by the end of June.
• Federation data, in a format similar to FTS transfer status messages, will be published to MSG for the global monitoring system.
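To make the aggregation step concrete, here is a minimal in-memory sketch of a map and a reduce phase over site events. It only illustrates the pattern: the planned implementation is described as under development, and the topology table, event keys and site names below are all hypothetical.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical topology: maps a host domain to the federation site it belongs to.
TOPOLOGY = {"cern.ch": "SITE-A", "example.edu": "SITE-B"}


def map_event(event):
    """Map step: emit ((source site, destination site), bytes) for one event."""
    src = TOPOLOGY.get(event["source_domain"], "unknown")
    dst = TOPOLOGY.get(event["destination_domain"], "unknown")
    return (src, dst), event["bytes_read"]


def reduce_pairs(totals, pair):
    """Reduce step: sum bytes per (source site, destination site) key."""
    key, nbytes = pair
    totals[key] += nbytes
    return totals


def aggregate(events):
    """Run the map and reduce steps over an iterable of event dicts."""
    return dict(reduce(reduce_pairs, map(map_event, events), defaultdict(int)))


# Example with made-up events.
events = [
    {"source_domain": "cern.ch", "destination_domain": "example.edu", "bytes_read": 10},
    {"source_domain": "cern.ch", "destination_domain": "example.edu", "bytes_read": 5},
    {"source_domain": "other.org", "destination_domain": "cern.ch", "bytes_read": 7},
]
print(aggregate(events))
```

In a real MapReduce deployment the two steps would run distributed over many events; the topology lookup in the map step is where site-level events gain the federation context they lack.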
XRootD monitoring (global)
• At the global level, the implementation done for FTS (collectors and UI) should to a large extent be reused for XRootD.
• The plan is to have the full chain enabled by the end of the year.
Summary
• The FTS monitoring part is ready and will be announced as being in production as soon as all FTS instances are patched for the ActiveMQ-CPP client bug. Further development follows the requirements of the experiments.
• The XRootD monitoring part is in active development and progressing well. The first prototype is expected by the end of June. Full functionality should be enabled by the end of the year.
• Having both FTS and XRootD monitoring covered by a global monitoring system would provide a fairly complete picture of WLCG transfers.
• An example of excellent collaboration between several groups in CERN IT, between IT and PH, and between CERN and JINR.
Links
• Dashboard UI (prototype): http://dashb-wlcg-transfers.cern.ch/ui/
• Twiki:
  https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransferMonitoring
  https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTransfersDashboard
• Feedback: wlcg-transfer-monitor@cern.ch
• Please see the poster during the CHEP poster session
Backup slides
Dashboard UI: plot types [3 slides of screenshots]
Dashboard UI: plot customisation [3 slides of screenshots]