Aggregated Topology Provider (ATP): A Grid Topology Repository for the WLCG


Rajesh Kalmady, Pradyumna Joshi, Digamber Sonvane, Phool Chand, Kislay Bhatt, Kumar Vaibhav, Vinod Boppanna - BARC
James Casey, David Collados, John Shade - CERN

The Aggregated Topology Provider (ATP) is a grid topology repository being developed for the WLCG in order to have a single authoritative source of topology information for grid resources and to streamline the flow of this data across the various tools in the WLCG monitoring infrastructure. It gathers, aggregates and stores the topology information by periodically contacting all information providers. The aggregated information is then made available to various high-level monitoring tools. The ATP service will also track changes in the topology of grid resources by keeping history.

Topology Information Aggregation – Why, What and How?

Today, various WLCG stakeholders such as VOs, sites, and ROCs use different monitoring and reporting tools like Gridview, SAM, Experiment Dashboards, Gridmap etc. for monitoring and visualizing grid resources. These tools rely heavily on the topology of the grid resources, which mainly consists of:

  • High-level resources and user communities: Project, Infrastructure, VOs and their groupings
  • Sites and associated resources: Sites, Services, Nodes and their groupings
  • Associations between the entities defined above: Project-VOs, VO-Sites

The tools source their data from a number of information providers while they act mostly as information consumers. Although these tools do a great job in visualizing complex monitoring data of the Grid, the flow of information among different producers and consumers needs to be improved for enhanced reliability. In particular, because these tools exchange information about grid resources as needed, there are a number of crisscross database interconnections. The current scenario for various monitoring tools is shown in the upper figure. There is no single authoritative information source that can be queried, which hampers the effectiveness of aggregation and consumption of data by the applications.

[Figure: Current flow of Grid topology data across various monitoring tools. Information providers (OIM list, GOCDB, CIC Operations Portal, BDII, manual inputs) exchange site topology information, VO cards, VO mappings, CPU counts, federations and tiers with SAM, Gridview, Gridmap and the Dashboards over crisscrossing connections.]

Information Providers

  • OIM list (OSG Interoperability Monitoring list): Provides information about the OSG resources, such as sites and services, that support the WLCG project.
  • GOCDB: The EGEE authoritative source of topology information, such as sites, nodes, services, etc. It is used by many monitoring and accounting tools.
  • CIC Operations Portal: Stores the WLCG VO cards' information.
  • BDII: Provides information about which services are currently advertised in the Grid, plus VO mappings to them.
  • Manual inputs: Information about federations and tiers from the WLCG project office. With ATP this will be automated by sourcing data from a proposed WLCG office portal.

Streamlined grid topology data flow using the ATP

[Figure: Information providers (OSG Resources, GOCDB, CIC Portal, WLCG Office Portal, BDII) feed the ATP, which acts as topology repository and information aggregator and serves the information consumers (SAM, Gridview, Dashboards, Gridmap, other tools).]

Various information providers and consumers of ATP data are shown in the drawing above. As seen, the introduction of the ATP in the grid monitoring infrastructure streamlines the flow of information across various monitoring tools, thereby improving data consistency and architectural robustness.
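The aggregation role of the ATP can be illustrated with a minimal sketch. It is not the ATP implementation: the record schema, the class and function names, and the stand-in providers (which return hard-coded data in place of real queries to GOCDB, BDII or the OIM list) are all assumptions made for illustration. The sketch only shows the idea of contacting each provider and merging what they report about the same service into one repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ServiceRecord:
    """One service as reported by a single provider (hypothetical schema)."""
    infrastructure: str            # e.g. "EGEE" or "OSG"
    site: str
    hostname: str
    flavour: str                   # service type, e.g. "CE" or "SRM"
    vos: frozenset = frozenset()   # VOs this provider maps to the service


# A provider is any callable returning records. Real providers would query
# GOCDB, BDII, the OIM list and so on; that part is not modelled here.
Provider = Callable[[], Iterable[ServiceRecord]]

ServiceKey = Tuple[str, str, str, str]


@dataclass
class TopologyRepository:
    """Single merged view: service key -> set of supported VOs."""
    services: Dict[ServiceKey, Set[str]] = field(default_factory=dict)

    def aggregate(self, providers: Iterable[Provider]) -> None:
        """Contact every provider once and merge what each one reports."""
        for fetch in providers:
            for rec in fetch():
                key = (rec.infrastructure, rec.site, rec.hostname, rec.flavour)
                # Union the VO mappings so a service listed by one provider
                # gains the VO information reported by another.
                self.services.setdefault(key, set()).update(rec.vos)

    def sites(self, infrastructure: str) -> List[str]:
        """Sorted site names known for one infrastructure."""
        return sorted({k[1] for k in self.services if k[0] == infrastructure})


# Stand-in providers with made-up data (assumptions, not real topology).
def fake_gocdb() -> List[ServiceRecord]:
    return [ServiceRecord("EGEE", "SITE-A", "ce01.example.org", "CE")]


def fake_bdii() -> List[ServiceRecord]:
    return [ServiceRecord("EGEE", "SITE-A", "ce01.example.org", "CE",
                          frozenset({"atlas", "cms"}))]


def fake_oim() -> List[ServiceRecord]:
    return [ServiceRecord("OSG", "SITE-B", "se01.example.org", "SRM",
                          frozenset({"cms"}))]


repo = TopologyRepository()
repo.aggregate([fake_gocdb, fake_bdii, fake_oim])
```

In this sketch a consumer calls `repo.sites("EGEE")` or reads `repo.services` instead of opening its own connection to each provider, which is the data flow the streamlined figure describes.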
Information Consumers

  • SAM (Service Availability Monitoring): a framework for monitoring production and pre-production grid sites by running a set of probes at regular intervals.
  • Gridview: a monitoring and visualization tool that provides a high-level view of various functional aspects of the WLCG, such as service status, availability, reliability, and statistics of data transfers, FTS file transfers, running jobs etc.
  • Experiment Dashboards: these provide an overview of the distributed computing activities of the LHC experiments ATLAS, ALICE, CMS and LHCb.
  • Gridmap: a top-level visualization tool for grid services monitoring.
  • Other user-specific applications: various applications developed at the user end.

Features and Present State

  • New abstract entities like Project and Infrastructure
      • Project: WLCG
      • Infrastructures: EGEE, OSG
  • Support for grouping of Infrastructures, VOs, sites, services
  • User-specified site names in VOs
  • Synchronization with different information providers

A list of requirements was gathered from various sources, e.g. Gridview, SAM, Dashboards and the LHC experiments. The first prototype of the ATP is currently being tested, including the modules that aggregate, store and present the information in a standardized way.

ATP Future Directions

Availability of historical information for grid resources: It is necessary to store the history of grid resources because these resources are not static. Presently, historical information is not being preserved. This frequently leads to misleading and inconsistent displays, and a lot of time and effort can be wasted. It is planned to provide historical information so that the status of grid resources can be tracked for any time window, past or present. This is a vital feature for re-computing service availability and reliability numbers in Gridview.
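One common way to keep topology history is to attach a validity interval to each record and close the interval when a resource disappears, rather than deleting the row. The sketch below illustrates that idea only. The poster does not say how the ATP stores history, so the class names, fields and sample data here are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ServiceVersion:
    """A service together with the period during which it existed."""
    site: str
    hostname: str
    flavour: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still current"


class TopologyHistory:
    """Hypothetical history store based on validity intervals."""

    def __init__(self) -> None:
        self._versions: List[ServiceVersion] = []

    def add(self, site: str, hostname: str, flavour: str, when: datetime) -> None:
        """Record a service that appears in the topology at 'when'."""
        self._versions.append(ServiceVersion(site, hostname, flavour, when))

    def retire(self, hostname: str, flavour: str, when: datetime) -> None:
        """Close the open interval instead of deleting the record."""
        for v in self._versions:
            if (v.hostname, v.flavour) == (hostname, flavour) and v.valid_to is None:
                v.valid_to = when

    def as_of(self, when: datetime) -> List[ServiceVersion]:
        """Return the services that existed at the given instant."""
        return [
            v for v in self._versions
            if v.valid_from <= when and (v.valid_to is None or when < v.valid_to)
        ]


# Made-up example data: one service is replaced by another during the year.
history = TopologyHistory()
history.add("SITE-A", "ce01.example.org", "CE", datetime(2008, 1, 1))
history.add("SITE-A", "ce02.example.org", "CE", datetime(2008, 6, 1))
history.retire("ce01.example.org", "CE", datetime(2008, 9, 1))

march = [v.hostname for v in history.as_of(datetime(2008, 3, 1))]
october = [v.hostname for v in history.as_of(datetime(2008, 10, 1))]
```

With records kept this way, an availability figure for a past month could be recomputed against the services that actually existed in that month (`march` here) rather than against today's topology.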
Programmatic interface for other tools: The monitoring tools frequently rely on a number of other databases for extracting the information they require and, in most cases, establish direct connections to those databases, which leads to problems. The ATP will provide standard interfaces that abstract the underlying structure and allow information to be retrieved in a structured way.

Conclusions:

The ATP, acting as a single authoritative information aggregator, will simplify the job of assimilating grid resource information. The repository will also be extremely useful for tracking historical information about grid resources in the WLCG infrastructure. The ATP data will also be consumed by other high-level monitoring tools, improving their traceability, accuracy and reliability. The first prototype system is in the testing phase and will soon be deployed in production for 24x7 use in the grid.

References:

  • WLCG: http://lcg.web.cern.ch/LCG/
  • OAT Strategy: https://edms.cern.ch/document/927171
  • GridView: https://twiki.cern.ch/twiki/bin/view/LCG/GridView
  • SAM: https://twiki.cern.ch/twiki/bin/view/LCG/SamCern