Project Status Report: SAMGrid
• SAMGrid Management, Status, Operations (Merritt)
• SAMGrid Development I (Veseli)
• SAMGrid Development II (Kennedy)
• SAMGrid Future Plans (St. Denis)
18 Feb 2004, Computing Division Project Status Report

SAMGrid Project Description
• Purpose: Provide data handling services to Run II experiments and other interested experiments with similar problems. These services should scale in performance and convenience for cataloging and delivery of petabyte-sized datasets, and should evolve to availability in relevant Grid environments.
• Current stakeholders: CDF, DØ, MINOS, CD
• Duration: Development effort is expected to extend through '07 as the components move to become Grid services. High-level maintenance (i.e., effort that includes capability to respond to feature requests) is expected to continue through at least the data collection lifetime of the stakeholders.

The SAM-Grid Team
• Revised Management Plan went into effect Dec 03
Project Co-Leaders:
  Wyatt Merritt, CD/DØCA
  Rick St. Denis, CDF/U Glasgow
Project Technical Co-Leaders:
  Rob Kennedy, CD/CDF
  Sinisa Veseli, CD/DØCA
CCF: Andrew Baranovski, Gabriele Garzoglio, Igor Terekhov
CEPA: Carmenita Moore, Steve White (0.5 FTE)
CDF: Randy Herber, Art Kreymer, Stefan Stonjek (GS)
DØCA: Lauri Loebel Carpenter, Robert Illingworth, Adam Lyon

The SAM-Grid Team - Extended
Database support (CSS-DSG): Diana Bonham, Anil Kumar
Associated internal projects:
  RUNJOB (with CMS)
  Authorization Project (with CMS, still being defined)
Associated external projects:
  PPDG: Sankalp Jain, Aditya Nishandar
  GridPP: Morag Burgon-Lyon, Valeria Bartsch, Iain Bertram, Dave Evans, Peter Love
  SBIR II: Matt Vranicar, Jeremy Simmons, Josh Gramlich, Ngan MacDonald, John Grace

SAMGrid Project Management & Organization
• Project co-leaders
  - Represent largest stakeholders: requirements & priorities
  - Run weekly design meetings
• Project technical leaders
  - Run weekly operations meeting
  - Conduct subproject assessments
• Active Subprojects: C++ API, DBServer, JIM, H Stream Reco for CDF, Caching, Chains&Links, CDF DFC, Test Harness, Linux deploy of DBServers, Config Man
• Planned Subprojects: Request system, Autodest, Further monitoring (MIS)
• Related Subprojects: d0tools, SBIR II, Condor mods, workflow packages for CDF & D0, Authorization & Accounting
• Recently completed Subprojects: Python API, V5.1 Schema Design, Batch Adapter, D0 Online dcache TDP, 1st Gen Monitoring Tools, Data Dimensions Grammar

SAMGrid Components
• Event/File Catalog for metadata (contents & processing) and locations
• Dbservers for accessing catalog
• Station servers for file delivery to projects
• Optimizer
• File storage server
• Interface to station cache and MSS (samcp)
• JIM components for Grid job submission & monitoring
• User API
• C++ client API

Status and deployments of SAMGrid for DØ
• Operational @ FNAL: online, reco farm, d0mino, cab, new cab, clued0
• Operational @ Monte Carlo production sites
• Operational @ remote analysis sites: ~20 active, ~40 deployed
• Operational 11/03 to 2/04 for remote reconstruction: IN2P3, UKGrid (Manchester/ICL/RAL), WestGRID, GridKA, NIKHEF; 97 M events reprocessed remotely
• Stats: ~78 K proj FNAL, >14 K proj remote (since 1/1/03); 60 billion evts, 3 PB, 8 M files consumed (all D0 stations)
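The consumption totals quoted for all D0 stations (60 billion events, 3 PB, 8 M files) imply some rough per-file and per-event averages. The sketch below is only illustrative arithmetic on those three figures; it assumes decimal units (1 PB = 10**15 bytes), and the function and key names are hypothetical, not part of SAM.

```python
# Totals as quoted on the slide; decimal units are an assumption.
EVENTS_CONSUMED = 60 * 10**9   # 60 billion events
BYTES_CONSUMED = 3 * 10**15    # 3 PB, taking 1 PB = 10**15 bytes
FILES_CONSUMED = 8 * 10**6     # 8 M files


def consumption_averages(events, total_bytes, files):
    """Return simple averages derived from the three totals."""
    return {
        "bytes_per_file": total_bytes / files,
        "events_per_file": events / files,
        "bytes_per_event": total_bytes / events,
    }


averages = consumption_averages(EVENTS_CONSUMED, BYTES_CONSUMED, FILES_CONSUMED)
for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages come out to about 375 MB per file, 7500 events per file, and 50 kB per event. These are averages over all data tiers, so they say nothing about any single format.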

D0 Files
4000-8000 Files/Day

D0 Files Per Month By Year
(Plot covering 1999-2003; annotations: 100,000 files, Run II Start)

D0 Total Files
2.5 Million Files Served

D0 Total Data Moved
700 TB moved

Status and deployments of SAMGrid for CDF
• Operational 24/7 to store online metadata
• Operational at remote stations: ~15 active, ~30 deployed. Large recent increase: Fla. Wkshp!
• In testing for Monte Carlo production
• File delivery tests up to 20 TB on testcaf
• Statistics: ~3000 proj total (since 1/1/03)
  - Note CDF usage pattern is different from DØ: CDF moves more GB (but not more events) because it does not use a small summary format like the DØ thumbnail.

CDF Florida DH Workshop (Now 20!)
• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
• Sam installation now: initsam cdf <stationname>
• Follow-up on April 1.
• Each site has a local user support person to reduce load on core development team.
• Generally: Security ate 80% of the effort!

2 TB/Day: Karlsruhe

CDF Dcache on CAF
ALL CDF on CAF reads 20 TB/Day

Status and deployments of JIM
• Job broker, execution and submission site software, job monitor, client software for grid job submission
• Deployment plan for DØ Monte Carlo
  1. Test at 3 sites (Manchester, CCIN2P3, Wisconsin) with basic functionality and measure efficiency of job completion
     Sites: Manchester, CCIN2P3, Wisconsin
     JIM eff: >99%
     Site eff x Code eff: ~85%, ~60%
  2. Verify use by experimenter for job submission (this week)
  3. Add merging
  4. Move to production at these 3 sites (DØ milestone: Mar 1)
  5. Add remainder of DØ MC sites (Lancaster, SAR, NIKHEF, Prague)
  6. Improve brokering algorithm
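The efficiency figures above separate JIM's own efficiency (>99%) from the combined "Site eff x Code eff" (~85% and ~60%). As a rough sketch, and assuming the stages fail independently so that efficiencies multiply, the end-to-end job completion efficiency could be estimated as below. The function name is hypothetical, 0.99 is used as a stand-in for ">99%", and which site each of the two quoted values belongs to is not recoverable from the slide, so they are labeled only by value.

```python
def end_to_end_efficiency(jim_eff, site_times_code_eff):
    """Estimate overall job completion efficiency.

    Assumes the JIM layer and the site/code layer fail independently,
    so the overall efficiency is the product of the two.
    """
    for value in (jim_eff, site_times_code_eff):
        if not 0.0 <= value <= 1.0:
            raise ValueError("efficiencies must lie in [0, 1]")
    return jim_eff * site_times_code_eff


JIM_EFF = 0.99  # stand-in for the quoted ">99%"

# The two "Site eff x Code eff" values quoted on the slide.
estimates = {
    quoted: end_to_end_efficiency(JIM_EFF, quoted)
    for quoted in (0.85, 0.60)
}

for quoted, overall in estimates.items():
    print(f"site x code = {quoted:.0%} -> overall ~ {overall:.1%}")
```

Under this assumption the overall figure is dominated by the site and code terms rather than by JIM itself, which is consistent with the slide quoting them separately.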

JIM Issues
• Site operational requirements (e.g. clock synch, disk & node reliability, OS issues)
• Experiment operational requirements (e.g. code footprint may exceed site capability and is variable w/ release)
• File transfer capabilities & policies: cf. mtg this week w/ GridKA rep
• Allocation of services to head node vs worker nodes
• Sandboxing mechanisms (last week design mtg)
• Merging mechanism, brokering (this week design mtg)

Operational Model
• Experiments provide shifters for 1st line problem fielding and solving
• Project provides on-call list from developers
• At DØ, on average ~60-80% of problems are answered by shifters
• Classes of problems
  - Routine jobs like adding info to database
  - Less routine: cleanup after failed stores
  - Answering user questions regarding usage
  - Updating documentation
  - Investigating user reports of problems, and problems visible in project monitoring tools
  - Providing solutions for problems

Operations Outlook
• Improve documentation with aim of improving shifter & user ability to diagnose/solve problems
• Expect doubling of central station capacity at DØ
• Expect transition to more SAM usage at CDF
• Expect Grid operations in production for simulation, first at DØ then at CDF