The EDG Testbed
The European DataGrid Project Team
http://www.eu-datagrid.org
Contents
- User’s Perspective of the Grid
- Grid Services
- Hardware Components of an EDG Testbed
- The EDG Testbed Configuration
- How to Set Up an EDG Testbed
  - Obtaining code
  - Configuring different machines
The EDG Testbed - n° 2
A 3-Tier Business Architecture
Figure: Client, Application Server and Data Server tiers, with request, result and data flows between them.
On the EDG:
- Client: User Interface
- Application Server: Computing Element / Worker Nodes
- Data Server: Storage Element
Situation on a Grid
Information Services
- Hardware:
  - EDG Information Service
  - Information Providers
- Data:
  - Replica Catalog
- Machine Types:
  - Information Service (IS)
  - Replica Catalog (RC)
- Software & Services:
  - EDG Grid Services:
    - Information Service
  - Application Services:
    - Currently only EDG applications are directly supported
Situation on a Grid (cont’d)
Figure: Information Providers, Info Service and Replica Catalog.
Main EDG Grid Services
- Authentication & Authorization
- Job submission service
  - Resource Broker
- Replica Management
  - Grid Data Mirroring Package (GDMP)
  - EDG-Replica-Manager (Globus Replica Manager)
  - Mass storage system support
- Logging & Bookkeeping
EDG Logical Machine Types
- User Interface (UI)
- Information Service (IS)
- Computing Element (CE)
  - Frontend Node
  - Worker Nodes (WN)
- Storage Element (SE)
- Replica Catalog (RC)
- Resource Broker (RB)
Services per Machine Type
Daemon (columns: UI, IS, CE, WN, SE, RC, RB)
Globus Gatekeeper - - -
Replica Catalog - - -
GSI-enabled FTPd - -
Globus MDS - - - -
Info-MDS - - - -
Broker - - -
Job submission - - -
Information Index - - -
Logging & Bookkeeping - - -
Local Logger - -
CRL Update - -
Grid mapfile Update - -
RFIO - - -
GDMP - - -
(frontend)
A Simple Testbed Configuration
Figure: User Interface, Resource Broker, Replica Catalog and Information Service, plus two Computing Element/Storage Element pairs; Computing Element 1 is “close” to Storage Element 1, and Computing Element 2 is “close” to Storage Element 2.
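The “close” relationship is what lets job placement follow the data. The following is a simplified, hypothetical model of that idea, not the EDG Resource Broker's actual matchmaking: host names and logical file names are invented, and a CE counts as a candidate only if its close SE already holds a replica of every input file.

```python
# Hypothetical close-SE table: each Computing Element maps to its "close" Storage Element.
CLOSE_SE = {
    "ce1.example.org": "se1.example.org",
    "ce2.example.org": "se2.example.org",
}

# Toy replica catalog: logical file name -> Storage Elements holding a physical copy.
REPLICA_CATALOG = {
    "lfn:run42.dat": {"se2.example.org"},
    "lfn:calib.dat": {"se1.example.org", "se2.example.org"},
}


def candidate_ces(input_files):
    """Return the CEs whose close SE already holds every requested input file."""
    candidates = []
    for ce, se in sorted(CLOSE_SE.items()):
        if all(se in REPLICA_CATALOG.get(lfn, set()) for lfn in input_files):
            candidates.append(ce)
    return candidates


if __name__ == "__main__":
    print(candidate_ces(["lfn:run42.dat"]))  # ['ce2.example.org']
    print(candidate_ces(["lfn:calib.dat"]))  # ['ce1.example.org', 'ce2.example.org']
```

When no close SE holds the data, this sketch simply returns an empty list; deciding what to do in that case is left out.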
Testbeds
- Application Testbed: end-user applications
  - Software: stable, certified release (EDG 1.4.7)
- Certification Testbed: extended, detailed testing
  - Software: tagged release
  - State: starting…; collaboration with Testing Group/LCG.
- Development Testbed: integration & evaluation of software
  - Software: current tagged release + new packages; new tagged release.
  - State: active use; 5 sites involved.
- Development Machines: testing of middleware in isolation
  - Software: bleeding-edge versions.
  - State: varied; under control of middleware work packages.
DataGrid testbeds
Application testbed:
- More than 1000 CPUs
- EDG software installed at more than 40 sites
- 5 Terabytes of storage
Application Testbed Resources
Since last year:
- Improved software (EDG 1.4.7).
- Doubled sites. More waiting…
  - Australia, Taiwan, USA (U. Wisc.), UK sites, INFN, French sites, CrossGrid, …
- Significantly more CPU/storage.
Hidden infrastructure:
- MDS Hierarchy
- Resource Brokers
- User Interfaces
- VO Replica Catalogs
- VO Membership Servers
- Certification Authorities

Site            Country  CPUs     Storage
CC-IN2P3*       FR        620      192 GB
CERN*           CH        138     1321 GB
CNAF*           IT         48     1300 GB
Ecole Poly.     FR          6      220 GB
Imperial Coll.  UK         92      450 GB
Liverpool       UK          2       10 GB
Manchester      UK          9       15 GB
NIKHEF*         NL        142      433 GB
Oxford          UK          1       30 GB
Padova          IT         11      666 GB
RAL*            UK          6      332 GB
SARA            NL          0   10000+ GB
TOTAL           5        1075    14969 GB
* also on the Development Testbed; +200 TB including tape
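The TOTAL row can be re-derived from the per-site rows. This short check sums CPUs and disk storage and counts distinct countries. SARA's “10000+ GB” is entered as 10000, so the storage sum is a lower bound, and the additional tape capacity is not included.

```python
# (site, country, CPUs, storage in GB), copied from the resource table.
SITES = [
    ("CC-IN2P3", "FR", 620, 192),
    ("CERN", "CH", 138, 1321),
    ("CNAF", "IT", 48, 1300),
    ("Ecole Poly.", "FR", 6, 220),
    ("Imperial Coll.", "UK", 92, 450),
    ("Liverpool", "UK", 2, 10),
    ("Manchester", "UK", 9, 15),
    ("NIKHEF", "NL", 142, 433),
    ("Oxford", "UK", 1, 30),
    ("Padova", "IT", 11, 666),
    ("RAL", "UK", 6, 332),
    ("SARA", "NL", 0, 10000),  # listed as "10000+ GB"
]

total_cpus = sum(cpus for _, _, cpus, _ in SITES)
total_storage_gb = sum(gb for _, _, _, gb in SITES)
n_countries = len({country for _, country, _, _ in SITES})

print(total_cpus, total_storage_gb, n_countries)  # 1075 14969 5
```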
Example IS Content
Site: NIKHEF
-------------------------
CE tbn09.nikhef.nl:2119/jobmanager-pbs-qlong:
- PBS queue "qlong" with 96 hours time limit
- Software installed: CMS-1.0.2 ATLAS-1.3.0 ALICE-3.07.01 LHCb-1.1.1 IDL-5.4 NIKHEF D0MCC-0.1-1
- There are 0 jobs running and 0 waiting, with 16 CPUs free
Close SE tbn03.nikhef.nl with mount point /flatfiles
-------------------------
CE tbn09.nikhef.nl:2119/jobmanager-pbs-qshort:
- PBS queue "qshort" with 240 minutes time limit
- Software installed: CMS-1.0.2 ATLAS-1.3.0 ALICE-3.07.01 LHCb-1.1.1 IDL-5.4 NIKHEF D0MCC-0.1-1
- There are 0 jobs running and 0 waiting, with 16 CPUs free
Close SE tbn03.nikhef.nl with mount point /flatfiles
-------------------------
SE tbn03.nikhef.nl close to 2 CEs:
- tbn09.nikhef.nl:2119/jobmanager-pbs-qshort
- tbn09.nikhef.nl:2119/jobmanager-pbs-qlong
- VOs supported: alice atlas biomedical cms earthob lhcb iteam
- gridftp on port 2811
- rfio on port 3147
- file
- 31744 Mb of free space
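Each CE in this listing is identified by a contact string of the form host:port/jobmanager-&lt;batch system&gt;-&lt;queue&gt;. That shape is inferred from the NIKHEF sample only; `parse_ce_id` and `CEContact` are hypothetical helpers for splitting such strings, not part of the EDG software.

```python
import re
from typing import NamedTuple


class CEContact(NamedTuple):
    host: str
    port: int
    lrms: str   # local batch system, e.g. "pbs"
    queue: str


# Assumed shape, based only on the sample: <host>:<port>/jobmanager-<lrms>-<queue>
_CE_ID = re.compile(
    r"^(?P<host>[^:/]+):(?P<port>\d+)/jobmanager-(?P<lrms>[a-z]+)-(?P<queue>\S+)$"
)


def parse_ce_id(ce_id: str) -> CEContact:
    """Split a CE contact string into host, port, batch system and queue."""
    m = _CE_ID.match(ce_id)
    if m is None:
        raise ValueError(f"not a CE identifier: {ce_id!r}")
    return CEContact(m["host"], int(m["port"]), m["lrms"], m["queue"])


if __name__ == "__main__":
    print(parse_ce_id("tbn09.nikhef.nl:2119/jobmanager-pbs-qlong"))
    # CEContact(host='tbn09.nikhef.nl', port=2119, lrms='pbs', queue='qlong')
```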
EDG Software Distribution
- All software available as source & binary RPMs
- Binaries for Red Hat 6.2 and Red Hat 7.3
- More than 600 packages, including:
  - Complete Globus distribution
  - EDG packages (~50 packages)
  - Support tools (perl, ant, jdk, …)
- Pre-packaged for different machine types
Automatic EDG Fabric Management
Setup tasks:
- Node Installation & Management
- Configuration Management
Runtime tasks:
- Monitoring & Fault Tolerance
- Resource Management
Figure annotations: runtime tasks trigger setup tasks; new machines may automatically join the grid; failure detection/repair (e.g. restarting daemons).
LCFG (Local ConFiGuration system)
- Developed at the University of Edinburgh
- Widely used fabric installation & configuration tool
- Automated installation and configuration in a very diverse and evolving environment
Figure: on the LCFG server, LCFG configuration files are compiled by the XML compiler (mkxprof) into profiles served by a web server. The server notifies the LCFG client; the client's rdxprof fetches its XML profile over HTTP and acknowledges (notification and acknowledgement over UDP), then stores it in a DBM file read by the generic component and the LCFG components.
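The notify/fetch/acknowledge cycle can be modelled in-process to make the message order explicit. This is a toy sketch under stated assumptions: the classes are hypothetical, method calls stand in for the UDP and HTTP messages, a dict stands in for the DBM file, and the XML layout is far simpler than a real LCFG profile.

```python
import xml.etree.ElementTree as ET


class LcfgServer:
    """Toy server: keeps compiled XML profiles by hostname, as the web server would."""

    def __init__(self):
        self.profiles = {}
        self.acks = []

    def publish(self, host, xml_text, client):
        self.profiles[host] = xml_text
        client.notify(self)              # stands in for the UDP notification

    def fetch(self, host):
        return self.profiles[host]       # stands in for the HTTP download

    def acknowledge(self, host):
        self.acks.append(host)           # stands in for the UDP acknowledgement


class LcfgClient:
    """Toy rdxprof: fetch the profile, flatten it into a key/value store, acknowledge."""

    def __init__(self, host):
        self.host = host
        self.store = {}                  # stands in for the DBM file read by components

    def notify(self, server):
        root = ET.fromstring(server.fetch(self.host))
        for component in root:
            for resource in component:
                self.store[f"{component.tag}.{resource.tag}"] = (resource.text or "").strip()
        server.acknowledge(self.host)


if __name__ == "__main__":
    server, client = LcfgServer(), LcfgClient("node1")
    server.publish("node1", "<profile><auth><usershell>/bin/tcsh</usershell></auth></profile>", client)
    print(client.store)   # {'auth.usershell': '/bin/tcsh'}
    print(server.acks)    # ['node1']
```

Pulling the profile over HTTP after a lightweight notification keeps the server stateless about delivery; the acknowledgement is what tells it the client has caught up.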
Example LCFG Configuration File
Config files (compiled by mkxprof):
  +inet.services telnet login ftp
  +inet.allow sshd telnet login ftp
  +inet.allow_telnet ALLOWED_NETWORKS
  +inet.allow_login ALLOWED_NETWORKS
  +inet.allow_ftp ALLOWED_NETWORKS
  +inet.allow_sshd ALL
  +inet.daemon_sshd yes
  +auth.users mickey
  +auth.userhome_mickey /home/mickey
  +auth.usershell_mickey /bin/tcsh
XML profiles:
  <inet>
    <allow cfg:template="allow_$ tag_$ daemon_$">
      <allow_RECORD cfg:name="telnet">
        <allow>192.168., 192.135.30.</allow>
      </allow_RECORD>
      ...
  </inet>
  <auth>
    <user_RECORD cfg:name="mickey">
      ...
      <userhome>/home/MickeyMouseHome</userhome>
      <usershell>/bin/tcsh</usershell>
    </user_RECORD>
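The core of the compilation step, turning `+component.resource value` lines into per-component XML elements, can be sketched as below. This is a hypothetical simplification, not mkxprof's actual algorithm. It produces no `cfg:template` records or `cfg:name` attributes, and it treats `ALLOWED_NETWORKS` as a plain textual macro whose value is taken from the XML profile shown on the slide.

```python
import xml.etree.ElementTree as ET


def compile_profile(source, macros):
    """Turn '+component.resource value' lines into <profile><component><resource>value.

    Toy stand-in for mkxprof: templates, records and cfg:name attributes are not generated.
    """
    root = ET.Element("profile")
    components = {}
    for raw in source.splitlines():
        line = raw.strip()
        if not line.startswith("+"):
            continue                                  # this sketch only handles '+' lines
        key, _, value = line[1:].partition(" ")
        component, _, resource = key.partition(".")
        # Expand whole-word macros such as ALLOWED_NETWORKS.
        value = " ".join(macros.get(token, token) for token in value.split())
        element = components.get(component)
        if element is None:
            element = components[component] = ET.SubElement(root, component)
        ET.SubElement(element, resource).text = value
    return root


if __name__ == "__main__":
    src = "+inet.allow_telnet ALLOWED_NETWORKS\n+inet.daemon_sshd yes\n+auth.usershell_mickey /bin/tcsh"
    profile = compile_profile(src, {"ALLOWED_NETWORKS": "192.168., 192.135.30."})
    print(ET.tostring(profile, encoding="unicode"))
```

The real profile groups `allow_telnet`, `allow_login` and similar resources into `allow_RECORD` elements via the template attribute. The flat layout here is only meant to show the source-to-XML direction of the step.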
Fabric Monitoring & Fault Tolerance
Figure components: sensors, monitoring agent with local cache, consumers, central repository with collector and DB, decision unit with rule configuration, actuator agent and actuators (on the local node).
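The decision-unit idea (rules evaluated against cached sensor readings, with matches handed to an actuator) can be illustrated as follows. This is a hypothetical sketch, not the EDG fault-tolerance framework's interface: the metric name `gatekeeper_up`, the rule format and the actuator names are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    metric: str                          # which sensor reading the rule watches
    condition: Callable[[float], bool]   # fires when this returns True
    action: str                          # name of the actuator to trigger


class DecisionUnit:
    """Toy decision unit: caches the latest reading per metric and fires matching rules."""

    def __init__(self, rules: List[Rule], actuators: Dict[str, Callable[[], None]]):
        self.rules = rules
        self.actuators = actuators
        self.cache: Dict[str, float] = {}    # stands in for the local monitoring cache

    def on_sample(self, metric: str, value: float) -> List[str]:
        self.cache[metric] = value
        fired = []
        for rule in self.rules:
            if rule.metric == metric and rule.condition(value):
                self.actuators[rule.action]()   # the actuator performs the repair
                fired.append(rule.action)
        return fired


if __name__ == "__main__":
    restarted = []
    unit = DecisionUnit(
        rules=[Rule("gatekeeper_up", lambda v: v == 0, "restart_gatekeeper")],
        actuators={"restart_gatekeeper": lambda: restarted.append("gatekeeper")},
    )
    print(unit.on_sample("gatekeeper_up", 1))  # []
    print(unit.on_sample("gatekeeper_up", 0))  # ['restart_gatekeeper']
    print(restarted)                           # ['gatekeeper']
```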
Wrap Up
- Logical machine types of an EDG Testbed
- Mapping of services to logical machines
- Example and current EDG Testbed configuration
- Code distribution strategy
- Fabric management strategy
→ How to set up an EDG Testbed
LCFG Installation
Server setup:
- Download RPMs (perl + lcfg + apache)
- Install RPMs
- Start HTTP server (apache, …)
- Create configuration files
- Run mkxprof on them
Client setup:
- Download RPMs (perl + lcfg)
- Install RPMs
- Reboot (rdxprof will be started)
Configuration management (server):
- Update config files
- Run mkxprof
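For the configuration-management loop (update config files, then run mkxprof), it helps to know which hosts' profiles changed since they were last compiled. The following is a hypothetical helper under stated assumptions: one source file per host in one directory, compiled output named `<host>.xml` in another. It does not invoke mkxprof itself, and it says nothing about how mkxprof tracks dependencies.

```python
import os


def stale_hosts(source_mtimes, compiled_mtimes):
    """Hosts whose source profile changed after its last compilation (or was never compiled)."""
    return sorted(
        host for host, src_time in source_mtimes.items()
        if host not in compiled_mtimes or src_time > compiled_mtimes[host]
    )


def scan(source_dir, xml_dir):
    """Collect modification times; assumes one source file per host and '<host>.xml' outputs."""
    sources = {h: os.path.getmtime(os.path.join(source_dir, h)) for h in os.listdir(source_dir)}
    compiled = {}
    for host in sources:
        out = os.path.join(xml_dir, host + ".xml")
        if os.path.exists(out):
            compiled[host] = os.path.getmtime(out)
    return stale_hosts(sources, compiled)
```

Keeping the comparison in a pure function (`stale_hosts`) separates the decision from the filesystem access, which makes the rule easy to check on its own.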
EDG Machine Installation
On the LCFG server:
- Create directories for RPMs
- Download RPMs from the central EDG repository
- Create an LCFG profile for each client machine:
  - Filename = hostname; includes a machine-type-specific config file and a site-specific config file (needs to be customized!)
  - Example templates + RPM lists are provided
  - Run mkxprof on each of these files
On the LCFG clients:
- Set up clients as described before
DONE
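Generating the per-client profiles (one file per hostname, each pulling in a site file and a machine-type file) is easy to script. This sketch assumes that LCFG source files accept C-preprocessor-style `#include` directives. The include-file names and their order (site settings first) are hypothetical placeholders, not the names of the provided EDG templates.

```python
import os

# Hypothetical include-file names; the real EDG templates may be named differently.
SITE_CONFIG = "site-cfg.h"   # site-specific settings, must be customized per site
TYPE_CONFIG = {
    "UI": "ui-cfg.h",
    "CE": "ce-cfg.h",
    "WN": "wn-cfg.h",
    "SE": "se-cfg.h",
}


def render_profile(machine_type):
    """Source profile text: site-wide settings first, then machine-type settings."""
    return f'#include "{SITE_CONFIG}"\n#include "{TYPE_CONFIG[machine_type]}"\n'


def write_profiles(directory, hosts):
    """Write one profile per client; the file name is the client's hostname."""
    paths = []
    for hostname, machine_type in sorted(hosts.items()):
        path = os.path.join(directory, hostname)
        with open(path, "w") as f:
            f.write(render_profile(machine_type))
        paths.append(path)
    return paths
```

For example, `write_profiles(profile_dir, {"ce01.example.org": "CE", "wn01.example.org": "WN"})` would write two files, after which mkxprof is run on each of them.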
Manual Setup (without LCFG)
- Download RPMs directly on the machine (RPM lists per machine type exist)
- Install RPMs
- Configure individual services (see installation guide)
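Without LCFG, the per-machine-type RPM list drives the installation. The following assumes a hypothetical list format with one package file per line and `#` comments. It only builds an `rpm -Uvh` command line and does not run it, since installing packages requires root. Passing all packages in one command lets rpm order them by their dependencies.

```python
import shlex


def read_rpm_list(text):
    """Parse an RPM list: one package file per line; '#' starts a comment (assumed format)."""
    packages = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            packages.append(entry)
    return packages


def install_command(packages):
    """One 'rpm -Uvh' transaction covering the whole machine-type list."""
    return ["rpm", "-Uvh", *packages]


if __name__ == "__main__":
    example = "# hypothetical WN list\nfoo-1.0-1.i386.rpm\nbar-2.0-1.noarch.rpm  # tools\n"
    print(shlex.join(install_command(read_rpm_list(example))))
    # rpm -Uvh foo-1.0-1.i386.rpm bar-2.0-1.noarch.rpm
```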
Issues when Adding New Sites to the Testbed
- EDG is currently setting up procedures explaining how to add new sites
  - Variations already tested with Taiwan and Romania
  - Step-by-step instructions produced, which we expect to become simpler over time
- Need to clarify the “minimum requirements” for a site to become a member of the testbed
  - A number of regular tasks have to be performed by the site administrators
  - A maximum delay needs to be defined for responding to requests/problems if the testbed is to run efficiently
- Sites from new countries have to identify/create a supporting CA
  - Since CAs need mutual trust, this could lead to an explosion of inspection activities
- Some tasks will fall on the people responsible for managing the VOs
  - HEP experiment secretariats already perform some level of authentication of their institutes and members. How can we get some leverage from this?
Summary
- Logical machine types of an EDG Testbed
- Mapping of services to logical machines
- Example and current EDG Testbed configuration
- Code distribution strategy
- Fabric management strategy
- How to obtain EDG software
- How to automatically configure machines
Outlook
- Release 1.4.7 currently deployed
- Release 2.0 (currently being deployed, rollout expected May 2003) will contain more advanced services:
  - Advanced information systems (based upon relational databases)
  - Enhanced security
  - Optimization (resource broker and replica management)
  - Fabric management with monitoring, automatic fault detection & recovery
Further Information
- EDG Testbed homepage: http://marianne.in2p3.fr/
- Fabric management: http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/
- LCFG on EDG Testbed information:
  - http://www.lnl.infn.it/datagrid/wp4-install/
  - http://datagrid.in2p3.fr/distribution/datagrid/wp4/installation/doc/