Introduction to Grid computing and overview of the European Data Grid Project

The European DataGrid Project Team
http://www.eu-datagrid.org

DataGrid is a project funded by the European Union

Grid Tutorial - 2/28/2021 - n° 1

Overview
• What is Grid computing?
• What is a Grid?
• Why Grids?
• Grid projects worldwide
• The European Data Grid
  - Overview of EDG goals and organization
  - Overview of the EDG middleware components

The Grid Vision
• Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
• Scientific instruments and experiments provide huge amounts of data.
• The Grid: networked data processing centres, with “middleware” software as the “glue” between resources.

What is Grid computing?
• Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations (VOs). [I. Foster]
  - A VO is a collection of users who share similar needs and requirements in their access to processing, data and distributed resources, and who pursue similar goals.
• Key concept:
  - The ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose. [I. Foster]

The Grid distributed computing idea (1/2)
Once upon a time...
[Figure: mainframe, microcomputer, minicomputer, cluster (by Christophe Jacquet)]

The Grid distributed computing idea (2/2)
...and today
[Figure by Christophe Jacquet]

Differences between Grids and distributed applications
• Distributed applications already exist, but they tend to be specialised systems intended for a single purpose or user group.
• Grids go further and take into account:
  - Different kinds of resources
      - Not always the same hardware, data and applications
  - Different kinds of interactions
      - User groups or applications want to interact with Grids in different ways
  - Dynamic nature
      - Resources and users are added, removed or changed frequently

Main services of a Grid architecture
• Service providers
  - Publish the availability of their services via information systems
  - Such services may come and go or change dynamically
  - E.g. a testbed site that offers x CPUs and y GB of storage
• Service brokers
  - Register and categorize published services and provide search capabilities
  - E.g. 1) the EDG Resource Broker selects the best site for a “job”; 2) catalogues of data held at each testbed site
• Service requesters
  - Single sign-on: log into the grid once
  - Use brokering services to find a needed service and employ it
  - E.g. CMS physicists submit a simulation job that needs 12 CPUs for 6 hours and 15 GB, which gets scheduled, via the Resource Broker, on the CERN testbed site
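The provider, broker and requester roles can be illustrated with a small toy model. This is only a sketch, not the EDG Resource Broker: the class names, the site capacities and the "most free CPUs" ranking rule are assumptions made for illustration, and the 15 GB in the CMS example is treated here as a storage requirement.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Site:
    """A service provider's published offer (hypothetical fields)."""
    name: str
    free_cpus: int
    free_storage_gb: int


@dataclass
class Job:
    """A requester's resource needs (hypothetical fields)."""
    cpus: int
    hours: int
    storage_gb: int


class InformationSystem:
    """Toy registry where providers publish and withdraw their offers."""

    def __init__(self) -> None:
        self._sites: Dict[str, Site] = {}

    def publish(self, site: Site) -> None:
        self._sites[site.name] = site

    def withdraw(self, name: str) -> None:
        # Services may come and go, so withdrawing an unknown site is not an error.
        self._sites.pop(name, None)

    def sites(self) -> List[Site]:
        return list(self._sites.values())


def select_site(info: InformationSystem, job: Job) -> Optional[Site]:
    """Broker step: keep sites that satisfy the job, then rank them.

    Ranking by free CPUs is an assumed rule, chosen only for illustration.
    """
    candidates = [
        s for s in info.sites()
        if s.free_cpus >= job.cpus and s.free_storage_gb >= job.storage_gb
    ]
    return max(candidates, key=lambda s: s.free_cpus, default=None)


# Capacities below are invented for the example.
info = InformationSystem()
info.publish(Site("CERN", free_cpus=40, free_storage_gb=200))
info.publish(Site("RAL", free_cpus=8, free_storage_gb=500))

job = Job(cpus=12, hours=6, storage_gb=15)  # the CMS example from the slide
chosen = select_site(info, job)
print(chosen.name if chosen else "no site available")
```

Because providers publish into a shared registry and the broker queries it at submission time, a site that disappears between two submissions simply stops being a candidate.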

Grid security
• Resource providers are essentially “opening themselves up” to itinerant users
• Secure access to resources is required
  - X.509 Public Key Infrastructure
• A user's identity has to be certified by (mutually recognized) national Certification Authorities (CAs)
• Resources (node machines) have to be certified by CAs
• Temporary delegation from users to processes to be executed “in the user's name” (proxy certificates)
• Commonly agreed policies for accessing resources and handling users' rights across different domains within Virtual Organizations
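The delegation idea behind proxy certificates can be sketched as a chain of credentials in which each link is issued by the one above it and expires no later than its parent. The model below is a deliberate simplification: it contains no keys or cryptographic signatures and is not the X.509 or Globus API. The `Credential` class, the CA name and the 12-hour lifetime are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical CA name; a real deployment would hold the CAs' public keys.
TRUSTED_CAS = {"Example National CA"}


@dataclass(frozen=True)
class Credential:
    """Simplified stand-in for a certificate (no keys or signatures)."""
    subject: str
    issuer: str
    not_after: datetime
    parent: Optional["Credential"] = None  # credential that issued this one


def issue_proxy(user_cred: Credential, lifetime: timedelta, now: datetime) -> Credential:
    """Delegate the user's identity to a short-lived proxy credential."""
    # A proxy never outlives the credential it was derived from.
    expiry = min(now + lifetime, user_cred.not_after)
    return Credential(
        subject=user_cred.subject + "/proxy",
        issuer=user_cred.subject,
        not_after=expiry,
        parent=user_cred,
    )


def is_acceptable(cred: Credential, now: datetime) -> bool:
    """Walk the chain from proxy to user certificate and check each link."""
    current: Optional[Credential] = cred
    while current is not None:
        if now > current.not_after:
            return False  # expired somewhere in the chain
        if current.parent is None:
            # End of the chain: must have been issued by a recognized CA.
            return current.issuer in TRUSTED_CAS
        if current.issuer != current.parent.subject:
            return False  # broken delegation link
        current = current.parent
    return False


now = datetime(2003, 6, 1, 9, 0)
user = Credential("Alice", "Example National CA", datetime(2004, 6, 1))
proxy = issue_proxy(user, timedelta(hours=12), now)

print(is_acceptable(proxy, now))                        # checked while the proxy is fresh
print(is_acceptable(proxy, now + timedelta(hours=13)))  # checked after the proxy lifetime
```

Keeping the proxy short-lived limits the damage if a process running "in the user's name" is compromised, while the long-lived user credential never has to leave the user's machine.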

Why Grids
• Scale of the problems
  - Frontier research in many different fields today requires worldwide collaborations (i.e. multi-domain access to distributed resources)
• Grids provide access to large data processing power and huge data storage possibilities
  - As the Grid grows, its usefulness increases (more resources available)
• Large communities of possible Grid users:
  - High Energy Physics
  - Environmental studies: earthquake forecasting, geologic and climate changes, ozone monitoring
  - Biology, Genetics, Earth Observation
  - Astrophysics
  - New composite materials research
  - Astronautics, etc.

High Energy Physics
The LHC detectors: CMS, ATLAS, LHCb
• ~6-8 PetaBytes/year
• ~10^8 events/year
• ~10^3 batch and interactive users
Source: Federico Carminati, EU review presentation

Earth Observation
ESA missions:
• about 100 Gbytes of data per day (ERS 1/2)
• 500 Gbytes for the next ENVISAT mission (2002)
DataGrid contribution to EO:
• enhance the ability to access high-level products
• allow reprocessing of large historical archives
• improve complex Earth science applications (data fusion, data mining, modelling, ...)
Source: L. Fusco, June 2001; Federico Carminati, EU review presentation, 1 March 2002
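To put the daily rates in perspective, they can be converted to yearly archive growth. The short calculation below is a sketch under two assumptions that the slide does not state: the ENVISAT figure is read as a per-day rate like the ERS one, and decimal units (1 TB = 1000 GB) are used.

```python
DAYS_PER_YEAR = 365
GB_PER_TB = 1000  # decimal units, an assumption


def yearly_volume_tb(gb_per_day: float) -> float:
    """Convert a sustained daily data rate into terabytes per year."""
    return gb_per_day * DAYS_PER_YEAR / GB_PER_TB


# Rates taken from the slide; ENVISAT is assumed to be per day as well.
for mission, rate_gb_per_day in (("ERS 1/2", 100), ("ENVISAT", 500)):
    print(f"{mission}: about {yearly_volume_tb(rate_gb_per_day):.1f} TB per year")
```

Under these assumptions the archives grow by tens to hundreds of terabytes per year, which is the scale at which reprocessing historical data becomes a distributed computing problem.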

Biology - BioInformatics
• Bio-informatics
  - Phylogenetics
  - Search for primers
  - Statistical genetics
  - Bio-informatics web portal
  - Parasitology
  - Data-mining on DNA chips
  - Geometrical protein comparison
• Medical imaging
  - MR image simulation
  - Medical data and metadata management
  - Mammography analysis
  - Simulation platform for PET/SPECT
Legend: applications deployed; applications tested on EDG; applications under preparation
Example workflow:
1. Query the medical image database and retrieve a patient image
2. Compute similarity measures over the database images (submit 1 job per image)
3. Retrieve the most similar cases
[Figure labels: medical images; metadata (exam, image, patient, key, ACL, ...); similar images; low-score images]
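The three-step imaging workflow (query, one job per database image, keep the best matches) maps naturally onto a fan-out and ranking pattern. The sketch below uses a local thread pool as a stand-in for grid job submission; the similarity score, the image representation as a flat list of intensities, and the case names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Image = List[float]  # toy representation: a flat list of pixel intensities


def similarity(a: Image, b: Image) -> float:
    """Toy score in (0, 1]; 1.0 means the images are identical."""
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mean_abs_diff)


def most_similar(query: Image, database: Dict[str, Image], top_k: int = 2) -> List[Tuple[str, float]]:
    """Submit one comparison task per database image, then rank the results."""
    with ThreadPoolExecutor() as pool:
        # Step 2: one "job" per image (a thread here, a grid job in the slide).
        futures = {
            name: pool.submit(similarity, query, image)
            for name, image in database.items()
        }
        scores = [(name, future.result()) for name, future in futures.items()]
    # Step 3: keep the highest-scoring cases, drop the low-score images.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Invented data standing in for the medical image database.
database = {
    "case_a": [0.0, 0.5, 1.0],
    "case_b": [0.1, 0.5, 0.9],
    "case_c": [1.0, 0.0, 0.0],
}
query = [0.0, 0.5, 1.0]  # step 1: the retrieved patient image

ranked = most_similar(query, database)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Each comparison is independent of the others, which is why the slide can assign one job per image: the work scales out with the number of available CPUs.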

Major existing Grid projects (1/2)
• Europe-based projects:
  - European DataGrid (EDG): 2001-2003, www.edg.org
  - LHC Computing GRID (LCG): 2002-2008-..., cern.ch/lcg
  - CrossGrid: 2002-2005, www.crossgrid.org
  - DataTAG: 2002-2003, www.datatag.org
  - GridLab: 2002-2004, www.gridlab.org
  - EGEE: 2004-2007?, www.cern.ch/egee
• European national projects:
  - INFNGRID, UK-GridPP, NorduGrid (Nordic testbed for wide area computing), ...

Major existing Grid projects (2/2)
• US projects:
  - GriPhyN (HEP), www.griphyn.org
  - PPDG (HEP), www.ppdg.net
  - iVDGL (joint GriPhyN, PPDG), www.ivdgl.org
  - TERAGRID (NSF), www.teragrid.org
      - IBM, Intel, Qwest, Myricom, Sun Microsystems, Oracle
  - National Middleware Initiative (NSF NMI), www.nsf-middleware.org
  - ESG, www.earthsystemgrid.org
  - NEESgrid (virtual lab for earthquake engineering), www.neesgrid.org
  - BIRN (biomedical informatics research network), birn.ncrr.nih.gov/birn/
• Asia-based projects:
  - ApGRID, www.apgrid.org
  - TWGRID, www.twgrid.org
  - Many Grid projects in Korea, Japan, China, Australia

Major US & European Grid Projects, many with strong HEP participation
• US projects: the Virtual Data Toolkit (VDT)
• European projects: the DataGrid Toolkit
• Many national and regional Grid projects: GridPP (UK), INFN-grid (I), NorduGrid, Dutch Grid, ...

The European Data Grid Project
• To build on the emerging Grid technology to develop a sustainable computing model for effective sharing of computing resources and data
• Start: Jan 1, 2001; End: Dec 31, 2003
• Specific project objectives:
  - Middleware for fabric & Grid management (mostly funded by the EU)
  - Large-scale testbed (mostly funded by the partners)
  - Production-quality demonstrations (partially funded by the EU)
• To collaborate with and complement other European and US projects
• Contribute to Open Standards and international bodies:
  - Co-founder of the Global Grid Forum and host of GGF1 and GGF3
  - Industry and Research Forum for dissemination of project results

The EDG Main Partners
• CERN - International (Switzerland/France)
• CNRS - France
• ESA/ESRIN - International (Italy)
• INFN - Italy
• NIKHEF - The Netherlands
• PPARC - UK

EDG Assistant Partners
Industrial Partners
• Datamat (Italy)
• IBM-UK (UK)
• CS-SI (France)
Research and Academic Institutes
• CESNET (Czech Republic)
• Commissariat à l'énergie atomique (CEA) - France
• Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
• Consiglio Nazionale delle Ricerche (Italy)
• Helsinki Institute of Physics - Finland
• Institut de Fisica d'Altes Energies (IFAE) - Spain
• Istituto Trentino di Cultura (IRST) - Italy
• Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
• Royal Netherlands Meteorological Institute (KNMI)
• Ruprecht-Karls-Universität Heidelberg - Germany
• Stichting Academisch Rekencentrum Amsterdam (SARA) - Netherlands
• Swedish Research Council - Sweden

EDG overview: Middleware release schedule
• Release schedule
  - testbed 1: late 2001
  - testbed 2: early 2003
  - testbed 3: end 2003
  - Incremental releases between these major dates
• Each release includes
  - feedback on use of the previous release by application groups
  - planned improvements/extensions by middleware groups
• Application groups (HEP, EO, Bio-Info) are using existing software and testbed to explore how they can best exploit grids

Current Project Status
• EDG currently provides a set of middleware services
  - Job & Data Management
  - Grid & Network monitoring
  - Security, Authentication & Authorization tools
  - Fabric Management
• EDG release 2.0 is currently deployed to the EDG testbeds
  - GNU/Linux RedHat 7.3 on Intel PCs
• ~15 sites in the application testbed are actively used by application groups
  - Core sites: CERN (CH), RAL (UK), NIKHEF (NL), CNAF (I), CC-Lyon (F)
• EDG software is also deployed at a total of ~40 sites via CrossGrid, DataTAG and national grid projects
• Final release 2.1 will be out soon
• Many applications have been ported to EDG testbeds and are actively being used
• Intense middleware development is continuously ongoing

DataGrid in Numbers
People
• >350 registered users
• 12 Virtual Organisations
• 16 Certificate Authorities
• >500 people trained
• 278 man-years of effort (100 years funded)
Testbeds
• >15 regular sites
• >10'000s jobs submitted
• >1000 CPUs
• 3 Mass Storage Systems
• >5 TeraBytes disk
Software
• 50 use cases
• 18 software releases
• >300K lines of code
Scientific applications
• 5 Earth Obs institutes
• 9 bio-informatics apps
• 6 HEP experiments

EDG structure: work packages
• The EDG collaboration is structured in 12 Work Packages:
  - WP1: Work Load Management System
  - WP2: Data Management
  - WP3: Grid Monitoring / Grid Information Systems
  - WP4: Fabric Management
  - WP5: Storage Element
  - WP6: Testbed and demonstrators
  - WP7: Network Monitoring
  - WP8: High Energy Physics Applications
  - WP9: Earth Observation
  - WP10: Biology
  - WP11: Dissemination
  - WP12: Management
• WP8 to WP10 are the application work packages.

EDG Globus-based middleware architecture
• Current EDG architectural functional blocks:
  - Basic Services (authentication, authorization, Replica Catalog, secure file transfer, Info Providers) rely on Globus 2.0
  - Higher-level EDG middleware (developed within EDG)
  - Applications (HEP, BIO, EO)
[Figure: layer stack. Labels: specific application layer (ALICE, ATLAS, CMS, LHCb, other apps); VOs common application layer (LHC, other apps); Grid middleware (high-level Grid middleware); GLOBUS 2.0 (basic services); OS & net services]

EDG middleware Grid architecture
[Figure: layered architecture, top to bottom; side labels: M/W, GLOBUS]
• Applications (local computing): Local Application, Local Database
• Grid Application Layer: Data Management, Job Management, Metadata Management
• Collective Services: Grid Scheduler, Information & Monitoring, Replica Manager
• Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, Service Index
• Grid Fabric services: Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management

EDG Interfaces
[Figure: the layered architecture with its external interfaces]
• People and organisations: Application Developers, System Managers, Scientists, Certificate Authorities
• Local environment: Local Application, Local Database, File Systems, User Accounts, Operating Systems
• Grid Application Layer: Data Management, Job Management, Metadata Management, Object to File Mapping
• Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
• Underlying Grid Services: SQL Database Services, Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication and Accounting, Service Index
• Fabric services: Resource Management, Configuration Management, Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management
• Underlying resources: Storage Elements, Mass Storage Systems (HPSS, Castor), Computing Elements, Batch Systems (PBS, LSF, etc.)

EDG Tutorial Overview
• Workload Management Services
• Data Management Services
• Networking
• Information Service
• Fabric Management

EDG: reference web sites
• EDG web site
  - http://www.edg.org
• Source for all required software:
  - http://datagrid.in2p3.fr
• EDG testbed web site
  - http://marianne.in2p3.fr
• Dissemination Testbed (GriDis)
  - http://web.datagrid.cnr.it/GriDisWP1.html
• EDG users guide
  - http://marianne.in2p3.fr/datagrid/documentation/EDG-UsersGuide.html
• EDG tutorials web site
  - http://cern.ch/edg-tutorials