Grid Computing at NIC
Achim Streit + Team
a.streit@fz-juelich.de
4 September 2005
Forschungszentrum Jülich in der Helmholtz-Gemeinschaft

Grid Projects at FZJ
• UNICORE: 08/1997 – 12/1999
• UNICORE Plus: 01/2000 – 12/2002
• EUROGRID: 11/2000 – 01/2004
• GRIP: 01/2002 – 02/2004
• OpenMolGRID: 09/2002 – 02/2005
• VIOLA: 05/2004 – 04/2007
• DEISA: 05/2004 – 04/2009
• UniGrids: 07/2004 – 06/2006
• NextGrid: 09/2004 – 08/2007
• CoreGRID: 09/2004 – 08/2008
• D-Grid: 09/2005 – 02/2008

UNICORE
• a vertically integrated Grid middleware system
• provides seamless, secure, and intuitive access to distributed resources and data
• used in production and in projects worldwide
• features:
  ♦ intuitive GUI with single sign-on
  ♦ X.509 certificates for authentication/authorization and job/data signing
  ♦ only one open port required in the firewall
  ♦ workflow engine for complex multi-site/multi-step workflows
  ♦ matured job monitoring
  ♦ extensible application support with plug-ins
  ♦ interactive access with UNICORE-SSH
  ♦ integrated secure data transfer and resource management
  ♦ full control of resources remains with the site
  ♦ production quality

Architecture
[Architecture diagram: Client – Gateway – NJS – TSI; one or more Vsites form a Usite]
• Client: workflow engine, resource management, job monitoring, multi-site jobs, file transfer, user management, application support; connects via SSL (optional firewalls before and behind the Gateway)
• Gateway: authentication
• NJS (abstract layer): authorization against the UUDB (similar to /etc/grid-security/grid-mapfile, sketched below) and incarnation via the IDB
• TSI (non-abstract layer, similar to the Globus jobmanager): interfaces to the local RMS and disc, e.g. fork, LoadLeveler, (Open)PBS(Pro), CCS, LSF, NQE/NQS, CONDOR, GT 2.4
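The UUDB mapping noted above works like a grid-mapfile: an authenticated X.509 identity is mapped to a local login before the job is incarnated. Below is a minimal, hypothetical Java sketch of that idea; the class and method names are illustrative and are not the real UNICORE UUDB API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the "incarnation" step the UUDB performs:
// mapping an authenticated X.509 distinguished name to a local login,
// conceptually similar to a Globus grid-mapfile. Names are illustrative,
// not the real UNICORE UUDB API.
public class UserMappingSketch {

    private final Map<String, String> dnToLogin = new HashMap<>();

    public void addMapping(String distinguishedName, String localLogin) {
        dnToLogin.put(distinguishedName, localLogin);
    }

    /** Returns the local account for a certificate DN, if the user is known. */
    public Optional<String> incarnate(String distinguishedName) {
        return Optional.ofNullable(dnToLogin.get(distinguishedName));
    }

    public static void main(String[] args) {
        UserMappingSketch uudb = new UserMappingSketch();
        uudb.addMapping("CN=Jane Doe,O=FZJ,C=DE", "jdoe");

        System.out.println(uudb.incarnate("CN=Jane Doe,O=FZJ,C=DE")); // Optional[jdoe]
        System.out.println(uudb.incarnate("CN=Unknown,O=X,C=DE"));    // Optional.empty
    }
}
```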

UNICORE Client
[Screenshot of the UNICORE client GUI]

UNICORE-SSH
• uses standard UNICORE security mechanisms to open an SSH connection through the standard SSH port
• launched via the UNICORE-SSH button in the client

Workflow Automation & Speed-up
• Automate, integrate, and speed up drug discovery in the pharmaceutical industry
• Partners: University of Ulster (data warehouse), University of Tartu (compute resources), FZ Jülich (Grid middleware), ComGenex Inc. (user), EPA ECOTOX database (data), Istituto di Ricerche Farmacologiche "Mario Negri" (user)
• Example workflow: 280 2D structures downloaded, converted to 3D output, QSAR descriptors calculated: more than 5 days manually, less than 2 hours with the automated Grid workflow

Workflow Automation & Speed-up
• automatic split-up of data-parallel tasks (sketched below)
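The split-up of a data-parallel task amounts to partitioning a large input set into chunks that can be processed independently. The sketch below illustrates only that partitioning idea, using local threads in place of Grid sub-jobs; the class, method, and input names are illustrative assumptions, not OpenMolGRID code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of the idea behind the automatic split-up of a data-parallel
// task: the input structures are partitioned into independent chunks. In the
// Grid setting each chunk would become a separate sub-job; here they are only
// local threads, and all names are illustrative.
public class DataParallelSplitSketch {

    // Partition the inputs into at most `chunks` roughly equal parts.
    static List<List<String>> split(List<String> inputs, int chunks) {
        List<List<String>> parts = new ArrayList<>();
        int size = (inputs.size() + chunks - 1) / chunks;
        for (int i = 0; i < inputs.size(); i += size) {
            parts.add(inputs.subList(i, Math.min(i + size, inputs.size())));
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        List<String> structures = List.of("mol-001", "mol-002", "mol-003", "mol-004", "mol-005");

        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<String>> results = new ArrayList<>();
        for (List<String> chunk : split(structures, 2)) {
            // Each chunk would be one sub-job; here it is just a task in a thread pool.
            results.add(pool.submit(() -> "computed descriptors for " + chunk));
        }
        for (Future<String> result : results) {
            System.out.println(result.get());
        }
        pool.shutdown();
    }
}
```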

UNICORE@SourceForge
• Open Source under BSD license
• Supported by FZJ
• Integration of own results and results from other projects
• Release management
• Problem tracking
• CVS, mailing lists
• Documentation
• Assistance
• Viable basis for many projects: DEISA, VIOLA, UniGrids, D-Grid, NaReGI
• http://unicore.sourceforge.net

From Testbed (2002) to Production (2005)
• Different communities
• Different computing resources (supercomputers, clusters, …)
• Know-how in Grid middleware
• Success factor: vertical integration

Production
• National high-performance computing centre "John von Neumann Institute for Computing"
• About 650 users in 150 research projects
• Access via UNICORE to
  ♦ IBM p690 eSeries Cluster (1312 CPUs, 8.9 TFlops)
  ♦ IBM BlueGene/L (2048 CPUs, 5.7 TFlops)
  ♦ Cray XD1 (72+ CPUs)
• 116 active UNICORE users (72 external, 44 internal)
• Resource usage (CPU-hours): Dec 18.4%, Jan 30.4%, Feb 30.5%, Mar 27.1%, Apr 29.7%, May 39.1%, Jun 22.3%, Jul 20.2%, Aug 29.0%

Grid Interoperability
• UNICORE – Globus Toolkit
• Uniform Interface to Grid Services
• OGSA-based UNICORE/GS
• WSRF interoperability

Architecture: UNICORE jobs on Globus resources
[Architecture diagram]
• UNICORE side: Client, Gateway, NJS (with IDB and UUDB), TSI extended with a GRAM client, a GridFTP client and the Uspace
• Globus 2 side: MDS, GRAM Gatekeeper, GRAM Job-Manager, GridFTP server, local RMS
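One way to read this architecture is as an adapter: the GRAM client on the TSI side turns the already incarnated job into a GRAM RSL description and hands it to the Globus gatekeeper, while files move via GridFTP. The sketch below only builds such an RSL string; the Java types are illustrative assumptions, not the real UNICORE or Globus client APIs.

```java
// Hypothetical sketch of the adapter idea behind running UNICORE jobs on
// Globus resources: an incarnated job is rendered as a GRAM2 RSL description
// instead of a local batch script. The RSL syntax shown is standard GRAM2 RSL;
// the Java types are illustrative, not the real UNICORE or Globus APIs.
public class GramAdapterSketch {

    /** Builds a minimal GRAM2 RSL string for an already incarnated job. */
    static String toRsl(String executable, String[] arguments, int count) {
        StringBuilder rsl = new StringBuilder("&(executable=" + executable + ")");
        if (arguments.length > 0) {
            rsl.append("(arguments=");
            for (String arg : arguments) {
                rsl.append("\"").append(arg).append("\" ");
            }
            rsl.append(")");
        }
        rsl.append("(count=").append(count).append(")");
        return rsl.toString();
    }

    public static void main(String[] args) {
        // In the real setup this string would be submitted to the GRAM
        // gatekeeper; here we only print it.
        System.out.println(toRsl("/bin/date", new String[] {"-u"}, 1));
    }
}
```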

UniGrids Consortium
• Research Center Jülich (project manager)
• Consorzio Interuniversitario per il Calcolo Automatico dell'Italia Nord Orientale
• Fujitsu Laboratories of Europe
• Intel GmbH
• University of Warsaw
• University of Manchester
• T-Systems SfR
Funded by EU grant IST-2002-004279

UNICORE/GS Architecture (Web Services)
[Architecture diagram; legend: UNICORE component, new component, Web Services interface. Components: UNICORE Client, OGSA Client, UniGrids Portal, UNICORE Gateway, Network Job Supervisor, OGSA Servers A and B, Resource Broker, Resource Database, User Database, TSI]
• Access UNICORE components as Web Services
• Integrate Web Services into the UNICORE workflow

Atomic Services: UNICORE basic functions as WSRF services
• Site Management (TSF/TSS)
  ♦ Compute Resource Factory
  ♦ Submit, Resource Information
• Job Management (JMS)
  ♦ Start, Hold, Abort, Resume
• Storage Management (SMS)
  ♦ List directory, Copy, Make directory, Rename, Remove
• File Transfer (FTS)
  ♦ File import, file export
• Standardization
  ♦ JSDL WG revitalized by UniGrids and NAREGI
  ♦ Atomic Services are input to the OGSA-BES WG
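To make the grouping above concrete, here is a hypothetical Java interface sketch of the four atomic-service groups (TSF/TSS, JMS, SMS, FTS). The interfaces are illustrative only; they are not the actual UniGrids WSRF stubs.

```java
import java.util.List;

// Hypothetical sketch of the atomic-service groups named on the slide.
// These interfaces are illustrative only, not the actual UniGrids WSRF stubs.

interface TargetSystemFactory {              // TSF: creates target systems (compute resources)
    TargetSystemService createTargetSystem();
}

interface TargetSystemService {              // TSS: submit jobs, expose resource information
    JobManagement submit(String jobDescription);   // e.g. a JSDL document
    String getResourceInformation();
}

interface JobManagement {                    // JMS: control a single job
    void start();
    void hold();
    void resume();
    void abort();
}

interface StorageManagement {                // SMS: operate on a remote storage
    List<String> listDirectory(String path);
    void copy(String source, String target);
    void makeDirectory(String path);
    void rename(String from, String to);
    void remove(String path);
}

interface FileTransfer {                     // FTS: move files in and out of a storage
    void importFile(String localPath, String remotePath);
    void exportFile(String remotePath, String localPath);
}
```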

Three levels of interoperability
Level 1: Interoperability between WSRF services
• UNICORE/GS passed the official WSRF interop test
• GPE and JOGSA hosting environments successfully tested against UNICORE/GS and other endpoints
• WSRF specification will be finalized soon!
  ♦ currently: UNICORE/GS: WSRF 1.3, GTK: WSRF 1.2 draft 1
[Layer diagram: advanced services (CGSP, GPE-Workflow, GPE-Registry, UoM-Broker) on top of the atomic services (UNICORE/GS, GTK 4), the WSRF service API, and the WSRF hosting environments (UNICORE/GS-HE, GTK4-HE, JOGSA-HE)]

Three levels of interoperability
Level 2: Interoperability between atomic service implementations
• Client API hides details of the WSRF hosting environment
• Client code will work with different WSRF implementations and WSRF versions, although at the moment different stubs have to be used
[Same layer diagram, extended by the GPE clients (Portal, Apps, Expert, Visit) on top of the atomic service client API]

Three levels of interoperability
Level 3: GridBeans working on top of different client implementations
• GridBeans (Gaussian, PDBSearch, CPMD, POVRay, Compiler) plug into the GridBean API above the GPE clients
• independent of atomic service implementations
• independent of the specification versions being used
• GridBeans run on GTK or UNICORE/GS without modifications
• GridBeans survive version changes in the underlying layers and are easy to maintain (a minimal sketch of the idea follows below)
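A minimal sketch of the level-3 idea, assuming a hypothetical client API: the GridBean only builds an abstract job description and never sees which WSRF implementation executes it, so the same bean can run on UNICORE/GS or GTK back ends. All names are illustrative, not the actual GPE GridBean API.

```java
// Minimal sketch of the level-3 idea: a GridBean only builds an abstract job
// description and is executed through an abstract client API, so it does not
// depend on the concrete WSRF implementation underneath. All names are
// illustrative, not the actual GPE GridBean API.

interface AtomicServiceClient {
    String submit(String jobDescription);    // returns a job identifier
    String status(String jobId);
}

interface GridBean {
    String buildJobDescription();            // e.g. assemble a JSDL document
}

class PovRayGridBean implements GridBean {
    @Override
    public String buildJobDescription() {
        return "<JobDefinition application='POVRay' scene='demo.pov'/>";
    }
}

public class GridBeanSketch {
    public static void main(String[] args) {
        // Stub client standing in for any concrete back end (UNICORE/GS, GTK, ...).
        AtomicServiceClient client = new AtomicServiceClient() {
            public String submit(String jobDescription) { return "job-42"; }
            public String status(String jobId) { return "RUNNING"; }
        };

        GridBean bean = new PovRayGridBean();
        String jobId = client.submit(bean.buildJobDescription());
        System.out.println(jobId + ": " + client.status(jobId));
    }
}
```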

DEISA Consortium
DEISA is a consortium of leading national supercomputer centers in Europe:
• IDRIS – CNRS, France
• FZJ, Jülich, Germany
• RZG, Garching, Germany
• CINECA, Bologna, Italy
• EPCC, Edinburgh, UK
• CSC, Helsinki, Finland
• SARA, Amsterdam, The Netherlands
• HLRS, Stuttgart, Germany
• BSC, Barcelona, Spain
• LRZ, Munich, Germany
• ECMWF (European organization), Reading, UK
Funded by the European Union under FP6; grant period: May 1st, 2004 – April 30th, 2008

DEISA objectives
• To enable Europe's terascale science by the integration of Europe's most powerful supercomputing systems.
• Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success.
• DEISA is a European supercomputing service built on top of existing national services.
• DEISA deploys and operates a persistent, production-quality, distributed, heterogeneous supercomputing environment with continental scope.

Basic requirements and strategies for the DEISA research infrastructure
• Fast deployment of a persistent, production-quality, Grid-empowered supercomputing infrastructure with continental scope.
• A European supercomputing service built on top of existing national services requires reliability and non-disruptive behavior.
• User and application transparency.
• Top-down approach: technology choices result from the business and operational models of our virtual organization. DEISA technology choices are fully open.

The DEISA supercomputing Grid: a layered infrastructure
• Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ Jülich, RZG Garching and CINECA (phase 1), then CSC (phase 2). It appears to external users as a single supercomputing platform.
• Outer layer: a heterogeneous supercomputing Grid:
  ♦ IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
  ♦ BSC, IBM PowerPC Linux system, 40 Tf
  ♦ LRZ, Linux cluster (2.7 Tf) moving to an SGI Altix system (33 Tf in 2006, 70 Tf in 2007)
  ♦ SARA, SGI Altix Linux cluster, 2.2 Tf
  ♦ ECMWF, IBM AIX system, 32 Tf
  ♦ HLRS, NEC SX-8 vector system, close to 10 Tf

Logical view of the phase 2 DEISA network
[Network diagram: national research networks Funet, SURFnet, DFN, RENATER, UKERNA, RedIRIS and GARR interconnected via GÉANT]

AIX Super-Cluster, May 2005
Services:
• High-performance data grid via GPFS: access to remote files uses the full available network bandwidth
• Job migration across sites: used to load-balance the global workflow when a huge partition is allocated to a DEISA project at one site
• Common Production Environment
[Map of participating sites, including CSC and ECMWF]

Service Activities
• SA1 – Network Operation and Support (FZJ): deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform; network operation and optimization during project activity.
• SA2 – Data Management with Global File Systems (RZG): deployment and operation of global distributed file systems, as basic building blocks of the "inner" super-cluster and as a way of implementing global data management in a heterogeneous Grid.
• SA3 – Resource Management (CINECA): deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.
• SA4 – Applications and User Support (IDRIS): enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.
• SA5 – Security (SARA): providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.

DEISA Supercomputing Grid services
• Workflow management: based on UNICORE plus further extensions and services coming from DEISA's JRA7 and other projects (UniGrids, …)
• Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.
• Co-scheduling: needed to support Grid applications running in the heterogeneous environment.
• Science gateways and portals: specific Internet interfaces to hide complex supercomputing environments from end users and to facilitate access for new, non-traditional scientific communities.

Workflow Application with UNICORE and Global Data Management with GPFS
• Job workflow: 1) FZJ, 2) CINECA, 3) RZG, 4) IDRIS, 5) SARA
[Diagram: the client submits one job; each step runs on CPUs at one site while the data are shared via GPFS across the sites, connected through the NRENs]
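A minimal sketch of the workflow pattern on this slide, assuming the five sites named above share one GPFS path: each step starts only after the previous one has finished and works on the shared data instead of transferring files. The submission call is a stand-in, not a real UNICORE client API.

```java
import java.util.List;

// Minimal sketch of the job-workflow on the slide: five steps run in sequence
// at different sites, exchanging data through a shared GPFS path instead of
// explicit file transfers. The site names come from the slide; the submission
// call is only a stand-in, not a real UNICORE client API.
public class DeisaWorkflowSketch {

    static void runStep(String site, String sharedDataPath) {
        // In the real system each step would be a UNICORE job incarnated at the
        // site; here we only record what would happen.
        System.out.printf("step at %s, reading and writing %s%n", site, sharedDataPath);
    }

    public static void main(String[] args) {
        List<String> sites = List.of("FZJ", "CINECA", "RZG", "IDRIS", "SARA");
        String sharedDataPath = "/gpfs/deisa/project/data";   // illustrative path

        for (String site : sites) {
            runStep(site, sharedDataPath);   // step n+1 starts only after step n finished
        }
    }
}
```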

Usage in other Projects
NaReGI:
• UNICORE as basic middleware for research and development
• Development of the UNICONDORE interoperability layer (UNICORE and CONDOR)
• Access to about 3000 CPUs with approx. 17 TFlops peak in the NaReGI testbed
D-Grid Integration Project:
• UNICORE is used in the Core D-Grid infrastructure
• Development of tools for (even) easier installation and configuration of client and server components

Summary
UNICORE
• establishes seamless access to Grid resources and data
• is designed as a vertically integrated Grid middleware
• provides matured workflow capabilities
• is used in production at NIC and in the DEISA infrastructure
• is available as Open Source from http://unicore.sourceforge.net
• is used in research projects worldwide
• is continuously enhanced by an international expert team of Grid developers
• is currently being transformed into the Web Services world towards OGSA and WSRF compliance

UNICORE Summit 2005
October 11–12, 2005, ETSI Headquarters, Sophia Antipolis, France
http://summit.unicore.org/2005
In conjunction with Grids@work: Middleware, Components, Users, Contest and Plugtests
http://www.etsi.org/plugtests/GRID.htm