What are Grids and e-Science? David Fergusson, EGEE

What are Grids and e-Science?
David Fergusson
EGEE is funded by the European Union under contract IST-2003-508833
Induction: What are Grids and e-Science? – May 18th, 2004

Acknowledgements
• This talk is based on a module of the tutorials delivered by the EDG training team, and on slides from:
  • Andrew Grimshaw, University of Virginia
  • Bob Jones, EGEE Technical Director
  • Mark Parsons, EPCC
  • the EDG training team
  • Roberto Barbera, INFN
  • Ian Foster, Argonne National Laboratory
  • Jeffrey Grethe, SDSC
  • The National e-Science Centre
• Prepared by Dave Berry

Goals of this module
• Introduce grid concepts and definitions
• Why Grids?
• A brief outline of history leading to EGEE
• Provide some brief examples of middleware components
• The strategic direction will be covered tomorrow

Overview
• What is different about grids?
• Characteristics of a grid
• e-Science
• Applications (what’s in it for the working scientist)
• European grids, and the world
• Grid components

What is different about grids?

What is Grid Computing?
• A Virtual Organisation is:
  • People from different institutions working towards a common goal
  • Sharing distributed processing and data resources
• Grid infrastructure enables virtual organisations
• “Grid computing is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations” (I. Foster)

Grids vs. Distributed Computing
• Distributed applications already exist, but they tend to be specialised systems intended for a single purpose or user group
• Grids go further and take into account:
  • Different kinds of resources
    • Not always the same hardware, data and applications
  • Different kinds of interactions
    • User groups or applications want to interact with Grids in different ways
  • Dynamic nature
    • Resources and users added/removed/changed frequently

Characteristics of a grid

What are the characteristics of a Grid system?
• Numerous Resources
• Connected by Heterogeneous, Multi-Level Networks
• Ownership by Mutually Distrustful Organizations & Individuals
• Different Security Requirements & Policies Required
• Different Resource Management Policies
• Potentially Faulty Resources
• Geographically Separated
• Resources are Heterogeneous

How Different 2004 is from 1994
• Moore’s law everywhere
  • Instruments, detectors, sensors, scanners, …
  • Organising their effective use is the challenge
• Enormous quantities of data: Petabytes
  • For an increasing number of communities
  • Gating step is not collection but analysis
• Huge quantities of computing: >100 Top/s
  • Moore’s law gives us all supercomputers
  • Organising their effective use is the challenge
• Ultra-high-speed networks: >10 Gb/s
  • Global optical networks
  • Bottlenecks: last kilometre & firewalls

Exponential Growth
[Figure: performance per dollar spent against number of years (0 to 5)]
• Optical fibre (bits per second): doubling time 9 months, Gilder’s Law (32X in 4 yrs)
• Data storage (bits per sq. inch): doubling time 12 months (16X in 4 yrs)
• Chip capacity (# transistors): doubling time 18 months, Moore’s Law (5X in 4 yrs)
Source: “The Triumph of the Light”, Gary Stix, Scientific American, January 2001
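
As a rough check on the figures quoted on this slide, the growth factor implied by a doubling time can be computed directly. This is only a sketch: it assumes smooth exponential growth with exactly the stated doubling times (9, 12 and 18 months), and the function name growth_factor is made up for illustration.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Growth factor after `years`, assuming one doubling every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

# Doubling times as quoted on the slide (months).
trends = {
    "Optical fibre (Gilder's Law)": 9,
    "Data storage": 12,
    "Chip capacity (Moore's Law)": 18,
}

for name, months in trends.items():
    print(f"{name}: about {growth_factor(months, 4):.1f}x in 4 years")
```

Under these assumptions the factors come out at roughly 40x, 16x and 6x, which is the same order as the rounded 32X, 16X and 5X quoted from the chart.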

The main drivers behind Grid
• The relentless increase in microprocessor performance
  • you can buy multi-gigaflop systems for less than €800
• The availability of reliable high performance networking
  • in Europe the GEANT network links 32 countries at speeds of up to 10 Gbps (and beyond)
  • in the UK we have gone from 100 Mbps -> 10 Gbps academic backbone since 2000
  • 1 Gbps is commonly available to the desktop
• The desire to push the boundaries of scientific discovery by computational analysis and simulation – e-Science

e-Science

The Emergence of e-Science
• Invention and exploitation of advanced computational methods
  • To generate, curate and analyse research data
    • From experiments, observations and simulations
    • Quality management, preservation and reliable evidence
  • To develop and explore models and simulations
    • Computation and data at extreme scales
    • Trustworthy, economic, timely and relevant results
  • To enable dynamic distributed virtual organisations
    • Facilitating collaboration with information and resource sharing
    • Security, reliability, accountability, manageability and agility

Why use Grids for Science?
• Scale of the problems
• Science increasingly done through distributed global collaborations enabled by the internet
• Grids provide access to:
  • Very large data collections
  • Terascale computing resources
  • High performance visualisation
  • Connected by high-bandwidth networks
• e-Science is more than Grid Technology
  • It is what you do with it that counts

Challenges
• Must share data between thousands of scientists with multiple interests
• Must ensure that all data is accessible anywhere, anytime
• Must be scalable and remain reliable for more than a decade
• Must cope with different access policies
• Must ensure data security

The Grid Vision
• Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data
• The Grid: networked data processing centres, with “middleware” software as the “glue” between resources
• Scientific instruments and experiments provide huge amounts of data

The Emergence of Global Knowledge Communities
Slide from Ian Foster’s SSDBM 03 keynote

Applications (What’s in it for working scientists)

Grid Applications
• Medical/Healthcare (imaging, diagnosis and treatment)
• Bioinformatics (study of the human genome and proteome to understand genetic diseases)
• Nanotechnology (design of new materials from the molecular scale)
• Engineering (design optimization, simulation, failure analysis and remote instrument access and control)
• Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems)

CERN: Data intensive science in a large international facility
• The Large Hadron Collider (LHC)
  • The most powerful instrument ever built to investigate elementary particle physics
• Data Challenge:
  • 10 Petabytes/year of data!
  • 20 million CDs each year!
• Simulation, reconstruction, analysis:
  • LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors!
[Figure labels: Mont Blanc (4810 m); Downtown Geneva]
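
The CD comparison on this slide can be sanity-checked with some back-of-envelope arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a disc thickness of 1.2 mm; neither figure is stated on the slide.

```python
PETABYTE = 10**15            # bytes, decimal convention (assumption)
CD_THICKNESS_UM = 1200       # 1.2 mm per disc, in micrometres (assumption)
MONT_BLANC_M = 4810          # height quoted on the slide

data_per_year = 10 * PETABYTE
cds_per_year = 20_000_000

# Capacity each CD must hold for 20 million discs to store 10 PB.
bytes_per_cd = data_per_year // cds_per_year
print(f"{bytes_per_cd / 10**6:.0f} MB per CD")

# Height of one year's discs stacked without cases.
stack_m = cds_per_year * CD_THICKNESS_UM / 10**6
print(f"stack of {stack_m / 1000:.0f} km, about {stack_m / MONT_BLANC_M:.1f} Mont Blancs")
```

The implied 500 MB per disc is of the same order as the 650 to 700 MB a CD-ROM holds, so the two figures on the slide are broadly consistent with each other.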

CrossGrid
• 1. Interactive biomedical simulation and visualization
• 2. Flooding crisis team support
• 3. HEP distributed data analysis
• 4. Weather forecasting and air pollution modelling

Connecting People: Access Grid
[Figure labels: Remote video; Visualisation; Microphones; Cameras]

European grids and the world

Major EU GRID projects
• European DataGrid (EDG): www.edg.org
• LHC Computing GRID (LCG): cern.ch/lcg
• CrossGRID: www.crossgrid.org
• DataTAG: www.datatag.org
• GridLab: www.gridlab.org
• EUROGRID: www.eurogrid.org
European National Projects:
• INFNGRID
• UK e-Science Programme
• NorduGrid

EU DataGrid at a glance
People
• >600 people trained
• 456 person-years of effort
• 170 years funded
Application Testbed
• ~20 regular sites
• 500 registered users
• >60,000 jobs submitted (since 09/03, release 2.0)
• 12 Virtual Organisations
• Peak >1000 CPUs
• 21 Certificate Authorities
• 6 Mass Storage Systems
Software
• >65 use cases
• 7 major software releases (>60 in total)
• >1,000 lines of code
Scientific Applications
• 5 Earth Obs institutes
• 10 bio-medical apps
• 6 HEP experiments

Grid projects
Many Grid development efforts, all over the world
• UK – OGSA-DAI, RealityGrid, GeoDise, Comb-e-Chem, DiscoveryNet, DAME, AstroGrid, GridPP, MyGrid, GOLD, eDiamond, Integrative Biology, …
• Netherlands – VLAM, PolderGrid
• Germany – UNICORE, Grid proposal
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid proposals
• Switzerland – Network/Grid proposal
• Hungary – DemoGrid, Grid proposal
• Norway, Sweden – NorduGrid

• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL

• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)

Grid components

Virtual Data Toolkit
• Grid Middleware components from several projects
  • Packaged and tested together
  • Foundation of EGEE/LCG
• Components:
  • Globus Toolkit
  • Condor
  • Chimera
  • EDG & LCG tools
  • NCSA Tools
  • Other Tools

Globus Toolkit
• Grid Security Infrastructure (GSI)
  • X.509 authentication with delegation and single sign-on
• Grid Resource Allocation Mgmt (GRAM)
  • Remote allocation, reservation, monitoring, control of compute resources
• GridFTP protocol (FTP extensions)
  • High-performance data access & transport
• Grid Resource Information Service (GRIS) + Monitoring and Discovery Service (MDS)
  • Access to structure & state information
• XIO
  • TCP, UDP, IP multicast, and file I/O
• Others…

Condor
• “Cycle-stealing”
  • Use idle CPU cycles for productive work
• “High Throughput Computing”
  • Using all available compute power over periods of days, weeks, …
  • “Embarrassingly parallel” problems
• Fault tolerance
  • Algorithms must allow for failure
  • Checkpointing and process migration
• DAGMan
  • Workflow specification
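
To make the idea of a workflow specification more concrete, here is a minimal sketch of a workflow expressed as a directed acyclic graph (DAG) of jobs and run in dependency order. It is not Condor or DAGMan code: the job names and the run_workflow helper are hypothetical, and Python's standard graphlib module stands in for the scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "analyse_a": {"reconstruct"},
    "analyse_b": {"reconstruct"},
    "merge": {"analyse_a", "analyse_b"},
}

def run_workflow(dag, run_job):
    """Run every job once, only after all of its dependencies have finished."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()                  # raises CycleError if the graph has a cycle
    completed = []
    while sorter.is_active():
        ready = sorter.get_ready()    # jobs whose dependencies are all done
        for job in ready:             # independent of each other; could run in parallel
            run_job(job)
            completed.append(job)
            sorter.done(job)
    return completed

order = run_workflow(workflow, lambda job: print("running", job))
```

The two analysis jobs become ready at the same time, which is the point at which a high-throughput system could send them to separate machines.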

Chimera
• Technology for collaborative management of data, programs & computations
• Virtual data system
  • Virtual data catalog
  • Virtual data language
  • Automated data derivation
  • Provenance tracking
• Pegasus
  • AI planning system for Grid workflows
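
The virtual data idea can be illustrated with a toy catalog that records, for each derived dataset, the transformation and inputs that produce it. The sketch below is not Chimera's Virtual Data Language or API: the catalog layout, the dataset names and the helper functions provenance and materialise are all hypothetical.

```python
# Hypothetical virtual data catalog: derived dataset -> (transformation, inputs).
# Datasets absent from the catalog are treated as raw data.
catalog = {
    "calibrated": (lambda raw: [x * 2 for x in raw], ["raw"]),
    "summary": (lambda cal: sum(cal), ["calibrated"]),
}

# Datasets that already exist physically.
store = {"raw": [1, 2, 3]}

def provenance(name):
    """List every dataset that `name` was derived from, nearest ancestors first."""
    if name not in catalog:
        return []
    _, inputs = catalog[name]
    lineage = []
    for parent in inputs:
        lineage.append(parent)
        lineage.extend(provenance(parent))
    return lineage

def materialise(name):
    """Return the dataset, deriving it (and any missing inputs) on demand."""
    if name not in store:
        transform, inputs = catalog[name]
        store[name] = transform(*(materialise(parent) for parent in inputs))
    return store[name]

print(provenance("summary"))
print(materialise("summary"))
```

Asking for "summary" triggers the derivation of "calibrated" first, so a user can request a data product without knowing whether it has already been computed.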

Tools
• NCSA
  • MyProxy
  • GSI OpenSSH
• EDG & LCG
  • Make Gridmap (Authorisation control)
  • Certificate Revocation List Updater
  • GLUE Schema (Monitoring)
• Others
  • VDT System Profiler
  • Configuration software
  • KX509 (X.509 <-> Kerberos)
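
As an illustration of the authorisation control that a grid-mapfile provides, the sketch below parses entries that map a certificate subject (distinguished name) to a local account. It assumes the common one-entry-per-line layout of a quoted subject followed by a user name; the sample subjects and accounts are invented, and parse_gridmap is a hypothetical helper, not part of any EDG or Globus tool.

```python
import shlex

def parse_gridmap(text):
    """Map each quoted certificate subject to its local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        subject, account = shlex.split(line)   # shlex honours the double quotes
        mapping[subject] = account
    return mapping

# Invented sample entries, for illustration only.
sample = '''
# subject DN                                  local account
"/C=UK/O=Example/OU=Physics/CN=jane doe" jdoe
"/C=IT/O=Example/OU=Biology/CN=mario rossi" mrossi
'''

users = parse_gridmap(sample)
print(users.get("/C=UK/O=Example/OU=Physics/CN=jane doe"))
```

A request whose certificate subject has no entry in the mapping would simply not be assigned a local account, which is the sense in which the file acts as authorisation control.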

Summary
[Figure label: Internet]

Questions?

[Figure: workflow diagram. Recoverable labels: Virtual Data Language; Chimera; Abstract Workflow; Request Manager; Workflow Planning; Workflow Reduction; Replica Location; Available Resources; Data Management; Concrete Workflow; Globus Monitoring and Discovery Service; Application Models; Grid Monitoring; Workflow executor (DAGMan); Execution; Data Publication; Dynamic information; Submission and Monitoring System; Replica and Resource Selector; Globus Replica Location Service; Information and Models; tasks; Grid; Raw data; detector]