Introduction to Grid Computing

Introduction to Grid Computing
§ The term Grid comes from an analogy to the Electric Grid.
  – Pervasive access to power.
  – Similarly, Grid will provide pervasive, consistent, and inexpensive access to advanced computational resources.
§ Grid computing is all about achieving greater performance and throughput by pooling resources on a local, national, or international level.

Scalable Computing
[Figure: performance and QoS plotted against administrative barriers (individual, group, department, campus, state, national, globe), with platforms ranging from personal device, SMPs or supercomputers, local cluster, enterprise cluster/grid, and global grid to inter-planet grid.]

GRID Computing
§ Grids are about large-scale resource sharing.
  – Spanning administrative boundaries.
§ Resources include central processors, storage, network bandwidth, databases, applications, sensors, and so on.
§ Problem solving in a dynamic, multi-institutional environment.
§ Organizing geographically distributed computing resources
  – So that they can be flexibly and dynamically allocated and accessed.
§ Providing such capabilities where sharing is highly controlled, with clear definitions of exactly what is shared, who is allowed to share, and the conditions under which sharing occurs.

Elements of Grid Computing
§ Resource sharing
  – Computers, data, storage, sensors, networks, …
  – Sharing always conditional: issues of trust, policy, negotiation, payment, …
§ Coordinated problem solving
  – Beyond client-server: distributed data analysis, computation, collaboration, …
§ Dynamic, multi-institutional virtual organizations
  – Community overlays on classic org structures
  – Large or small, static or dynamic

Virtual Organizations
§ A set of individuals and/or institutions defined by a set of sharing rules.
§ The sharing is highly controlled, with resource providers and consumers defining clearly and carefully just what is shared.
§ An example: the set of application service providers, storage service providers, cycle providers, and consultants engaged by a car manufacturer to plan a new factory.
§ Another example: an industrial consortium building a new aircraft.
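To make "sharing rules" concrete, here is a minimal Python sketch, assuming a VO's policy can be reduced to (member, resource, action, time window) rules. The class names, members, and resources are hypothetical illustrations, not part of any Grid middleware API; real deployments use far richer policy mechanisms.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass(frozen=True)
class SharingRule:
    """One hypothetical sharing rule: who may do what to which resource, and when."""
    member: str
    resource: str
    action: str
    start: time  # rule applies from this time of day (inclusive)
    end: time    # ... until this time of day (exclusive)


@dataclass
class VirtualOrganization:
    name: str
    rules: list = field(default_factory=list)

    def allows(self, member: str, resource: str, action: str, at: time) -> bool:
        # Sharing is conditional: a request is granted only if some rule matches it exactly.
        return any(
            r.member == member
            and r.resource == resource
            and r.action == action
            and r.start <= at < r.end
            for r in self.rules
        )


vo = VirtualOrganization("factory-planning", [
    SharingRule("consultant", "cluster-a", "submit", time(9, 0), time(17, 0)),
    SharingRule("storage-provider", "design-archive", "write", time(0, 0), time(23, 59)),
])

print(vo.allows("consultant", "cluster-a", "submit", time(10, 30)))      # True
print(vo.allows("consultant", "cluster-a", "submit", time(20, 0)))       # False
print(vo.allows("consultant", "design-archive", "write", time(10, 30)))  # False
```

Denying by default (no matching rule means no access) mirrors the point that sharing is always conditional.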

More Formal Definition of Grids
§ A grid is a system that:
  – Coordinates resource sharing in a decentralized manner (i.e., across different VOs).
  – Uses standard, open, general-purpose protocols and interfaces.
  – Delivers non-trivial qualities of service, such as:
    • Guaranteed bandwidth for an application.
    • Guaranteed CPU cycles.
    • Guaranteed latency.

Computational Grid Applications
§ Biomedical research
§ Industrial research
§ Engineering research
§ Studies in Physics and Chemistry

Science Today is a Team Sport!! I. Foster

e-Science [n]: Large-scale science carried out through distributed collaborations, often leveraging access to large-scale data & computing. I. Foster

TeraGrid is an important project developed by the National Science Foundation (NSF).
Slide obtained from B. Wilkinson, http://sol.cs.wcu.edu/~abw/CS493F04/

TeraGrid
Slide obtained from B. Wilkinson, http://sol.cs.wcu.edu/~abw/CS493F04/

UK e-Science Grid
Slide obtained from B. Wilkinson, http://sol.cs.wcu.edu/~abw/CS493F04/

Applications
§ National Virtual Observatory
  – Astronomical surveys produce terabytes of data.
  – Data sets will cover the sky in different wave bands (X-rays, optical, infrared, radio).
  – The challenge is to make this accessible to the general research community.
§ Heterogeneous data producers and consumers.
  – Resources in this Grid are data sets rather than compute engines.

High-Energy Physics
§ Large-scale collaborations for CERN’s Large Hadron Collider.
§ Involves 4000 physicists at 150 institutions in more than 30 countries.
§ Data sets are now at the petabyte level and are predicted to reach the exabyte level in this decade.
§ Challenges:
  – Providing rapid access to subsets of data.
  – Secure access to distributed computing and data handling resources.

§ Essentially, provide a distributed collaborative infrastructure that will allow physicists from around the globe to effectively analyze results from their home institutions.

Online Access to Scientific Instruments
[Figure: Advanced Photon Source with real-time collection, tomographic reconstruction, archival storage, wide-area dissemination, and desktop & VR clients with shared controls. DOE X-ray grand challenge: ANL, USC/ISI, NIST, U. Chicago.]

NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes. I. Foster

NEES
§ A network of 15 large-scale experimental sites.
§ Advanced tools such as shake tables, centrifuges that simulate earthquake effects, unique laboratories, a tsunami wave basin, and field-testing equipment.
§ Linked to a centralized data pool and earthquake simulation software, bridged together by the high-speed Internet2.
§ Allows off-site researchers to interact in real time with any of the networked sites.

§ Securely store, organize, and share data within a standardized framework in a central location.
§ Remotely observe and participate in experiments through the use of synchronized real-time data and video.
§ Collaborate with colleagues to facilitate the planning, performance, analysis, and publication of research experiments.
§ Conduct hybrid simulations that combine the results of multiple distributed experiments and link physical experiments with computer simulations.

DOE Earth System Grid
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models.
www.earthsystemgrid.org
I. Foster

Earth System Grid I. Foster

§ High-resolution, long-duration simulations performed with advanced DOE climate models produce tens of petabytes of output.
§ This output is made available to global change impacts researchers nationwide, at national laboratories, universities, other research laboratories, and other institutions.
§ Provides a virtual collaborative environment that links distributed centers, users, models, and data.
§ Provides scientists with virtual proximity to the distributed data and resources that they require to perform their research.

Let’s Play Virtual Organization!
§ The members of this class represent a VO within the university.
§ The resources of the VO include:
  – The laptops, workstations, and printers belonging to the individuals of the VO (that’s you guys¹!).
  – Does this bring up any issues worth worrying about?
¹ I do not join virtual organizations.

§ Want to tightly control who may use these resources and how they may be used. Thus we need security.

§ Security:
  – Want to tightly control who may use these resources and how they may be used.
§ How about Larry and Sarah wanting to use your printer at the same time (which happens to be 3:30 AM)? Is this a problem?
  – Might want to have a scheduler, which in this case need not be more sophisticated than turning off the printer.

§ What if David forgot Dan’s IP address and cannot gain access to his laptop? How could this be addressed (assuming you want it addressed)?
  – You could provide an information service that could tell David how to find the laptop.
§ You would also have to deal with allocating multiple resources to a user, e.g., a laptop to write a paper and a printer to print it out. Thus we need a resource manager.
§ We also need a way to monitor your application executing in your VO Grid.
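The information service and resource manager described here could be modelled roughly as follows. This is a toy Python sketch under stated assumptions: resources are named strings, the hypothetical `InformationService` is just a name-to-address table, and the hypothetical `ResourceManager` grants a set of resources all-or-nothing and otherwise queues the request. It is not how Globus implements these services.

```python
from collections import deque


class InformationService:
    """Hypothetical directory mapping resource names to their current addresses."""

    def __init__(self):
        self._entries = {}

    def register(self, name, address):
        self._entries[name] = address

    def lookup(self, name):
        return self._entries.get(name)  # None if the resource never registered


class ResourceManager:
    """Hypothetical manager that allocates several resources to one user at once."""

    def __init__(self, resources):
        self.owner = {r: None for r in resources}
        self.waiting = deque()  # queued (user, resources) requests, oldest first

    def request(self, user, wanted):
        # All-or-nothing: never hand out the laptop without the printer the user also needs.
        if all(self.owner[r] is None for r in wanted):
            for r in wanted:
                self.owner[r] = user
            return True
        self.waiting.append((user, tuple(wanted)))
        return False

    def release(self, user):
        for r, holder in self.owner.items():
            if holder == user:
                self.owner[r] = None
        # Retry queued requests in arrival order; those that still fail are re-queued.
        pending, self.waiting = self.waiting, deque()
        for queued_user, wanted in pending:
            self.request(queued_user, wanted)


info = InformationService()
info.register("dans-laptop", "192.0.2.17")  # documentation-range address, hypothetical
print(info.lookup("dans-laptop"))           # 192.0.2.17

rm = ResourceManager(["laptop", "printer"])
print(rm.request("larry", ["printer"]))             # True
print(rm.request("sarah", ["laptop", "printer"]))   # False: printer is busy, so Sarah waits
rm.release("larry")
print(rm.owner)  # {'laptop': 'sarah', 'printer': 'sarah'}
```

Granting both resources together avoids the situation where one user holds the laptop and waits for the printer while another holds the printer and waits for the laptop.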

Grid Computing Software Infrastructure

Open Grid Services Architecture
§ Developed by the Global Grid Forum to define a common, standard, and open architecture for Grid-based applications.
  – Provides a standard approach to all services on the Grid:
    • VO management service.
    • Resource discovery and management service.
    • Job management service.
    • Security services.
    • Data management services.
§ Built on top of and extends the Web Services architecture, protocols, and interfaces.

A stateless Web Service invocation

Figure 1.11. A stateful Web Service invocation

§ Relationship between OGSA, WSRF, and Web Services

WSRF
§ Web Services Resource Framework
  – A specification developed by OASIS.
  – WSRF specifies how to make Web Services stateful.
  – A joint effort by the Grid and Web Services communities.
§ WSRF provides the stateful services that OGSA needs.
§ OGSA is the architecture; WSRF is the infrastructure on which that architecture is built.
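The stateless/stateful distinction can be sketched in plain Python. Assumptions: the hypothetical `EndpointReference` stands in for a WS-Addressing endpoint reference (service address plus a resource key), and no SOAP, WSDL, or real WSRF library is involved. The sketch only illustrates the pattern of a service that keeps per-resource state between calls.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointReference:
    """Simplified stand-in for a WS-Addressing endpoint reference."""
    address: str       # where the service lives
    resource_key: int  # which stateful resource behind that service


def add_stateless(a, b):
    # Stateless: the answer depends only on this call's arguments.
    return a + b


class CounterService:
    """Stateful in the WSRF sense: each call names a resource whose state persists."""

    def __init__(self, address):
        self.address = address
        self._state = {}                 # resource key -> current value
        self._keys = itertools.count(1)  # source of fresh resource keys

    def create_resource(self):
        key = next(self._keys)
        self._state[key] = 0
        return EndpointReference(self.address, key)

    def add(self, epr, value):
        self._state[epr.resource_key] += value
        return self._state[epr.resource_key]

    def destroy(self, epr):
        del self._state[epr.resource_key]


svc = CounterService("http://example.org/wsrf/Counter")  # hypothetical address
epr = svc.create_resource()
print(add_stateless(5, 3))  # 8, every time
svc.add(epr, 5)
print(svc.add(epr, 3))      # 8, because the earlier 5 was remembered
```

The service object itself holds no per-client variables; all state lives in resources addressed through the endpoint reference, which is the separation WSRF standardizes.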

Standards Bodies
The primary standards-setting body is¹:
§ Global Grid Forum (GGF)
  – Started in 1998.
  – Meets three times a year: GGF1, GGF2, GGF3, …
  – More than 40 organizations involved and growing.
Others:
§ W3C (World Wide Web Consortium)
  – Working on standardization of web-related technologies such as XML.
  – See http://www.w3.org
§ OASIS (Organization for the Advancement of Structured Information Standards)
§ IETF, DMTF
¹ “The Grid: Core Technologies” by M. Li and M. Baker, 2005, page 4.

Standards in the Web Services World
§ XML introduced (ratified) in 1998.
§ SOAP 1.1 published in 2000.
§ Web services developed.
§ Subsequently, standards have been and continue to be developed:
  – WSDL
  – WS-*, where * refers to the name of one of many standards

Standards in the grid computing world
§ Open Grid Services Architecture (OGSA)
§ First announced at GGF4 in Feb 2002.
§ OGSA does not give details of implementation.

Globus Project
§ Open source software toolkit developed for grid computing.
§ Roots in the I-WAY experiment.
§ Work started in 1996.
§ Four versions developed to the present time.
§ Reference implementations of grid computing standards.
§ De facto standard for grid computing.

Globus Version 4
§ A “toolkit” of services and packages for creating the basic grid computing infrastructure.
§ Higher-level tools added to this infrastructure.
§ Version 4 is web-services based.
§ Some non-web-services code exists, either from earlier versions (legacy) or where web services are not appropriate (for efficiency, etc.).

Layered diagram of OGSA, GT 4, WSRF, and Web Services

§ Each part comprises a set of web services and/or non-web-service components.
§ Some are built upon earlier versions of Globus.

Globus Open Source Grid Software
[Figure: Globus Toolkit components by version (GT2, GT3, GT4), divided into web services and non-web-services components and grouped under Security, Data Management, Execution Management, Information Services, and Common Runtime. Components shown include Java WS Core, C WS Core, Python WS Core [contribution], C Common Libraries, XIO, WS and Pre-WS Authentication Authorization, Community Authorization Service, Delegation Service, Credential Management, GridFTP, Reliable File Transfer, Replica Location Service, OGSA-DAI [Tech Preview], WS GRAM, Pre-WS GRAM, Community Scheduler Framework [contribution], MDS4, and MDS2.] I. Foster

Another view of GT4 Components
[Figure: GT4 client and server components. Server-side hosting environments: Java services in Apache Axis plus GT libraries and handlers, C WS Core with C services using GT libraries and handlers, and pyGlobus WS Core with Python hosting, each able to run your own services. GT components shown: GRAM, RFT, Delegation, Index, Trigger, Archiver, CAS, OGSA-DAI, GTCP, GridFTP, Pre-WS MDS, Pre-WS GRAM, RLS, MyProxy, SimpleCA. Java, C, and Python clients communicate via interoperable WS-I-compliant SOAP messaging; X.509 credentials provide common authentication.] I. Foster

GT Core
§ Provides the ability to create services running inside the GT4 container.

Java WS Core
[Figure: the Globus Toolkit component diagram (GT2, GT3, GT4), repeated.]

GT4 Web Services Core
[Figure: the GT4 container hosts custom web services, custom WSRF web services, GT4 WSRF web services, a registry, and administration, used by user applications; protocol layers shown: WS-Addressing, WSRF, WS-Notification; WSDL, SOAP, WS-Security.] I. Foster

Execution Management
Key component: GRAM (Grid Resource Allocation Manager)
§ For submitting executable jobs.
§ May interface to a local job scheduler.
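As a sketch of how one submission interface can sit in front of different local schedulers, consider the following Python model. The `JobDescription`, `SchedulerAdapter`, `ForkAdapter`, `QueueAdapter`, and `gram_submit` names are hypothetical and do not reproduce GRAM's actual job description language or API. The fork-style adapter simply runs the executable on the local host, and the queue adapter only records jobs, standing in for a batch system.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    """Hypothetical, simplified job request."""
    executable: str
    arguments: list = field(default_factory=list)


class SchedulerAdapter(ABC):
    """Translates a generic job request into one local scheduler's terms."""

    @abstractmethod
    def submit(self, job):
        ...


class ForkAdapter(SchedulerAdapter):
    """Runs the job immediately on this host and returns its standard output."""

    def submit(self, job):
        done = subprocess.run([job.executable, *job.arguments],
                              capture_output=True, text=True, check=True)
        return done.stdout


class QueueAdapter(SchedulerAdapter):
    """Stand-in for a batch scheduler: queues the job and returns a job id."""

    def __init__(self):
        self.queue = []

    def submit(self, job):
        self.queue.append(job)
        return f"job-{len(self.queue)}"


def gram_submit(job, adapter):
    # The client-facing call is identical whichever local scheduler sits underneath.
    return adapter.submit(job)


job = JobDescription(sys.executable, ["-c", "print('hello grid')"])
print(gram_submit(job, ForkAdapter()).strip())  # hello grid
batch = QueueAdapter()
print(gram_submit(job, batch))                  # job-1
```

Keeping scheduler-specific logic in adapters is what lets one submission service front many different local schedulers.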

GRAM (Grid Resource Allocation Manager)
[Figure: the Globus Toolkit component diagram (GT2, GT3, GT4), repeated.]

GT4 GRAM Structure
[Figure: client, delegation, GRAM services, and RFT file transfer in the GT4 Java container on the service host; GRAM adapter, sudo, local scheduler (e.g., Sun Grid Engine), local job control, and user job on the compute element; GridFTP control and FTP data to remote storage elements.] I. Foster

Security Components
Addresses the security requirements of grid computing. Three important factors are:
§ Authorization
  – Process of deciding whether a particular identity can access a particular resource.
§ Authentication
  – Process of verifying that a particular identity is who it claims to be (applies to humans and systems).
§ Delegation (somewhat specific to grid computing)
  – Process of giving authority to another identity (usually a computer/process) to act on your behalf.
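The three factors can be illustrated with a toy credential chain in Python. Assumptions, stated plainly: HMAC with shared secrets stands in for the X.509 public-key and proxy certificates that grid security actually uses, the key table and identity names are hypothetical, and the access-control list is a plain dictionary. The point is only the order of checks: authenticate the chain, recover the original identity behind a delegated credential, then authorize that identity.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical signing keys; in this toy, verifying a link needs the issuer's secret.
KEYS = {"CA": b"ca-secret", "alice": b"alice-secret"}


@dataclass(frozen=True)
class Credential:
    subject: str      # identity this credential speaks for
    issuer: str       # identity that vouched for it
    signature: bytes


def _sign(issuer, subject):
    return hmac.new(KEYS[issuer], f"{subject}|{issuer}".encode(), hashlib.sha256).digest()


def issue(issuer, subject):
    return Credential(subject, issuer, _sign(issuer, subject))


def authenticate(chain):
    """Authentication: verify every link; return the end-entity identity, or None."""
    if not chain or chain[0].issuer != "CA":
        return None
    for i, cred in enumerate(chain):
        if cred.issuer not in KEYS:
            return None
        if not hmac.compare_digest(cred.signature, _sign(cred.issuer, cred.subject)):
            return None
        # Delegation: each later link must be issued by the previous holder
        # and carry a name derived from it (e.g. "alice" -> "alice/proxy").
        if i > 0 and (cred.issuer != chain[i - 1].subject
                      or not cred.subject.startswith(cred.issuer + "/")):
            return None
    return chain[0].subject


def authorize(identity, resource, acl):
    """Authorization: may this (already authenticated) identity use the resource?"""
    return identity is not None and identity in acl.get(resource, set())


user_cert = issue("CA", "alice")
proxy = issue("alice", "alice/proxy")  # alice delegates to a process acting for her
acl = {"cluster-a": {"alice"}}

who = authenticate([user_cert, proxy])
print(who, authorize(who, "cluster-a", acl))  # alice True
```

Because the resource authorizes the identity at the root of the chain, a process holding only the delegated credential is treated as acting on Alice's behalf.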

Security continued
§ Security aspects are complicated by the fact that virtual organization members and resources can be in different administrative domains.

Security
[Figure: the Globus Toolkit component diagram (GT2, GT3, GT4), repeated.]

GT4 Data Management
§ Move large data to/from nodes.
§ Replicate data for performance & reliability.
§ Locate data of interest.
§ Provide access to different data sources:
  – File systems, parallel file systems, hierarchical storage (GridFTP)
  – Databases (OGSA-DAI)
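The "replicate" and "locate" tasks can be sketched as a catalogue that maps a logical file name to its physical copies, loosely in the spirit of the Replica Location Service among the GT4 components. Everything here (the class name, the `gsiftp://` URLs, the host names) is a hypothetical illustration, not the RLS API.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class ReplicaCatalog:
    """Hypothetical catalogue: logical file name -> set of physical file URLs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add(self, logical_name, physical_url):
        self._replicas[logical_name].add(physical_url)

    def locate(self, logical_name):
        # .get() avoids creating an empty entry for unknown names.
        return sorted(self._replicas.get(logical_name, set()))

    def choose(self, logical_name, preferred_hosts):
        """Pick a replica on the first preferred host that has one, else any replica."""
        replicas = self.locate(logical_name)
        for host in preferred_hosts:
            for url in replicas:
                if urlsplit(url).hostname == host:
                    return url
        return replicas[0] if replicas else None


catalog = ReplicaCatalog()
catalog.add("run42/events.dat", "gsiftp://storage.site-a.example/data/events.dat")
catalog.add("run42/events.dat", "gsiftp://storage.site-b.example/mirror/events.dat")

print(catalog.locate("run42/events.dat"))
print(catalog.choose("run42/events.dat", ["storage.site-b.example"]))
```

Separating logical names from physical locations lets applications ask for "the data" while the catalogue decides which copy is closest or most available.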

GridFTP and Reliable File Transfer
[Figure: the Globus Toolkit component diagram (GT2, GT3, GT4), repeated.]

GridFTP
§ Built on FTP, using separate data and control channels.
§ Provides features for:
  – Large data transfers
  – Secure transfers
  – Fast transfers
  – Reliable transfers
  – Third-party transfers
§ Not a web service
  – The RFT (Reliable File Transfer) service provides a WS-level interface.

Parallel transfers and striping
§ Parallel transfers use multiple (virtual) connections for a transfer.
  – Same external network.
  – Speed improvement possible, but limited by the network card.
§ Striping: a version of parallel transfers that can use separate hardware interfaces.
  – Implemented in GT4.
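The idea behind parallel transfers, splitting one file into byte ranges, moving each range over its own stream, and reassembling the result, can be sketched in Python with threads. This is an assumption-laden illustration: `fetch_range` is a hypothetical callable standing in for one GridFTP data connection, and nothing here speaks the GridFTP protocol. Striping would additionally place the streams on different hosts or interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, streams):
    """Split [0, size) into `streams` contiguous, nearly equal (start, end) ranges."""
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def parallel_fetch(fetch_range, size, streams=4):
    """Fetch every range concurrently and write each block back at its own offset.

    Assumes fetch_range(start, end) returns exactly end - start bytes.
    """
    buffer = bytearray(size)

    def worker(span):
        start, end = span
        buffer[start:end] = fetch_range(start, end)  # ranges never overlap

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(worker, byte_ranges(size, streams)))  # list() surfaces worker errors
    return bytes(buffer)


source = bytes(range(256)) * 100  # pretend this file lives on a remote server
print(byte_ranges(10, 3))         # [(0, 4), (4, 7), (7, 10)]
copy = parallel_fetch(lambda s, e: source[s:e], len(source), streams=8)
print(copy == source)             # True
```

On a real network each stream would be a separate TCP connection; as the slide notes, the gain is bounded by the shared network card unless the streams are striped across separate interfaces.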

Monitoring and Discovery
[Figure: the Globus Toolkit component diagram (GT2, GT3, GT4), repeated.]

Monitoring and Discovery
§ WSRF provides common mechanisms for monitoring and discovering a service.
§ GT4 “aggregator” services within MDS:
  – MDS-Index: collects state information from registered resources and makes it available as an XML document.
  – MDS-Trigger: passes this information to an executable.
  – MDS-Archive: archives state information (awaiting implementation).
§ Every GT4 service is discoverable.
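An index-plus-trigger arrangement like the one described for MDS can be sketched with Python's standard `xml.etree.ElementTree`. Assumptions: each registered resource publishes a flat dictionary of properties, the element names (`Index`, `Resource`, `FreeCPUs`, `QueueLength`) are hypothetical rather than MDS4's actual schema, and the executable that MDS-Trigger would run is replaced by an ordinary Python callable.

```python
import xml.etree.ElementTree as ET


def build_index(registered):
    """Index: gather each registered resource's properties into one XML document."""
    root = ET.Element("Index")
    for name, props in sorted(registered.items()):
        entry = ET.SubElement(root, "Resource", name=name)
        for key, value in sorted(props.items()):
            ET.SubElement(entry, key).text = str(value)
    return root


def run_trigger(index, prop, threshold, action):
    """Trigger: call `action` for every resource whose numeric property exceeds threshold."""
    fired = []
    for entry in index.findall("Resource"):
        node = entry.find(prop)
        if node is not None and float(node.text) > threshold:
            action(entry.get("name"))
            fired.append(entry.get("name"))
    return fired


index = build_index({
    "cluster-a": {"FreeCPUs": 12, "QueueLength": 3},
    "cluster-b": {"FreeCPUs": 0, "QueueLength": 40},
})
print(ET.tostring(index, encoding="unicode"))
run_trigger(index, "QueueLength", 10, lambda name: print("alert: long queue on", name))
```

Publishing everything as one XML document means any client can query the aggregated state with ordinary XML tools, which is the role the index plays for discovery.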