Grid Technology A
Web Services Globus OGSA & Grid Architecture
CERN Geneva April 1-3 2003
Geoffrey Fox
Community Grids Lab Indiana University
gcf@indiana.edu
With Thanks to
• Tony Hey my co-speaker and
• I adapted presentations from
  – Marlon Pierce
  – Dennis Gannon
  – Globus
  – Malcolm Atkinson
  – David de Roure
Fermilab Experiments 1975-1980
• Regge Theory 1978
• Hadron Jets in 1977 Compared to Feynman Field (Fox) Model
[Figure labels: E350; E260, 200 GeV]
Caltech Hypercube JPL Mark II 1985 Chuck Seitz 1983 Hypercube as a cube
History New York Times 1984
• One of today's fastest computers is the Cray 1, which can do 20 million to 80 million operations a second. But at $5 million, they are expensive and few scientists have the resources to tie one up for days or weeks to solve a problem.
• ``Poor old Cray and Cyber (another super computer) don't have much of a chance of getting any significant increase in speed,'' Fox said. ``Our ultimate machines are expected to be at least 1,000 times faster than the current fastest computers.'' (80 gigaflops predicted. Earth Simulator is 40,000 gflops)
• But not everyone in the field is as impressed with Caltech's Cosmic Cube as its inventors are. The machine is nothing more nor less than 64 standard, off-the-shelf microprocessors wired together, not much different than the innards of 64 IBM personal computers working as a unit.
• The Caltech Hypercube was “just a cluster of PC’s”!
History New York Times 1984
• ``We are using the same technology used in PCs (personal computers) and Pacmans,'' Seitz said. The technology is an 8086 microprocessor capable of doing 1/20th of a million operations a second with 1/8th of a megabyte of primary storage. Sixty-four of them together will do 3 million operations a second with 8 megabytes of storage.
• Computer scientists have known how to make such a computer for years but have thought it too pedestrian to bother with.
• ``It could have been done many years ago,'' said Jack B. Dennis, a computer scientist at the Massachusetts Institute of Technology who is working on a more radical and ambitious approach to parallel processing than Seitz and Fox.
• ``There's nothing particularly difficult about putting together 64 of these processors,'' he said. ``But many people don't see that sort of machine as on the path to a profitable result.''
• So clusters are a trivial architecture (1984) ……
• So architecture is unchanged; unfortunately after 20 years of research, programming model is also the same (message passing)
What is a Grid I?
• Collaborative Environment (Ch 2.2, 18)
• Combining powerful resources, federated computing and a security structure (Ch 38.2)
• Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations (Ch 6)
• Data Grids as Managed Distributed Systems for Global Virtual Organizations (Ch 39)
• Distributed Computing or distributed systems (Ch 2.2, 10)
• Enabling Scalable Virtual Organizations (Ch 6)
• Enabling use of enterprise-wide systems, and someday nationwide systems, that consist of workstations, vector supercomputers, and parallel supercomputers connected by local and wide area networks. Users will be presented the illusion of a single, very powerful computer, rather than a collection of disparate machines. The system will schedule application components on processors, manage data transfer, and provide communication and synchronization in such a manner as to dramatically improve application performance. Further, boundaries between computers will be invisible, as will the location of data and the failure of processors. (Ch 10)
What is a Grid II? • Supporting e-Science representing increasing global collaborations of people and of shared resources that will be needed to solve the new problems of Science and Engineering (Ch 36) • As infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications. (Ch 1) • Makes high-performance computers superfluous (Ch 6) • Metasystems or metacomputing systems (Ch 10, 37) • Middleware as the services needed to support a common set of applications in a distributed network environment (Ch 6) • Next Generation Internet (Ch 6) • Peer-to-peer Network (Ch 10, 18) • Realizing thirty year dream of science fiction writers that have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity. (Ch 10)
What is Grid Technology?
• Grids support distributed collaboratories or virtual organizations integrating concepts from
  – The Web
  – Distributed Objects (CORBA Java/Jini COM)
  – Globus Legion Condor NetSolve Ninf and other High Performance Computing activities
  – Peer-to-peer Networks
• With perhaps the Web being the most important for “Information Grids” and Globus for “Compute Grids”
• Use Information Grids and not usual Data Grids as “distributed file systems” (holding lots of data!) are handled in Compute Grids
PPPH: Paradigms Protocols Platforms and Hosting I
• We will start from the Web view and assert that basic paradigm is
  – Meta-data rich Web Services communicating via messages
• These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
  – These are the distributed equivalent of operating system functions as in UNIX Shell
• Called Hosting Environment or platform
Some Basic Observations
• Grids manage and share asynchronous resources in a rather centralized fashion
• Peer-to-peer networks are “just like” Grids with different implementations of services like registration and look-up
• Web Services interact with messages
  – Everything (including applications like PowerPoint) will be a WS? See later short discussion
• Computers are fast and getting faster. One can afford many strategies that used to be unrealistic
  – All messages can be publish/subscribe
  – Software message routing
• XML will be used for most interesting data and meta-data
  – One will store/consider data and meta-data separately but often use same technology to manage both of them.
• Need Synchronous and Asynchronous Resource Sharing
  – Integrate Grid and Collaboration technology
Classic Grid Architecture Resources Database Composition Content Access Netsolve Security Collaboration Middle Tier Brokers Service Providers Computing Middle Tier becomes Web Services Clients Users and Devices
What is a Web Service I
• A web service is a computer program running on either the local or remote machine with a set of well defined interfaces (ports) specified in XML (WSDL)
• In principle, computer program can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be implemented in any way whatsoever
  – Interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining) but
• The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
• Web Services separate the meaning of a port (message) interface from its implementation
• Enhances/Enables Re-usable component model of ANY electronic resource
Raw Resources Raw Data (Virtual) XML Data Interface Web Service (WS) WS WS WS etc. XML WS to WS Interfaces WS WS WS (Virtual) XML Knowledge (User) Interface Render to XML Display Format Clients (Virtual) XML Rendering Interface
What is a Web Service II
• Web Services have important implication that ALL interfaces are XML messages based. In contrast
  – Most Windows programs have interfaces defined as interrupts due to user inputs
  – Most software have interfaces defined as methods which might be implemented as a message but this is often NOT explicit
[Figure labels: WSDL interfaces; Security; Payment; Credit Card; Catalog; Warehouse; shipping]
What is a Web Service III
• “Everything electronic” is a resource
  – Computers; Programs; People
  – Data (from sensors to this presentation to email to databases)
• “Everything electronic” is a distributed object
• All resources have interfaces which are defined in XML for both properties (data-structure) and methods (service, function, subroutine) (Resources are Services)
  – We can assume that a data-structure property has getproperty() and setproperty(value) methods to act as interface
• All resources are linked by messages with structure, which must be specifiable in XML
• All resources have a URI such as unique://a/b/c …….
WSDL Abstractions
• WSDL abstracts a program as an entity that does something given one or more inputs with its results defined by streams on one or more outputs.
• Functions are defined by method name and parameters methodname(parm1, parm2, … parmN)
  – Where parameters are “Input” “Output” or both
• In WSDL, we will have a Web Service which like a (Java or CORBA) Program can be thought of as a (distributed) object with many methods
  – Instead of a function call, the “calling routine” sends an XML message to the Web Service specifying methodname and values of the parameters
  – Note name of function is just another parameter
Details of WSDL Protocol Stack
• UDDI finds where programs are
  – remote (distributed) programs are just Web Services
  – (not a great success)
• WSFL links programs together (under revision as BPEL4WS)
• WSDL defines interface (methods, parameters, data formats)
• SOAP defines structure of message including serialization of information
• HTTP is negotiation/transport protocol
• TCP/IP is layers 3-4 of OSI
• Physical Network is layer 1 of OSI
[Figure: stack from top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; TCP/IP; Physical Network]
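The idea that the function name is just another parameter of an XML message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `request`/`param`/`response` element names, the `METHODS` table and the two toy functions are hypothetical and are not part of WSDL or SOAP.

```python
import xml.etree.ElementTree as ET


def add(a, b):
    return a + b


def scale(value, factor):
    return value * factor


# Hypothetical service: method names map to ordinary functions.
METHODS = {"add": add, "scale": scale}


def build_request(methodname, **params):
    """Encode a call as an XML message; the method name travels as data."""
    root = ET.Element("request", {"method": methodname})
    for name, value in params.items():
        ET.SubElement(root, "param", {"name": name}).text = repr(value)
    return ET.tostring(root, encoding="unicode")


def handle_request(xml_text):
    """Service side: read the method name and parameters out of the message."""
    root = ET.fromstring(xml_text)
    func = METHODS[root.get("method")]
    kwargs = {p.get("name"): float(p.text) for p in root.findall("param")}
    reply = ET.Element("response")
    reply.text = repr(func(**kwargs))
    return ET.tostring(reply, encoding="unicode")


if __name__ == "__main__":
    message = build_request("add", a=2, b=3)
    print(message)
    print(handle_request(message))
```

The caller never links against `add`; it only needs to know the message format, which is the separation of interface from implementation that the slide describes.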
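The layering of SOAP over HTTP can be made concrete with a small Python sketch that builds a SOAP 1.1 envelope and then wraps it in the text of an HTTP POST. The operation name `echo`, the namespace `urn:example:service`, and the host and path are assumptions made for illustration; nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_envelope(operation, params, service_ns="urn:example:service"):
    """SOAP layer: serialize an operation call into an Envelope/Body."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def http_post(host, path, payload):
    """HTTP layer: wrap the SOAP message as the body of a POST request."""
    body = payload.encode("utf-8")
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml; charset=utf-8",
        f"Content-Length: {len(body)}",
        'SOAPAction: ""',
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


if __name__ == "__main__":
    envelope = soap_envelope("echo", {"text": "hello"})
    print(http_post("example.org", "/services/echo", envelope).decode("utf-8"))
```

TCP/IP and the physical network would sit below the bytes returned by `http_post`; swapping HTTP for another transport changes only that function.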
Education as a Web Service
• Can link to Science as a Web Service and substitute educational modules
• “Learning Object” XML standards already exist from IMS/ADL http://www.adlnet.org – need to update architecture
• Web Services for virtual university include:
  – Registration
  – Performance (grading)
  – Authoring of Curriculum
  – Online laboratories for real and virtual instruments
  – Homework submission
  – Quizzes of various types (multiple choice, random parameters)
  – Assessment data access and analysis
  – Synchronous Delivery of Curricula
  – Scheduling of courses and mentoring sessions
  – Asynchronous access, data-mining and knowledge discovery
  – Learning Plan agents to guide students and teachers
What are System and Application Services? • There are generic Grid system services: security, collaboration, persistent storage, universal access – OGSA (Open Grid Service Architecture) is implementing these as extended Web Services • An Application Web Service is a capability used either by another service or by a user – It has input and output ports – data is from sensors or other services • Consider Satellite-based Sensor Operations as a Web Service – Satellite management (with a web front end) – Each tracking station is a service – Image Processing is a pipeline of filters – which can be grouped into different services – Data storage is an important system service – Big services built hierarchically from “basic” services • Portals are the user (web browser) interfaces to Web services
Application Web Services
• Note Service model integrates sensors, sensor analysis, simulations and people
• An Application Web Service is a capability used either by another service or by a user
  – It has input and output ports – data is from users, sensors or other services
  – Big services built hierarchically from “basic” services.
[Figure labels: Prog 1; Prog 2; Filter 1; Filter 3; Build as multiple Filter Web Services; interdisciplinary Programs; Sensor Data as a Web service (WS); Simulation WS; Data Analysis WS; Sensor Management WS; Visualization WS]
The Application Service Model
• As bandwidth of communication between services increases one can support smaller services
• A service “is a component” and is a replacement for a library in case where performance allows
• Services (components) are a sustainable model of software development – each service has documented capability with standards compliant interfaces
  – XML defines interfaces at several levels
  – WSDL at Service interface level and XSIL or equivalent for scientific data format
• A service can be written as Perl, Python, Java Servlet, Enterprise JavaBean, CORBA (C++ or Fortran) Object …
• Communication protocol can be RMI (Java), IIOP (CORBA) or SOAP (HTTP, XML) ……
Application with W3C DOM Structure as a Web Service
[Figure labels: Resource Facing Ports; Data; Application as a Web service; Application Model; Remaining W3C DOM Semantic Events; MVC (M: Model, C: Control, V: View); Control; User Facing Ports; Events as Messages; Rendering as Messages; Application View and Selected Control; Selected W3C DOM Semantic Events; W3C DOM Raw (UI) Events; W3C DOM; User Interface]
7 Primitives in WSDL
• types: which provides data type definitions used to describe the messages exchanged.
• message: which represents an abstract definition of the data being transmitted. A message consists of logical parts, each of which is associated with a definition within some type system.
• operation: an abstract description of an action supported by the service.
• portType: which is a set of abstract operations. Each operation refers to an input message and output messages.
• binding: which specifies concrete protocol and data format specifications for the operations and messages defined by a particular portType.
• port: which specifies an address for a binding, thus defining a single communication endpoint.
• service: which is used to aggregate a set of related ports
<?xml version="1.0" encoding="UTF-8"?>
<!-- Abbreviated listing: namespace declarations (wsdl, wsdlsoap, impl) and message parts are omitted -->
<wsdl:definitions>
  <wsdl:message name="execLocalCommandResponse"/>
  <wsdl:message name="execLocalCommandRequest"/>
  <wsdl:portType name="SJwsImp">
    <wsdl:operation name="execLocalCommand" parameterOrder="in0">
      <wsdl:input message="impl:execLocalCommandRequest" name="execLocalCommandRequest"/>
      <wsdl:output message="impl:execLocalCommandResponse" name="execLocalCommandResponse"/>
    </wsdl:operation>
  </wsdl:portType>
  <wsdl:binding name="SubmitjobSoapBinding" type="impl:SJwsImp">
    <wsdlsoap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
    <wsdl:operation name="execLocalCommand">
      <wsdlsoap:operation soapAction=""/>
      <wsdl:input name="execLocalCommandRequest"/>
      <wsdl:output name="execLocalCommandResponse"/>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="SJwsImpService">
    <wsdl:port binding="impl:SubmitjobSoapBinding" name="Submitjob"/>
  </wsdl:service>
</wsdl:definitions>
Discussion of 7 WSDL Primitives
• types specify data-structures which are equivalent to arguments of methods
• message specifies collections of types and is equivalent to set of arguments in a method call. Note that it is an “abstract method” in Java terminology
• operation is a collection of input, output and fault messages; there are 4 types of operation: one-way (service just receives a message), request-response (RPC), solicit-response, notification (service pushes out a message)
• portType represents a single channel that can support multiple operations. It is “abstract” as specified as a set of operations. It is equivalent to an “interface or abstract class” in Java
• binding tells you transport and message format for a portType (which can have multiple bindings to reflect say performance-portability trades)
• port combines a binding and an endpoint network address (URL) and is like a “class instance”
• service consists of multiple ports and is equivalent to a “program” in Java
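The four operation types can be told apart mechanically from the order of the input and output elements inside each WSDL 1.1 operation. The Python sketch below parses a cut-down WSDL fragment and classifies each operation of each portType. The fragment is an assumption modelled on the job-submission listing; in particular the `impl` namespace URI `urn:example:submitjob` is a placeholder.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Cut-down example document; the impl namespace URI is a placeholder.
SAMPLE = """<?xml version="1.0"?>
<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
                  xmlns:impl="urn:example:submitjob">
  <wsdl:message name="execLocalCommandRequest"/>
  <wsdl:message name="execLocalCommandResponse"/>
  <wsdl:portType name="SJwsImp">
    <wsdl:operation name="execLocalCommand">
      <wsdl:input message="impl:execLocalCommandRequest"/>
      <wsdl:output message="impl:execLocalCommandResponse"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>
"""

# Order of input/output children determines the operation type in WSDL 1.1.
KINDS = {
    ("input",): "one-way",
    ("input", "output"): "request-response",
    ("output", "input"): "solicit-response",
    ("output",): "notification",
}


def classify(operation):
    """Return the operation type implied by its input/output children."""
    order = tuple(
        child.tag.split("}")[1]
        for child in operation
        if child.tag.split("}")[1] in ("input", "output")
    )
    return KINDS.get(order, "unknown")


def describe(wsdl_text):
    """Map each portType name to {operation name: operation type}."""
    root = ET.fromstring(wsdl_text)
    return {
        port_type.get("name"): {
            op.get("name"): classify(op)
            for op in port_type.findall("wsdl:operation", WSDL_NS)
        }
        for port_type in root.findall("wsdl:portType", WSDL_NS)
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
```

Only the abstract part of the description (message, operation, portType) is needed here; bindings and ports would be consulted later, when a concrete transport and address are chosen.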
OGSA OGSI & Hosting Environments • Start with Web Services in a hosting environment • Add OGSI to get a Grid service and a component model • Add OGSA to get Interoperable Grid “correcting” differences in base platform and adding key functionalities
Functional Level above OGSA
• Systems Management and Automation
• Workload / Performance Management
• Security
• Availability / Service Management
• Logical Resource Management
• Clustering Services
• Connectivity Management
• Physical Resource Management
• Perhaps Data Access belongs here
Two-level Programming I
• The paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++ Java or Fortran Monte Carlo module
  – Data streaming from a sensor or Satellite
  – Specialized (JDBC) database access
• Such nuggets accept and produce data from users, files and databases
• The Grid is built by coordinating such nuggets assuming we have solved problem of programming the nugget
[Figure labels: Nugget; Data]
Two-level Programming II
• The Grid is discussing the linkage and distribution of the nuggets with the only addition runtime interfaces to Grid as opposed to UNIX data streams
• Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
• Such interpretative environments are the single processor analog of Grid Programming
• Some projects like GrADS from Rice University are looking at integration between nugget levels but dominant effort looks at each level separately
[Figure labels: Nugget 1; Nugget 2; Nugget 3; Nugget 4]
Why we can dream of using HTTP and that slow stuff
• We have at least three tiers in computing environment
  – Client (user portal discussed Thursday)
  – “Middle Tier” (Web Servers/brokers)
  – Back end (databases, files, computers etc.)
• In Grid programming, we use HTTP (and used to use CORBA and Java RMI) in middle tier ONLY to manipulate a proxy for real job
  – Proxy holds metadata
  – Control communication in middle tier only uses metadata
  – “Real” (data transfer) high performance communication in back end
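The two levels can be illustrated with a single-process Python sketch: three “nuggets” written as ordinary functions, plus a short coordinating script that only links them together. The nugget names and the event data are invented for illustration; in a Grid the links would be service messages, not in-process calls.

```python
import random
from functools import reduce


# Level 1: the nuggets, each programmed with conventional technology.
def generate_events(seed, count):
    """Stand-in for a Monte Carlo module producing event energies."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 100.0) for _ in range(count)]


def select_events(events, threshold):
    """Stand-in for a filter nugget."""
    return [e for e in events if e >= threshold]


def summarize(events):
    """Stand-in for an analysis nugget."""
    return {"count": len(events), "max": max(events, default=None)}


# Level 2: the coordination layer only decides how nuggets are linked.
def run_pipeline(source, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, source)


if __name__ == "__main__":
    result = run_pipeline(
        generate_events(seed=1, count=1000),
        [lambda events: select_events(events, 50.0), summarize],
    )
    print(result)
```

Changing the workflow means editing only the list passed to `run_pipeline`, which is the role a shell or Python script plays for core programs on one machine.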
User Services System Services Grid Computing Environments Portal Services System Services Application Metadata Service Middleware System Services Actual Application System Services Raw (HPC) Resources “Core” Grid Database
PPPH: Paradigms Protocols Platforms and Hosting II
• Self-describing programs/interfaces are key to scaling
  – Minimize amount of work system has to do
  – Hide as much as possible in services and applications
• Protocols describe (in “principle” at least) those rules that system obeys and uses to deliver information between services (processes)
• Interfaces tell the service what to do to interpret the results of communication
• HTTP is the dominant transport protocol of the Web
• HTML is the “interface” telling browser how to render
• But you can extend interface to allow PDF, multimedia, PowerPoint using “helper applications” which are (with more or less convenience) “automatically” downloaded if not already available
  – “Mime types” essentially “self-describe” each interface
Analogy with Web II
• HTTP and HTML are the analogies on the client side
• A “Web Service” generalizes a CGI Script on server side
  – CGI is essentially a Distributed Object technology allowing server to access an arbitrary program labeled by a URL plus an ugly syntax to specify name and parameters of program to run
• Roughly WSDL (Web Service Description Language) is a better way to specify program name and its parameters
• Web uses other protocols
  – HTTPS for secure links and RTP etc. for multimedia (UDP) streams
  – These again are required to integrate system – codecs like MPEG are interfaces interpreted by client
  – There are further protocols like H.323 and SIP which will be replaced (IMHO) by HTTP plus RTP etc. We should minimize number of protocols to get maintainable systems
PPPH: Paradigms Protocols Platforms and Hosting III
• There is a set of system capabilities which cannot be captured as standalone services and permeate Grid
• Meta-data rich Message-linked Web Services is permeating paradigm
• Component Model such as “Enterprise JavaBean (EJB)” or OGSI describes the formal structure of services
  – EJB if used lives inside OGSI in our Grids
• Invocation Framework describes how you interact with system
• Security in fine grain fashion to provide selective authorization (Globus and EDG WP6)
• Policy context describes rules for this particular Grid
• Transport mechanisms abstract concepts like ports and Quality of Service
• Messaging abstracts destination and customization of content
• Network (monitoring, performance) EDG WP7
• Fabric (resources) EDG WP4
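How a MIME type lets content describe its own interface can be shown with Python's standard `mimetypes` module. The `HANDLERS` table and its descriptions are assumptions standing in for a browser's built-in renderer and its helper applications.

```python
import mimetypes

# Hypothetical dispatch table: MIME type -> what the client does with it.
HANDLERS = {
    "text/html": "render in browser",
    "application/pdf": "hand to PDF helper application",
}


def choose_handler(filename):
    """Pick a handler from the MIME type implied by the file name."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type, HANDLERS.get(mime_type, "offer to download a helper")


if __name__ == "__main__":
    for name in ("talk.html", "paper.pdf", "data.unknownext"):
        print(name, choose_handler(name))
```

The client needs no prior agreement with the server about a given document; the type label is enough to select (or fetch) the code that interprets it.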
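The “URL plus an ugly syntax” of CGI can be seen by decoding one such URL with Python's standard `urllib.parse`. The URL itself, including the `program` and `dir` parameter names, is a made-up example.

```python
from urllib.parse import parse_qs, urlsplit


def parse_cgi_call(url):
    """Split a CGI-style URL into the script path and its named parameters."""
    parts = urlsplit(url)
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.path, params


if __name__ == "__main__":
    path, params = parse_cgi_call(
        "http://example.org/cgi-bin/runjob?program=ls&dir=%2Ftmp"
    )
    print(path, params)
```

Everything the server learns about the call is packed into untyped strings in the query; a WSDL description adds what is missing here, namely declared operation names, parameter types and message structure.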
Architecture in Pictures I Services Abstract Model OGSI Messaging Services Hosting Environment determines physical model Messaging Invocation Framework Network Resources
Architecture in Pictures II OGSA Interoperable Grid OGSA Interfaces OGSI Grid Services Messaging Network Monitoring and Scheduling Network Resources
Architecture in Pictures III OGSA Federated Grid Mediation Service converting between OGSA and “native” services Mediation Service Native Services Messaging Network Monitoring and Scheduling Network Resources
Virtualization
• The Grid could and sometimes does virtualize various concepts
• Location: URI (Universal Resource Identifier) virtualizes URL
• Replica management (caching) virtualizes file location generalized by GriPhyN virtual data concept
• Protocol: message transport and WSDL bindings virtualize transport protocol as a QoS request
• P2P or Publish-subscribe messaging virtualizes matching of source and destination services
• Semantic Grid virtualizes Knowledge as a meta-data query
• Brokering virtualizes resource allocation
• Virtualization implies references can be indirect
IFS: Interfaces and Functionality and Semantics I
• The Grid platform tries to minimize detail in protocols and maximize detail in interfaces to enhance scaling
• However rich meta-data and semantics are critical for correct and interesting operation
  – Put as much semantic interpretation as you can into specific services
  – Lack of Semantic interoperation is in fact main weakness of today’s Grids and Web services
• Everything becomes a service (See example of education) whether system or application level
• There are some very important “Global Services”
  – Discovery (look up) and Registration of service metadata
  – Workflow
  – MetaSchedulers
IFS: Interfaces and Functionality and Semantics II • There are many other generally important services • OGSA-DAI The Database Service • Portal Service linked to by WSRP (Web services for Remote Portals) • Notification of events • Job submission • Provenance – interpret meta-data about history of data • File Interfaces • Sensor service – satellites … • Visualization • Basic brokering/scheduling
Globus in a Nutshell from IPG
• GT2 (or Globus Toolkit 2) is original (non web service based) version which is basis of EDG (European Data Grid) work
• C programs and libraries
• See Chapter 5 of book with background in chapters 2-4 and 37
• http://www.ipg.nasa.gov/ipgusers/globus/
• http://www.globusworld.org/globusworld_web/jw2_program_tut.htm
Globus GT2 from IPG
• The goal of the Globus GT2 is to provide dependable, consistent, pervasive access to high-end resources.
  – This is original Grid “start” generalized recently to virtual organizations and data grids
• The Globus Project offers the most widely used computing grid middleware. The Globus Project is a joint effort of Argonne National Laboratory and the Information Sciences Institute of the University of Southern California, in collaboration with numerous other organizations including NCSA, NPACI, UCSD, and NASA. See http://www.globus.org/ for history, goals, release and usage notes, software distributions, and research papers.
Globus GT2 II
• Grid Fabric: Layer One. The fabric of the Grid comprises the underlying systems, computers, operating systems, networks, storage systems, and routers: the building blocks.
• Grid Services: Layer Two. Grid services integrate the components of the Grid fabric. Examples of the services that are provided by Globus Toolkit 2:
• GRAM: The Globus Resource Allocation Manager, GRAM, is a basic library service that provides capabilities to do remote-submission job start up. GRAM unites Grid machines, providing a common user interface so that you can submit a job to multiple machines on the Grid fabric. GRAM is a general, ubiquitous service, with specific application toolkit commands built on top of it
• MDS: The Monitoring and Discovery Service, also known as GIS, the Grid Information Service, provides information service. You query MDS to discover the properties of the machines, computers and networks that you want to use: how many processors are available at this moment? What bandwidth is provided? Is the storage on tape or disk? Is the visualization device an immersive desk or CAVE? Using an LDAP (Lightweight Directory Access Protocol) server, MDS provides middleware information in a common interface to put a unifying picture on top of disparate equipment.
• Contd …
Globus GT2 III
• GSI: gss-api library for adding authentication to a program. GSI provides programs, such as grid-proxy-init, to facilitate login to a variety of sites, while each site has its own flavor of security measures. That is, on the fabric layer, the various machines you want to use might be governed by disparate security policies; GSI provides a means of simplifying multiple remote logins. The standard installation is based on a PKI security system; the Kerberos installation of Globus is less standard. (Some installations with DoE and DoD insist on Kerberos)
• GridFTP: A new (in Globus 2.0) protocol for file transfer over a grid. This is a Global Grid Forum standard
• GASS: Globus Access to Secondary Storage, provides command-line tools and C APIs for remotely accessing data. GASS integrates GridFTP, HTTP, and local file I/O to enable secure transfers using any combination of these protocols.
Globus GT2 IV
• Application Toolkits: Layer Three
• Application toolkits use Grid Services to provide higher-level capabilities, often targeted to specific classes of application.
• For example, the Globus development team has created a set of Grid service tools and a toolkit of programs for running remotely distributed jobs. These include remote job submission commands (globusrun, globus-job-submit, globus-job-run), built on top of the GRAM service, and MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI).
• A more modern interface is through CoG Kits (Commodity Grid) to different languages – Perl Python Java – see chapter 26 of Book
• The Java CoG kit provides a natural way to link GT2 to a Web service framework
• Globus Toolkit 3 (GT3) effectively integrated CoG Kit interface with core Globus by wrapping all Globus Services as Web services
Job Submission in Globus
• Very similar to UNIX Shell – build Portal Web Interfaces to specific or general Shell commands. Some example commands
  – globusrun: Runs a single executable on a remote site with an RSL specification.
  – globus-job-cancel: Cancels a job previously started using globus-job-submit.
  – globus-job-run: Allows you to run a job at one or several remote resources. It translates the program arguments to an RSL request and uses globusrun to submit the job.
  – globus-job-clean: Kills the job if it is still running and cleans the information concerning the job.
  – globus-job-status: Display the status of the job. See also globus-job-get-output to check the standard output or standard error of your job.
• These are all controlled by metadata specified by the Globus Resource Specification Language (RSL) which provides a common language to describe jobs and the resources required to run them.
• http://www.globus.org/gram_rsl_parameters.html
• The simplest RSL expression looks something like the following: (executable=/bin/ls)
Virtual Data Toolkit VDT from GriPhyN
• http://www.lsc-group.phys.uwm.edu/vdt/
• Trillium (PPDG from DoE; GriPhyN and iVDGL from NSF) is major US effort building Grid application software with a strong particle physics emphasis
• VDT is their major software release and its heart is Condor and GT2.
  – There is some “virtual data” software as well but not clear if this is of interest in production use (interesting research area)
• Condor (Chapter 11 of Book) is powerful job scheduler for clusters and “cycle scavenging”
  – It has a well developed interface (ClassAds) for defining requirements of jobs and matching to compute capabilities
OGSA/OGSI Top Level View
• Chapters 7 to 9 of Book
• http://www.gridforum.org/Meetings/ggf7/docs/default.htm
• http://www.globusworld.org/globusworld_web/jw2_program_tut.htm
• OGSA is the set of “core” Grid services
  – Stuff you can’t live without
  – If you built a Grid you would need to invent these things
[Figure layers: Domain-specific services; More specialized services: data replication, workflow, etc.; Broadly applicable services: registry, authorization, monitoring, data access, etc.; Models for resources & other entities; Other models; OGSI; Hosting Environment; Host. Env. & Protocol Bindings; Transport Protocol]
OGSI Open Grid Service Interface
• http://www.gridforum.org/ogsi-wg
• It is a “component model” for web services.
• It defines a set of behavior patterns that each OGSI service must exhibit.
• Every “Grid Service” portType extends a common base type.
  – Defines an introspection model for the service
  – You can query it (in a standard way) to discover
    • What methods/messages a port understands
    • What other port types does the service provide?
    • If the service is “stateful” what is the current state?
• A set of standard portTypes for
  – Message subscription and notification
  – Service collections
• Each service is identified by a URI called the “Grid Service Handle” (GSH)
• GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL docs)
  – A GSR may be transient. GSHs are fixed.
  – Handle map services translate GSHs into GSRs.
OGSI and Stateful Services
• Sometimes you can send a message to a service, get a result and that’s the end
  – This is a statefree service
• However most non-trivial services need state to allow persistent asynchronous interactions
• OGSI is designed to support Stateful services through two mechanisms
  – Information Port: where you can query for SDE (Service Data Elements)
  – “Factories” that allow one to view a Service as a “class” (in an object-oriented language sense) and create separate instances for each Service invocation
• There are several interesting issues here
  – Difference between Stateful interactions and Stateful services
  – System or Service managed instances
Factories and OGSI
• Stateful interactions are typified by amazon.com where messages carry correlation information allowing multiple messages to be linked together
  – Amazon preserves state in this fashion which is in fact preserved in its database permanently
• Stateful services have state that can be queried outside a particular interaction
• Also note difference between implicit and explicit factories
  – Some claim that implicit factories scale as each service manages its own instances and so do not need to worry about registering instances and lifetime management
• See WS-Addressing from largely IBM and Microsoft http://msdn.microsoft.com/webservices/default.aspx?pull=/library/en-us/dnglobspec/html/ws-addressing.asp
[Figure: Explicit Factory and Implicit Factory, with instances 1 2 3 4]
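As a rough illustration of how a portal might turn form fields into RSL, the Python sketch below formats a job description as RSL relations. It assumes only the basic `(attribute=value)` form and the `&` conjunction of several relations; `build_rsl` is a hypothetical helper, not part of the Globus Toolkit, and it does no quoting or escaping of special characters.

```python
def rsl_relation(attribute, value):
    """Format one (attribute=value) relation; lists become space-separated values."""
    if isinstance(value, (list, tuple)):
        value = " ".join(str(item) for item in value)
    return f"({attribute}={value})"


def build_rsl(job):
    """Join relations into a conjunction; a single relation stands alone."""
    relations = [rsl_relation(name, value) for name, value in job.items()]
    if len(relations) == 1:
        return relations[0]
    return "&" + "".join(relations)


if __name__ == "__main__":
    print(build_rsl({"executable": "/bin/ls"}))
    print(build_rsl({"executable": "/bin/ls", "arguments": ["-l", "/tmp"], "count": 1}))
```

The resulting string is the kind of metadata that would be handed to a command such as globusrun; the job itself never passes through the portal.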
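The matching of job requirements to compute capabilities can be sketched in Python as a two-sided match followed by a ranking. This is not the ClassAd language or Condor's API: the dictionaries, attribute names (`memory_mb`, `idle`, `owner`) and the use of Python callables for requirements and rank are assumptions made for illustration.

```python
def symmetric_match(job_ad, machine_ad):
    """Both sides must accept each other for a match."""
    return bool(
        job_ad["requirements"](machine_ad) and machine_ad["requirements"](job_ad)
    )


def best_machine(job_ad, machine_ads):
    """Among matching machines, pick the one the job ranks highest."""
    candidates = [m for m in machine_ads if symmetric_match(job_ad, m)]
    return max(candidates, key=job_ad["rank"], default=None)


MACHINES = [
    {"name": "node-a", "memory_mb": 256, "idle": True,
     "requirements": lambda job: job["owner"] != "banned"},
    {"name": "node-b", "memory_mb": 1024, "idle": True,
     "requirements": lambda job: True},
    {"name": "node-c", "memory_mb": 2048, "idle": False,
     "requirements": lambda job: True},
]

JOB = {
    "owner": "alice",
    "requirements": lambda m: m["idle"] and m["memory_mb"] >= 512,
    "rank": lambda m: m["memory_mb"],
}

if __name__ == "__main__":
    chosen = best_machine(JOB, MACHINES)
    print(chosen["name"] if chosen else "no match")
```

Letting the machine state its own requirements is what makes cycle scavenging workable: an owner can, for example, refuse jobs while the workstation is in use.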
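The indirection between a fixed handle and a transient reference can be sketched as a small lookup service in Python. The `HandleMap` class, the `gsh://` scheme and the example host names are hypothetical; OGSI defines this behaviour through portTypes, not through this API.

```python
class HandleMap:
    """Toy handle map: a fixed handle (GSH) resolves to the current reference (GSR)."""

    def __init__(self):
        self._references = {}

    def bind(self, gsh, gsr):
        """Bind or rebind a handle to a reference."""
        self._references[gsh] = gsr

    def resolve(self, gsh):
        """Return the reference currently bound to the handle."""
        return self._references[gsh]


if __name__ == "__main__":
    handles = HandleMap()
    gsh = "gsh://example.org/services/counter/42"

    handles.bind(gsh, "http://hostA.example.org/counter.wsdl")
    print(handles.resolve(gsh))

    # The service moves; clients keep the same handle and simply resolve again.
    handles.bind(gsh, "http://hostB.example.org/counter.wsdl")
    print(handles.resolve(gsh))
```

Clients store only the handle, so a service can be restarted or migrated without invalidating anything its users hold.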
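The two mechanisms, queryable service data and factories that create separate instances, can be sketched with plain Python classes. The names `CounterService`, `CounterFactory`, `find_service_data` and `create_service` are illustrative assumptions and are not the OGSI interface definitions.

```python
import itertools


class CounterService:
    """A stateful service instance whose state can be queried by name."""

    def __init__(self, handle):
        self.handle = handle
        self._service_data = {"count": 0}

    def increment(self):
        self._service_data["count"] += 1
        return self._service_data["count"]

    def find_service_data(self, name):
        """Information port: return the named piece of service state."""
        return self._service_data[name]


class CounterFactory:
    """Explicit factory: creates, names and tracks separate instances."""

    def __init__(self, base="gsh://example.org/counter/"):
        self._base = base
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self):
        handle = f"{self._base}{next(self._ids)}"
        instance = CounterService(handle)
        self.instances[handle] = instance
        return instance


if __name__ == "__main__":
    factory = CounterFactory()
    first, second = factory.create_service(), factory.create_service()
    first.increment()
    first.increment()
    print(first.handle, first.find_service_data("count"))
    print(second.handle, second.find_service_data("count"))
```

Because each instance has its own handle, its state can be inspected outside any single request, which is the distinction the slide draws between a stateful service and a stateful interaction.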
Open Grid Service Architecture • OGSA-WG chaired by – Ian Foster, ANL and Univ. of Chicago – Jeff Nick, IBM – Dennis Gannon, IU • Active Members from – IBM, Fujitsu, NEC, SUN, Hitachi, Avaki – Univ. of Mich, Chicago, Indiana (not much academic involvement)
OGSA Core Services I • Registries, and namespace bindings – Registry is a collection of services indexed by service metadata. • “find me a service with property X. ” – Directory is a map from a namespace to GSHs. – A namespace is a human understandable version of a Grid Handle • Queues – For building schedulers and resource brokers – Jobs and other requests are in queues – This is high-level messaging
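The difference between a registry (lookup by metadata) and a directory (lookup by name) can be sketched in Python with two small classes. The class names, the `gsh://` handles and the metadata keys are hypothetical examples, not OGSA interfaces.

```python
class Registry:
    """Services indexed by metadata: 'find me a service with property X'."""

    def __init__(self):
        self._entries = {}

    def register(self, handle, **metadata):
        self._entries[handle] = dict(metadata)

    def find(self, **required):
        """Return handles whose metadata contains every required key/value."""
        return sorted(
            handle
            for handle, meta in self._entries.items()
            if all(meta.get(key) == value for key, value in required.items())
        )


class Directory:
    """Map from human-readable names to handles."""

    def __init__(self):
        self._names = {}

    def bind(self, name, handle):
        self._names[name] = handle

    def lookup(self, name):
        return self._names[name]


if __name__ == "__main__":
    registry = Registry()
    registry.register("gsh://example.org/viz/1", kind="visualization", site="cern")
    registry.register("gsh://example.org/db/1", kind="database", site="cern")
    print(registry.find(site="cern"))
    print(registry.find(kind="database"))

    directory = Directory()
    directory.bind("/cern/databases/main", "gsh://example.org/db/1")
    print(directory.lookup("/cern/databases/main"))
```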
Security • Base this on Web Services Security • Authentication – 2-way. Who are you and who am I? • Authorization – What am I authorized to use/see/modify? • Accounting/Billing – (not really security; see monitoring) • Privacy • Group Access – Easily create a group to share access to a virtual Grid. • Very complex issues related to services and message delivery.
Common Resource Model • Every resource on the grid that is manageable is represented by a service instance – CRM is the Schema hierarchy that defines each resource (with its meta-data) – Service for a resource presents its management interface to authorized parties.
Policy Management • Policy management services – Mechanism to publish policy and the services it applies to. – Policy life-cycle mgmt. • Policy languages exist for routing, security, resource use
Grid Service Orchestration • Creating new services by composing other services • Two types of Orchestration – Composition in space • One service directly invokes another – Composition in time • Managing the workflow – First do this. – Then do this and that – When that is done do this » If something goes wrong do this – And so on…
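Composition in time can be illustrated with a very small workflow runner in Python. This is a sketch only: the step names and the run_workflow helper are hypothetical, and a real Grid workflow engine would invoke remote services rather than local functions.

```python
def run_workflow(steps, on_error):
    """Run steps in order; each step takes and returns a state dict."""
    state = {}
    log = []
    for step in steps:
        try:
            state = step(state)
            log.append(step.__name__)
        except Exception as exc:
            # "If something goes wrong do this", then stop the workflow.
            state = on_error(state, exc)
            log.append("on_error")
            break
    return state, log


def stage_input(state):
    return {**state, "input": "staged"}


def run_job(state):
    if state.get("input") != "staged":
        raise RuntimeError("input missing")
    return {**state, "job": "done"}


def failing_job(state):
    raise RuntimeError("node unavailable")


def cleanup(state, exc):
    return {**state, "error": str(exc)}


ok_state, ok_log = run_workflow([stage_input, run_job], cleanup)
bad_state, bad_log = run_workflow([stage_input, failing_job, run_job], cleanup)
```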
Data Services • Distributed Data Access • Data Caching • Data Replication Services • Metadata Catalog Services • Storage Services
Metering Resource Consumption • At what granularity do services report resource consumption? • How do they report it? • How are services metered?
Transactions • Two threads/workflows must synchronize and agree they have done so before moving on. – Usually involves modification to two or more persistent states – WS-transactions has been “proposed”.
Messaging, Events, Logging • Messaging – Delivery Model – Queuing and Pub/Sub message delivery (not clear to me why these are different, as publish/subscribe is implemented as topic-labeled queues) • Events – Time stamped messages – Standard XML schemas • Standard Logging • MQSeries (IBM), JMS (Java Message Service) and NaradaBrokering (Indiana) provide this, but most naturally at the level of the “platform/hosting environment”
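The remark that publish/subscribe can be implemented as topic-labeled queues can be made concrete with a toy broker in Python. TopicBroker and the topic and subscriber names are hypothetical; this is not the JMS or NaradaBrokering API.

```python
from collections import defaultdict, deque


class TopicBroker:
    """Toy broker: one FIFO queue per (topic, subscriber) pair."""

    def __init__(self):
        self._queues = defaultdict(dict)  # topic -> {subscriber: deque}

    def subscribe(self, topic, subscriber):
        self._queues[topic].setdefault(subscriber, deque())

    def publish(self, topic, message):
        # Publishing is just enqueueing on every queue labeled with the topic.
        for queue in self._queues[topic].values():
            queue.append(message)

    def receive(self, topic, subscriber):
        queue = self._queues[topic][subscriber]
        return queue.popleft() if queue else None


broker = TopicBroker()
broker.subscribe("job/status", "portal")
broker.subscribe("job/status", "logger")
broker.publish("job/status", "job 7 started")
broker.publish("job/status", "job 7 finished")
```

With a single subscriber per topic this reduces to ordinary point-to-point queuing, which is the sense in which the two delivery models coincide.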
Where should Messaging be? • One can define messaging at the OGSA level “above the hosting environment” but that makes it difficult to virtualize messaging and support network performance – Publish-subscribe or, better, queued messaging naturally supports optimized routing based on network performance • One can naturally support collaborative Web services in the same fashion, in a way that is MUCH easier than what Groove Networks and other collaborative environments (WebEx, Placeware (Microsoft)) do, as long as every application is a Web service • OGSA location of messages is fine for low volume logging or notification events – Not good for events in a “video” application where each frame is an update event
Application as a Web service [Figure: Master Client and Participating Client, each with a User Interface sending W3C DOM Events to the Application as a Web service and receiving Rendering. Events flow to and from Collaboration as a WS; the master sends events to the collaborative clients]
Collaboration: Shared Display • Sharing can be done at any point on the “object” or Web Service pipeline • Shared Display shares the framebuffer, with events corresponding to changed pixels in the master client • As long as the pipeline uses messages, it is easy to make collaborative • Windows framebuffers and in fact most applications do NOT expose a message-based update interface [Figure: pipeline stages Shared Web Service, Shared Export, Shared Event, Shared Display; Object, Object Viewer and Object Display on the Master, linked through the Event (Message) Service to Object Display on other clients]
Shared Input Port (Replicated WS) [Figure: Collaboration as a WS sets up the session with XGSP. The Master and each of the Other Participants run a replica of the Web Service with its own WS Viewer and WS Display; input messages are shared through the Event (Message) Service]
Shared Output Port [Figure: Collaboration as a WS sets up the session with XGSP. On the Master, a Web Service Message Interceptor sits on the WSDL ports of the Application or Content source Web Service; output messages go through the Event (Message) Service to the WS Viewer and WS Display of the Other Participants. Annotations: Text Chat, Whiteboard, Multiple masters]
NaradaBrokering • Based on a network of cooperating broker nodes – Cluster based architecture allows system to scale to arbitrary size • Originally designed to provide uniform software multicast to support real-time collaboration linked to publish-subscribe for asynchronous systems • Now has four major core functions – Message transport (based on performance measurement) in heterogeneous multi-link fashion – General publish-subscribe including JMS & JXTA and support for RTP-based audio/video conferencing – Filtering for heterogeneous clients – Federation of multiple instances of Grid services
Role of Event/Message Brokers • We will use events and messages interchangeably – An event is a time stamped message • Our systems are built from clients, servers and “event brokers” – These are logical functions: a given computer can have one or more of these functions – In P2P networks, computers are typically multifunction; in Grids one tends to have separate-function computers – Event Brokers “just” provide message/event services; servers provide traditional distributed object services as Web services • There are functionalities that only depend on the event itself and perhaps the data format; they do not depend on details of the application and can be shared among several applications – NaradaBrokering is designed to provide these functionalities – MPI provided such functionalities for all parallel computing
Engineering Issues Addressed by Event / Messaging Service • Application level Quality of Service, e.g. give audio highest priority • Tunnel through firewalls & proxies • Filter messages to slow (collaborative/real-time) clients • Choose Hardware or Software multicast • Scaling of software multicast – Efficient calculation of destinations and routes • Integrate synchronous and asynchronous collaboration with same messaging, control, archiving for all functions • Transparently replace single server JMS systems with a distributed solution • Provides reliable inter-peer group messaging for JXTA • Open Source (high quality) messaging
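Application-level Quality of Service, such as giving audio the highest priority, can be sketched with a priority queue in Python. The priority classes and the PriorityOutbox name are assumptions made for this example; the slides do not say how NaradaBrokering implements prioritization.

```python
import heapq
import itertools

# Assumed priority classes; a lower number is delivered first.
PRIORITY = {"audio": 0, "video": 1, "text": 2}


class PriorityOutbox:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per class

    def put(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def drain(self):
        while self._heap:
            _, _, kind, payload = heapq.heappop(self._heap)
            yield kind, payload


outbox = PriorityOutbox()
outbox.put("text", "hello")
outbox.put("video", "frame-1")
outbox.put("audio", "sample-1")
outbox.put("audio", "sample-2")
delivered = list(outbox.drain())
```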
NaradaBrokering implements an Event Service • Filter is mapping to PDA or slow communication channel (universal access); see our PDA adaptor • Workflow implements message process • Routing illustrated by JXTA and includes firewall • Destination-Source matching illustrated by JMS using Publish-Subscribe mechanism • These use Security model (being implemented) based on WS-Sec [Figure: WSDL ports of Web Service 1 and Web Service 2 connected through a Broker with a (Virtual) Queue, providing Destination-Source Matching, Routing, Filter and Workflow]
Narada Broker Network [Figure: network of Brokers for the message/events service, connecting several (P2P) Communities, a Resource and a Data base, with Software multicast. Annotations: Hypercube topology for brokers? Tree for distance education with teacher at root]
NaradaBrokering Communication • Applications interface to NaradaBrokering through UserChannels, which NB constructs as a set of links between NB Broker waystations that may need to be dynamically instantiated • UserChannels have publish/subscribe semantics with XML topics • Links implement a single conventional “data” protocol – Interface to add new transport protocols within the Framework – Administrative channel negotiates the best available communication protocol for each link • Different links can have different underlying transport implementations – Implementations in the current release include support for TCP, UDP, Multicast, SSL and RTP. HTTP and HTTPS support will be available in the Feb 2003 release. – Supports communication through proxies such as iPlanet, Netscape and Apache. – Supports communication through firewalls such as Microsoft ISA, Checkpoint.
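Per-link protocol negotiation can be pictured with the following Python sketch. It assumes, purely for illustration, that each end of a link advertises the transports it supports and that a measured latency is available per transport; the negotiate function and the numbers are hypothetical and do not describe the actual NaradaBrokering administrative channel.

```python
def negotiate(supported_a, supported_b, measured_ms):
    """Pick the lowest-latency protocol that both ends of a link support."""
    common = set(supported_a) & set(supported_b)
    if not common:
        raise ValueError("no common transport protocol")
    # Unmeasured protocols rank last; the name breaks ties deterministically.
    return min(common, key=lambda p: (measured_ms.get(p, float("inf")), p))


measurements = {"TCP": 12.0, "UDP": 5.0}  # made-up latencies in milliseconds

open_link = negotiate(["TCP", "UDP", "SSL"], ["UDP", "TCP"], measurements)
firewalled_link = negotiate(["TCP", "UDP", "SSL"], ["TCP", "HTTP"], measurements)
```

Because the choice is made per link, one UserChannel can cross a UDP link and a TCP link on its way between two brokers.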
Performance/Routing in Message-based Architecture • In traveling from cities A to B (say 3 separate passengers), one chooses between and changes transport mechanism at waystations to optimize cost, time, comfort, scenic beauty … • Waystations are now NB brokers where one chooses transport protocol (individual or collective) – Able to choose between car, type of car, plane, train etc – Able to dynamically create waystations to cope with problems and act as hubs for multicast messages – Knows about traffic jams and can assign the “HOV lane” [Figure: routes from A to B1, B2 and B3 over links labeled Satellite UDP, Firewall HTTP, Fast Link, Hand-Held Protocol, Dial-up Filter and Software Multicast]
Note on Optimization • Note in parallel computing, couldn’t do much dynamic optimization as aiming at microsecond latency – Natural to use hardware routing • In Grid, time scales are different – 100 millisecond quite normal network latency – 30 millisecond typical packet time sensitivity (this is one audio or video frame) but even here can buffer 10-100 frames on client (conferencing to streaming) – 1 millisecond is time for a Java server to “think” • Jitter in latency (transit time) due to routing, processing (in NB) or packet loss recovery is important property • Grid needs and can tolerate significant dynamic optimization
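The time scales on this slide can be turned into a quick back-of-the-envelope calculation. The Python below only multiplies out the numbers quoted above (30 ms frames, 10 to 100 buffered frames, 1 ms of server "think" time); the function and variable names are made up for the example.

```python
FRAME_MS = 30   # one audio or video frame, from the slide
THINK_MS = 1    # time for a Java server to "think", from the slide


def buffered_delay_ms(frames, frame_ms=FRAME_MS):
    """Playout delay introduced by buffering a given number of frames."""
    return frames * frame_ms


conferencing_ms = buffered_delay_ms(10)    # small buffer, interactive use
streaming_ms = buffered_delay_ms(100)      # large buffer, one-way streaming
think_steps_per_frame = FRAME_MS // THINK_MS
```

So a client buffer absorbs between 0.3 and 3 seconds of jitter, and a broker can spend on the order of 30 routing decisions inside a single frame time, which is why dynamic optimization is affordable in a Grid but not at microsecond latencies.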
Sender/receiver/broker - (Pentium-3, 1 GHz, 256 MB RAM). 100 Mbps LAN. JDK-1.3, Red Hat Linux 7.3
Narada Performance Web Service • Performance measurements are used by Links in – Reconfiguring connectivity between nodes – Deciding underlying transport protocol – Determining possible filtering • Each node determines performance of links of which it is an endpoint • Individual node web services are aggregated as another Web Service • Probably should replace by a more sophisticated measurement package • Factors measured include transit delays, bandwidth, jitter, receiving rates • Performance measurements are – Spaced out at increasing intervals for healthy channels – Factors selectively measured for unhealthy channels – No repeated measurements of bandwidth, for example – Injected into Narada network as XML events [Figure label: Administrative Interface]
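Spacing measurements out at increasing intervals for healthy channels is essentially a backoff schedule. The Python sketch below assumes a doubling factor, a 1 second base and a 60 second cap; none of these values come from the slides, and next_interval and schedule are hypothetical names.

```python
def next_interval(current_s, healthy, base_s=1.0, factor=2.0, max_s=60.0):
    """Back off while the channel is healthy; reset when it is not."""
    if not healthy:
        return base_s
    return min(current_s * factor, max_s)


def schedule(health_reports, base_s=1.0):
    """Return the measurement interval chosen after each health report."""
    intervals = []
    current = base_s
    for healthy in health_reports:
        current = next_interval(current, healthy, base_s=base_s)
        intervals.append(current)
    return intervals


intervals = schedule([True, True, True, False, True])
```

A healthy link is probed less and less often, while a single unhealthy report brings the probe rate straight back to the base interval.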
The Overall Architecture • The Grid is defined by a collection of distributed Services – For many users the primary interaction with the Grid will be through a portal [Figure: Portal Server connected to MyProxy Server, Metadata Directory Service(s), Application Factory Services, Event and logging Services, Messaging and group collaboration, Directory & index Services]
Application Portal in a Minute (box) • Systems like Unicore, GPDK, Gridport (HotPage), Gateway, Legion provide “Grid or GCE Shell” interfaces to users (user portals) – Run a job; find its status; manipulate files – Basic UNIX Shell-like capabilities • Application Portals (Problem Solving Environments) are often built on top of “Shell Portals” but this can be quite time consuming – Application Portal = Shell Portal Web Service + Application (factory) Web service
Application Web service • Application Web Service is ONLY metadata – Application is NOT touched • Application Web service defined by two sets of schema: – First set defines the abstract state of the application » What are my options for invoking myapp? » Dub these to be “abstract descriptors” – Second set defines a specific instance of the application » I want to use myapp with input1.dat on solar.uits.indiana.edu. » Dub these to be “instance descriptors”. • Each descriptor group consists of – Application descriptor schema – Host (resource) descriptor schema – Execution environment (queue or shell) descriptor schema
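The split between abstract descriptors and instance descriptors can be sketched with two Python dataclasses. The field names and the instantiate helper are assumptions chosen for this example; the real descriptors are XML schemas whose contents the slide does not show.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractDescriptor:
    """What are my options for invoking the application?"""
    application: str
    hosts: list = field(default_factory=list)
    environments: list = field(default_factory=list)  # queue or shell


@dataclass
class InstanceDescriptor:
    """One concrete choice among those options."""
    application: str
    host: str
    environment: str
    input_file: str


def instantiate(abstract, host, environment, input_file):
    # Only metadata is checked and produced; the application is not touched.
    if host not in abstract.hosts:
        raise ValueError(f"{abstract.application} is not available on {host}")
    if environment not in abstract.environments:
        raise ValueError(f"unsupported environment: {environment}")
    return InstanceDescriptor(abstract.application, host, environment, input_file)


myapp = AbstractDescriptor(
    application="myapp",
    hosts=["solar.uits.indiana.edu"],
    environments=["shell", "queue"],
)
run = instantiate(myapp, "solar.uits.indiana.edu", "queue", "input1.dat")
```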
Web Services as a Portlet • Each Web Service naturally has a user interface specified as “just another port” – Customizable for universal access • This gives each Web Service a Portlet view specified (in XML as always) by WSRP (Web services for Remote Portals) • So component model for resources “automatically” gives a component model for user interfaces – When you build your application, you define the portlet at the same time [Figure: Application as a WS. The Application or Content source Web Service has General Application Ports (WSDL) that interface with other Web Services, and WSRP ports that form the User Face of the Web Service. WSRP Ports define WS as a Portlet; Web Services have other ports (Grid Service) to be OGSI compliant]
Online Knowledge Center built from Portlets: A set of UI Components • Web Services provide a component model for the middleware (see large “common component architecture” effort in Dept. of Energy) • Should match each WSDL component with a corresponding user interface component • Thus one “must use” a component model for the portal with again an XML specification (portalML) of portal component
Jetspeed Architecture [Figure: Turbine Servlet with JSP template and Screen Manager; PSML; PortletController and PortletControl; ECS objects passed up to the ECS Root and rendered to HTML. Portlet types: Data Portlet (XML: RSS, OCS, or other; local or remote), HTML Portlet (local files, remote HTML), JSP or VM Portlet (local templates), WebPage Portlet, and User implemented Portlets using the Portal API]
Portlets and Portal Stacks [Figure: stack of layers: Aggregation Portals (Jetspeed); User facing Web Service Ports; Application Grid Web Services; Core Grid Services; Message Security, Information Services] • User interfaces to Portal services (Code Submission, Job Monitoring, File Management for Host X) are all managed as portlets. • Users and administrators can customize their portal interfaces to precisely the services they want.
Jetspeed Computing Portal: Choose Portlets [Screenshot: 4 available portlets linking to Web Services; I choose two]
Choose Portlet Layout [Screenshot: choose 1-column Layout; original 2-column Layout]
File management • Tabs indicate available portlet interfaces. • Lists user files on selected host, noahsark. • File operations include upload, download, copy, rename, crossload
Sample page with several portlets: proxy credential manager, submission, monitoring
Administer Grid Portal Provide information about application and host parameters Select application to edit
- Slides: 96