Dataflows in SRB using SDSC Matrix
Arun Jagatheesan
Architect & Team Lead, SDSC Matrix
San Diego Supercomputer Center
10th Annual NPACI/SDSC Summer Computing Institute
August 23-27, 2004, San Diego, California, USA
San Diego Supercomputer Center · University of Florida · Grid Physics Network (GriPhyN)
Talk Outline
• Introduction to Gridflows
• Introduction to SDSC Matrix Project
• Data Grid Language
• Architecture of SDSC Matrix
• Usage
  – What can you do for Matrix?
Acknowledgements
• Jonathan Weinberg, Daniel Moore, Allen Ding, Reena Mathew, Erik Vandekieft, the SRB Team, and You! (hey, your name can be here)
• SDSC SRB, NSF GriPhyN, NSF SCEC, DOE Portals Project
Gridflows (Grid Workflows)
• Automation of an execution pipeline
  – Data and/or tasks processed by multiple autonomous grid resources
  – According to a set of procedural rules
• Confluence of multiple autonomous administrative domains
• Gridflow Execution Servers
  – Themselves belong to autonomous administrative domains
  – P2P (distributed) control
Talk Outline
• Introduction to Gridflows
• Introduction to SDSC Matrix Project
• Data Grid Language
• Architecture of SDSC Matrix
• Usage
  – What can you do for Matrix?
SDSC Matrix Project
• CS Research & Development
  – Gridflow description, data grid administration rules
  – Gridflow P2P protocols for Gridflow Server communication
• Development
  – SRB Data Grid Web Services
  – SRB datagrid flow automation and provenance
• Theory to Practice
  – Help with customized development and deployment of gridflow concepts in scientific / grid applications
  – Visibility for, and assistance with, standardization efforts at GGF
Advantages from the SRB Perspective
• Reduces client-server communication
  – The whole execution logic is sent to the server
  – Fewer WAN messages
  – In our experiments, this significantly improved performance
• Datagrid Information Lifecycle Management
  – Autonomic: "Move data at 9:00 PM on weekdays and on weekends"
• Data Grid Administration
• Power users and sophisticated users
  – Data Grid Administrator (rules to manage the data grid)
  – Scientist or Librarian (visual dataflow programming)
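The autonomic lifecycle rule above ("move data at 9:00 PM") is a time-triggered rule. The Java sketch below shows one way such a trigger could be evaluated; the class name `ScheduledMoveRule` and its fields are hypothetical illustrations, not Matrix or DGL syntax, and the actual data movement is left out.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical time-triggered lifecycle rule (illustrative only, not the Matrix rule format).
class ScheduledMoveRule {
    final LocalTime at;          // time of day at which data should be moved
    final Set<DayOfWeek> days;   // days on which the rule is active

    ScheduledMoveRule(LocalTime at, Set<DayOfWeek> days) {
        this.at = at;
        this.days = days;
    }

    // Fires when the current minute matches the scheduled time on an active day.
    boolean shouldFire(LocalDateTime now) {
        return days.contains(now.getDayOfWeek())
                && now.getHour() == at.getHour()
                && now.getMinute() == at.getMinute();
    }

    public static void main(String[] args) {
        // "Move data at 9:00 PM on weekdays and on weekends", i.e. every day of the week.
        ScheduledMoveRule rule = new ScheduledMoveRule(LocalTime.of(21, 0), EnumSet.allOf(DayOfWeek.class));
        System.out.println(rule.shouldFire(LocalDateTime.of(2004, 8, 23, 21, 0)));  // true
        System.out.println(rule.shouldFire(LocalDateTime.of(2004, 8, 23, 20, 59))); // false
    }
}
```

Keeping the day set as a parameter lets a narrower rule (for example, weekends only) reuse the same check.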
Talk Outline
• Introduction to Gridflows
• Introduction to SDSC Matrix Project
• Data Grid Language
• Architecture of SDSC Matrix
• Usage
  – What can you do for Matrix?
What do they want?
• "We know the business (scientific) process."
• "Cyberinfrastructure is all we care about (why bother about atoms or DNA?)"
What do they want?
Use DGL to describe your process logic with abstract references to datagrid infrastructure dependencies.
Why a Gridflow Language?
• Infrastructure-independent description
  – Abstract references to hardware and cyberinfrastructure
• Description of execution flow logic
  – Separates the execution flow logic from the application logic
  – E.g., Monte Carlo is an application; executing it 10 times, or until a variable becomes zero, is execution logic
• Procedural rules associated with the execution flow
• Provenance
  – What happened, when, by whom, how …? (and querying it)
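The Monte Carlo example separates what runs (the application) from how often or how long it runs (the execution logic). The Java sketch below makes that split explicit; `monteCarloPi`, `repeat` and `repeatUntil` are hypothetical helper names, and a real gridflow would express the execution logic in DGL rather than in Java loops.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

class ExecutionLogicDemo {
    // Application logic: one Monte Carlo run estimating pi from n random points in the unit square.
    static double monteCarloPi(Random rng, int n) {
        int inside = 0;
        for (int i = 0; i < n; i++) {
            double x = rng.nextDouble(), y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * inside / n;
    }

    // Execution logic: run a task a fixed number of times.
    static void repeat(int times, Runnable task) {
        for (int i = 0; i < times; i++) task.run();
    }

    // Execution logic: keep running a task until a condition holds.
    static void repeatUntil(BooleanSupplier done, Runnable task) {
        while (!done.getAsBoolean()) task.run();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // Same application, first run 10 times ...
        repeat(10, () -> System.out.println(monteCarloPi(rng, 10_000)));
        // ... then run until a counter variable becomes zero.
        int[] remaining = {3};
        repeatUntil(() -> remaining[0] == 0, () -> {
            System.out.println(monteCarloPi(rng, 10_000));
            remaining[0]--;
        });
    }
}
```

Because the application never mentions loops or stopping conditions, the execution logic can change without touching it.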
Gridflow Language Requirements
• High-level abstract descriptions
  – Abstract description of cyberinfrastructure dependencies
• Simple yet flexible
  – Flexible enough to describe complex requirements (no brute force)
• Gridflow dependency patterns
  – Based on execution structure and data semantics
  – (parallel, sequential, fork-new), (milestones, for-each, switch-case), ...
• Asynchronous execution
  – For long-running requests
• Querying using an existing standard
  – XQuery
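Two of the dependency patterns listed above, sequential and parallel, can be illustrated in a few lines of Java. This is a minimal sketch assuming each step is an in-process `Supplier`; in Matrix the steps would be grid operations dispatched by the server. `FlowPatterns` is a hypothetical name, and `CompletableFuture.supplyAsync` stands in for asynchronous execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class FlowPatterns {
    // Sequential pattern: each step starts only after the previous one finishes.
    static <T> List<T> sequential(List<Supplier<T>> steps) {
        List<T> results = new ArrayList<>();
        for (Supplier<T> step : steps) results.add(step.get());
        return results;
    }

    // Parallel pattern: start every step asynchronously, then wait for all of them.
    static <T> List<T> parallel(List<Supplier<T>> steps) {
        List<CompletableFuture<T>> futures = new ArrayList<>();
        for (Supplier<T> step : steps) futures.add(CompletableFuture.supplyAsync(step));
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> f : futures) results.add(f.join()); // results keep the declared order
        return results;
    }

    public static void main(String[] args) {
        List<Supplier<String>> steps = List.of(() -> "ingest", () -> "replicate", () -> "index");
        System.out.println(sequential(steps)); // [ingest, replicate, index]
        System.out.println(parallel(steps));   // [ingest, replicate, index]
    }
}
```

For long-running requests, a client could hold on to the futures instead of calling `join()` immediately, which is the asynchronous style the requirement asks for.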
Gridflow Language Requirements
• Process metadata and annotations
  – Runtime definition, update and querying of metadata
• Runtime management of gridflows
  – Stop a gridflow at run time
• Partitioning
  – Facility in the language to divide a gridflow request into multiple requests (an excellent research topic)
• Import descriptions
  – Refer to other gridflows in an execution
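Stopping a gridflow at run time needs some signal that the executor checks between steps. The Java sketch below assumes a simple cooperative stop flag checked before each step; `StoppableFlow` is a hypothetical class, and how Matrix actually cancels a running flow is not shown in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical cooperative-stop executor (illustrative only).
class StoppableFlow {
    private final List<Runnable> steps = new ArrayList<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    StoppableFlow add(Runnable step) {
        steps.add(step);
        return this;
    }

    // Safe to call from another thread (e.g. a management request) while the flow is running.
    void stop() {
        stopRequested.set(true);
    }

    // Runs steps in order, checking for a stop request before each step.
    // Returns the number of steps that completed; a step already in progress is not interrupted.
    int run() {
        int completed = 0;
        for (Runnable step : steps) {
            if (stopRequested.get()) break;
            step.run();
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) {
        StoppableFlow flow = new StoppableFlow();
        flow.add(() -> System.out.println("step 1"))
            .add(flow::stop) // simulates a stop request arriving during step 2
            .add(() -> System.out.println("step 3"));
        System.out.println(flow.run() + " steps completed"); // 2 steps completed
    }
}
```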
Data Grid Language (DGL)
• XML-based gridflow description
  – Describes execution flow logic
• ECA-based rule description for execution
  – ECA = Event, Condition, Action
• Querying the status of a gridflow
  – XQuery / simple query of a gridflow execution
• Scoped variables and gridflow patterns
  – For controlling the execution flow logic
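An ECA rule pairs an event with a condition and an action: when the event occurs and the condition holds, the action runs. The Java sketch below shows that evaluation loop under simple assumptions (events are strings, context is a map); `EcaEngine`, `Rule` and the `fileIngested` event are hypothetical names, not DGL elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

class EcaEngine {
    // One Event-Condition-Action rule.
    static final class Rule {
        final String event;
        final Predicate<Map<String, Object>> condition;
        final Consumer<Map<String, Object>> action;

        Rule(String event, Predicate<Map<String, Object>> condition, Consumer<Map<String, Object>> action) {
            this.event = event;
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void register(Rule rule) {
        rules.add(rule);
    }

    // Delivers an event with its context; every matching rule whose condition holds runs its action.
    // Returns the number of rules that fired.
    int fire(String event, Map<String, Object> context) {
        int fired = 0;
        for (Rule r : rules) {
            if (r.event.equals(event) && r.condition.test(context)) {
                r.action.accept(context);
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        EcaEngine engine = new EcaEngine();
        // Hypothetical rule: after a file is ingested, replicate it if it is larger than 1 GB.
        engine.register(new Rule("fileIngested",
                ctx -> (Long) ctx.get("sizeBytes") > 1_000_000_000L,
                ctx -> System.out.println("replicate " + ctx.get("path"))));
        engine.fire("fileIngested", Map.of("path", "/home/Matrixdemo.sdsc/big.dat", "sizeBytes", 5_000_000_000L));
    }
}
```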
DGL Requests
• Data Grid Flow
  – An XML structure that describes the execution logic, the associated procedural rules and grid environment variables
• Status Query
  – An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity
• A DGL or Matrix client sends either of these to the Matrix Server
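A status query can target the whole gridflow or a single sub-flow. The JDK has no XQuery engine, so this Java sketch substitutes XPath (`javax.xml.xpath`) to show the idea of querying at different granularities. The status document, with its `flowStatus` elements and `name`/`state` attributes, is entirely hypothetical; the real Matrix status format is not shown in these slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StatusQueryDemo {
    // Hypothetical status document: one flow with two sub-flows (names are illustrative only).
    static final String STATUS_XML =
          "<flowStatus name='archive' state='running'>"
        + "  <flowStatus name='copy' state='completed'/>"
        + "  <flowStatus name='checksum' state='running'/>"
        + "</flowStatus>";

    // Evaluates an XPath expression against the status document and returns its string value.
    static String query(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException("status query failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Coarse granularity: the state of the whole gridflow.
        System.out.println(query(STATUS_XML, "/flowStatus/@state"));                // running
        // Fine granularity: the state of a single sub-flow.
        System.out.println(query(STATUS_XML, "//flowStatus[@name='copy']/@state")); // completed
    }
}
```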
Data Grid Request
• Annotations about the Data Grid Request
• Can be either a Flow or a Status Query
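Because a Data Grid Request is either a Flow or a Status Query, and both carry annotations, a server can route each request to the matching handler. The Java sketch below models that with a small class hierarchy; all class names and handler strings are hypothetical, and only the Flow/Status Query split and the annotations come from the slide.

```java
import java.util.List;
import java.util.Map;

class RequestDispatch {
    // Common base: every request carries annotations about itself.
    abstract static class DataGridRequest {
        final Map<String, String> annotations;
        DataGridRequest(Map<String, String> annotations) { this.annotations = annotations; }
    }

    // A flow: the execution logic to run (steps reduced to names for this sketch).
    static final class Flow extends DataGridRequest {
        final List<String> steps;
        Flow(Map<String, String> annotations, List<String> steps) {
            super(annotations);
            this.steps = steps;
        }
    }

    // A status query: asks about an existing flow or sub-flow by name.
    static final class StatusQuery extends DataGridRequest {
        final String flowName;
        StatusQuery(Map<String, String> annotations, String flowName) {
            super(annotations);
            this.flowName = flowName;
        }
    }

    // Routes a request to the matching (hypothetical) handler and returns a description of what happened.
    static String dispatch(DataGridRequest request) {
        if (request instanceof Flow) {
            return "flow handler: " + ((Flow) request).steps.size() + " steps";
        } else if (request instanceof StatusQuery) {
            return "status query handler: " + ((StatusQuery) request).flowName;
        }
        throw new IllegalArgumentException("unknown request type");
    }

    public static void main(String[] args) {
        Map<String, String> notes = Map.of("requester", "Matrix-demo");
        System.out.println(dispatch(new Flow(notes, List.of("copy", "checksum")))); // flow handler: 2 steps
        System.out.println(dispatch(new StatusQuery(notes, "archive")));            // status query handler: archive
    }
}
```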
Grid User
<GridUser>
  <userID>Matrix-demo</userID>
  <organization>
    <organizationName>sdsc</organizationName>
  </organization>
  <challenge-Response>******</challenge-Response>
  <homeDirectory>/home/Matrixdemo.sdsc</homeDirectory>
  <defaultStorageResource>sdscunix</defaultStorageResource>
  <phoneNumber>0</phoneNumber>
  <e-mail>arun@sdsc.edu</e-mail>
</GridUser>
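Since the Grid User block is plain XML, any standard parser can read it. The Java sketch below uses the JDK's DOM parser to pull out a few fields; the element names follow the fragment above, while `GridUserReader` and its `field` helper are hypothetical and are not part of the Matrix client library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class GridUserReader {
    // Subset of the Grid User fragment (credentials and contact details omitted).
    static final String GRID_USER_XML =
          "<GridUser>"
        + "<userID>Matrix-demo</userID>"
        + "<organization><organizationName>sdsc</organizationName></organization>"
        + "<homeDirectory>/home/Matrixdemo.sdsc</homeDirectory>"
        + "<defaultStorageResource>sdscunix</defaultStorageResource>"
        + "</GridUser>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Returns the text of the first element with the given tag name, or null if there is none.
    static String field(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse(GRID_USER_XML);
        System.out.println(field(doc, "userID"));                 // Matrix-demo
        System.out.println(field(doc, "defaultStorageResource")); // sdscunix
    }
}
```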
Grid Ticket
VO Info
Flow
• Scoped variables that can control the flow
• Logic used by the sub-members
• Sub-members that are the real execution statements
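A flow's scoped variables are visible to its sub-members, and a sub-flow can declare its own. The Java sketch below assumes lexical scoping where a lookup walks outward through enclosing flows and the innermost declaration wins; `ScopedFlow` and the variable names are hypothetical, and DGL's exact scoping rules are not spelled out in these slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a flow with scoped variables and sub-flows.
class ScopedFlow {
    final String name;
    final ScopedFlow parent;                                // enclosing flow, or null at the top level
    final Map<String, Object> variables = new HashMap<>();  // variables declared in this flow's scope
    final List<ScopedFlow> subFlows = new ArrayList<>();    // sub-members of this flow

    ScopedFlow(String name, ScopedFlow parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.subFlows.add(this);
    }

    // Looks up a variable in this scope, then in each enclosing scope (innermost declaration wins).
    Object lookup(String var) {
        for (ScopedFlow f = this; f != null; f = f.parent) {
            if (f.variables.containsKey(var)) return f.variables.get(var);
        }
        throw new IllegalArgumentException("undefined variable: " + var);
    }

    public static void main(String[] args) {
        ScopedFlow top = new ScopedFlow("archive", null);
        top.variables.put("targetResource", "sdscunix");
        top.variables.put("retries", 3);
        ScopedFlow copy = new ScopedFlow("copy", top);
        copy.variables.put("retries", 1); // shadows the outer value inside "copy" only
        System.out.println(copy.lookup("targetResource")); // sdscunix (inherited)
        System.out.println(copy.lookup("retries"));        // 1 (shadowed)
        System.out.println(top.lookup("retries"));         // 3
    }
}
```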
Talk Outline
• Introduction to Gridflows
• Introduction to Matrix
• Data Grid Language
• Architecture of SDSC Matrix
• Usage
  – What can you do for Matrix?
Matrix Gridflow Server Architecture
• SOAP Service for Matrix Clients (JAXM wrapper, WSDL description)
• Event Publish/Subscribe and Notification (JMS messaging interface)
• Matrix Data Grid Request Processor
  – Sangam P2P Gridflow Broker and Protocols
  – Transaction Handler
  – Flow Handler and Execution Manager
  – Status Query Handler (XQuery processor)
  – ECA Rules Handler
• Matrix Agent Abstraction
  – SDSC SRB agents
  – Other SDSC data services
  – Agents for Java, WSDL and other grid executables
• Workflow Query Processor
• Gridflow Metadata Manager
• Persistence (Store) Abstraction
  – JDBC
  – In-memory store
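The Persistence (Store) Abstraction lets the server switch between a JDBC-backed store and an in-memory store. The Java sketch below shows that pattern with a small interface and an in-memory implementation; `GridflowStore` and its two methods are hypothetical, and a JDBC implementation of the same interface is omitted.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class StoreDemo {
    // Storage interface the server codes against; a JDBC-backed class could implement it too.
    interface GridflowStore {
        void save(String flowId, String statusXml);
        Optional<String> load(String flowId);
    }

    // In-memory implementation: fast and thread-safe, but its contents are lost when the server stops.
    static final class InMemoryStore implements GridflowStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();

        public void save(String flowId, String statusXml) {
            data.put(flowId, statusXml);
        }

        public Optional<String> load(String flowId) {
            return Optional.ofNullable(data.get(flowId));
        }
    }

    public static void main(String[] args) {
        GridflowStore store = new InMemoryStore();
        store.save("flow-1", "<flowStatus state='running'/>");
        System.out.println(store.load("flow-1").orElse("none")); // <flowStatus state='running'/>
        System.out.println(store.load("flow-2").orElse("none")); // none
    }
}
```

Callers depend only on `GridflowStore`, so choosing between durability (JDBC) and speed (in memory) becomes a configuration decision.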
Talk Outline
• Introduction to Gridflows
• Introduction to Matrix
• Data Grid Language
• Architecture of SDSC Matrix
• Usage
  – What can you do for Matrix?
Using an XML Editor
• Only an XML (DGL) file is required
  – All you need is a DGL file that is sent to the server
• Use an XML editor to create the DGL file
  – XMLSpy® could be used
• Send it to the Matrix Server
  – Use the Java program DGLSender.java
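Before sending a hand-edited DGL file, it helps to confirm that it is at least well-formed XML. The Java sketch below does this with the JDK's DOM parser and reports the line of the first error; `DglCheck` is a hypothetical helper, it does not validate against any DGL schema, and it does not replace DGLSender.java.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class DglCheck {
    // Returns null if the XML is well-formed, otherwise a short message with the error line.
    static String checkWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Check a file named on the command line, or a fragment whose <organization> open tag is missing.
        String xml = args.length > 0
                ? Files.readString(Path.of(args[0]))
                : "<GridUser><organizationName>sdsc</organizationName></organization></GridUser>";
        String error = checkWellFormed(xml);
        System.out.println(error == null ? "well-formed" : "not well-formed: " + error);
    }
}
```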
Using the Java API
• Download our Matrix Java Client
• Programmatically create a request
• Use it in your Java program to interact with the grid and develop a local application
• http://www.npaci.edu/DICE/SRB/matrix/Software/index.html
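The Matrix Java Client's own classes are not shown in these slides, so this sketch does not use them. To illustrate what "programmatically create a request" involves, it builds the Grid User fragment with the JDK's DOM and `Transformer` APIs instead. `GridUserBuilder` and its `child` helper are hypothetical, and the element names are taken from the Grid User slide.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class GridUserBuilder {
    // Appends <name>text</name> to parent (text may be null for container elements).
    static Element child(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) e.setTextContent(text);
        parent.appendChild(e);
        return e;
    }

    // Builds a Grid User fragment and serializes it without an XML declaration.
    static String build(String userId, String org, String home, String resource) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element user = doc.createElement("GridUser");
            doc.appendChild(user);
            child(doc, user, "userID", userId);
            Element orgEl = child(doc, user, "organization", null);
            child(doc, orgEl, "organizationName", org);
            child(doc, user, "homeDirectory", home);
            child(doc, user, "defaultStorageResource", resource);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build GridUser XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Matrix-demo", "sdsc", "/home/Matrixdemo.sdsc", "sdscunix"));
    }
}
```

Building the tree with DOM, rather than concatenating strings, ensures that element values are escaped correctly.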
Using WSDL
• Use the WSDL to create a SOAP-based client in any programming language of your preference
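A client generated from the WSDL would normally build the SOAP messages for you. To show roughly what goes over the wire, this Java sketch wraps an XML payload in a SOAP 1.1 envelope by hand. It assumes the DGL document is placed directly in the SOAP Body. The actual operation names, endpoint and message layout are defined by the Matrix WSDL, which is not reproduced here. `SoapWrap` is a hypothetical name.

```java
import java.util.regex.Pattern;

class SoapWrap {
    // SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // A leading XML declaration must be removed before embedding the payload in another document.
    private static final Pattern XML_DECL = Pattern.compile("^\\s*<\\?xml[^>]*\\?>\\s*");

    // Wraps an already well-formed XML payload in a SOAP 1.1 Envelope/Body (assumed message layout).
    static String wrap(String payloadXml) {
        String body = XML_DECL.matcher(payloadXml).replaceFirst("");
        return "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\">"
             + "<soapenv:Body>" + body + "</soapenv:Body>"
             + "</soapenv:Envelope>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<?xml version=\"1.0\"?><GridUser><userID>Matrix-demo</userID></GridUser>"));
    }
}
```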
Using DG-Modeler
• GUI for dataflow programming
Gridflow Process I
• End user, using DGBuilder
• Gridflow description in Data Grid Language
Gridflow Process II
• Abstract gridflow in Data Grid Language
• Planner
• Concrete gridflow in Data Grid Language
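The planner turns an abstract gridflow, which names logical resources, into a concrete one bound to actual resources. This Java sketch assumes the simplest possible planning step: a lookup table from logical to concrete resource names. Real planners may choose resources by load, locality or policy. `GridflowPlanner`, `Step` and the logical name `archiveStorage` are hypothetical, and `sdscunix` is the resource name from the Grid User slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class GridflowPlanner {
    // One step of a gridflow: an operation on a named resource (logical or concrete).
    static final class Step {
        final String operation;
        final String resource;
        Step(String operation, String resource) {
            this.operation = operation;
            this.resource = resource;
        }
        @Override public String toString() { return operation + "@" + resource; }
    }

    private final Map<String, String> bindings; // logical resource name -> concrete resource name

    GridflowPlanner(Map<String, String> bindings) {
        this.bindings = bindings;
    }

    // Produces a concrete gridflow by binding every abstract resource reference.
    // Fails if any reference has no binding, so no half-planned flow is returned.
    List<Step> plan(List<Step> abstractFlow) {
        List<Step> concrete = new ArrayList<>();
        for (Step s : abstractFlow) {
            String target = bindings.get(s.resource);
            if (target == null) throw new IllegalStateException("no binding for " + s.resource);
            concrete.add(new Step(s.operation, target));
        }
        return concrete;
    }

    public static void main(String[] args) {
        GridflowPlanner planner = new GridflowPlanner(Map.of("archiveStorage", "sdscunix"));
        List<Step> abstractFlow = List.of(new Step("ingest", "archiveStorage"), new Step("checksum", "archiveStorage"));
        System.out.println(planner.plan(abstractFlow)); // [ingest@sdscunix, checksum@sdscunix]
    }
}
```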
Gridflow Process III
• Concrete gridflow in Data Grid Language
• Gridflow processor
• Gridflow P2P network
Got ideas or suggestions?
Contact: SDSC Matrix project, arun@sdsc.edu
Google keyword: SDSC Gridflow