Part 2: Architecture overview
Professor Carole Goble, University of Manchester
http://www.mygrid.org.uk
GGF Summer School, 24th July 2004, Italy

In a nutshell
• Bioinformatics toolkit
• Open (Web) Services
  – myGrid components and external domain services
  – Publication, discovery, interoperation, composition, decommissioning of myGrid services
  – No control or influence over domain service providers
• Metadata driven
  – LSIDs, common information model, ontologies, Semantic Web technologies
• Open extensible architecture
  – Assemble your own components
  – Designed to work together
  – Loosely coupled
[Figure: component map. Labels: Semantic Discovery, Feta, View, UDDI registry, Pedro, Taverna, WfDE, Freefluo, Event Notification, WfEE, InfoModel, Soaplab, Gowlab, Haystack, Provenance Browser, Gateway & CHEF Portal, LSID, mIR]

Key Characteristics
• Data intensive, upstream analysis
• Pipelines: experiments as workflows (chiefly)
• Ad hoc, exploratory, investigative workflows for individuals from no particular a priori community
• Openness: the services are not ours
• Low activation energy, incremental take-on
• Foundations for sharing knowledge and sharing experimental objects
• Multiple stakeholders
• Collection of components for assembly

Openness
• Openness
  – open source
  – open world of services
  – open extensible technology
  – open to wider e-Science context
  – open to user feedback
  – open to third party metadata

Platform
• Standards based
• (Web) Service Oriented Architecture
  – Publication, discovery, interoperation, composition, decommissioning of myGrid services
  – Web services communication fabric
  – XML document types
  – LSIDs for identifying resources
• Implemented in Java using Axis and Tomcat
  – WS-I -> OGSA / WSRF
• Metadata driven
  – RDF-coded metadata
  – OWL-coded ontologies
  – Common information model

Stakeholders
[Figure: stakeholder map. Labels: middleware for myGrid users, tool developers, IS specialists, biologists, bioinformaticians, systems administrators, tool builders, service providers, infrequent problem-specific bioinformaticians, bioinformatics tool builders, annotators]
Biologists are indirectly supported by the portals and apps that these groups develop.

Collections of Tasks
[Figure: task collections. Labels: Domain Tasks; Building; Workflow; Enactment; Data Management; Storage; Description; Provenance; Finding; Service Discovery; Querying. Actors: Service Providers, Bioinformaticians, Scientists, Annotation providers]

Experimental entities
Investigation = set of experiments + metadata
• Experimental design components
• Experimental instances that are records of enacted experiments
• Experimental glue that groups and links design and instance components
• Life Science IDs, URIs, RDF

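The investigation model above can be sketched in a few lines of Python. This is an illustrative sketch only, not the myGrid information model: the class names `LSID` and `Investigation`, the `link` method, the predicate string and the `example.org` authority are all assumptions made for the example. The only thing taken as given is the general LSID URN layout, `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LSID:
    """A parsed Life Science Identifier (hypothetical helper class)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "LSID":
        parts = text.split(":")
        # Expect urn:lsid:authority:namespace:object, plus an optional revision.
        if len(parts) not in (5, 6):
            raise ValueError(f"not an LSID: {text!r}")
        if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
            raise ValueError(f"not an LSID: {text!r}")
        if any(p == "" for p in parts[2:5]):
            raise ValueError(f"empty LSID component in {text!r}")
        revision = parts[5] if len(parts) == 6 else None
        return cls(parts[2], parts[3], parts[4], revision)

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


@dataclass
class Investigation:
    """Design components, enacted instances, and the glue linking them."""
    design: List[LSID] = field(default_factory=list)
    instances: List[LSID] = field(default_factory=list)
    # Glue is kept as RDF-style (subject, predicate, object) triples.
    glue: List[Tuple[LSID, str, LSID]] = field(default_factory=list)

    def link(self, subject: LSID, predicate: str, obj: LSID) -> None:
        self.glue.append((subject, predicate, obj))


# Hypothetical identifiers: one workflow design and one record of its enactment.
workflow = LSID.parse("urn:lsid:example.org:workflow:42:1")
run = LSID.parse("urn:lsid:example.org:enactment:7")

investigation = Investigation(design=[workflow], instances=[run])
investigation.link(run, "enactmentOf", workflow)
```

Keeping the glue as triples mirrors the slide's mention of RDF: links between design and instance components can be added without changing either record.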
myGrid architecture
[Figure: layered architecture. Layers: Applications; Service Stack; Core services; Web Service (Grid Service) communication fabric; External services. Actors: Tool Providers, Bioinformaticians, Service Providers. Labels: Web Portal; LSID Launch pad; Taverna Workbench; Haystack; e-Science Mediator; UDDI Registries; Feta Service & WF Discovery; Ontologies; Ontology Mgt; Views; Metadata Store; FreeFluo Workflow Enactment Engine; Provenance Mgt; LSID Authority; Event Notification Service; Information Repository; OGSA-DQP Distributed Query Processor; SoapLab; GowLab; Legacy apps; Native Web Services; AMBIT Text Extraction Service]

Service stack
[Figure: layered service stack]
• Apps: LSID Launch Pad, Taverna workbench, Haystack, Web Portal
• Service stack: e-Science process patterns, e-Science Mediator
• Core services: service & workflow discovery, metadata management, data management, workflow enactment, e-Science event bus
• Web Service (Grid Service) communication fabric
• External services: SoapLab, GowLab, Legacy apps, Websites, Native Web Services, AMBIT Text Extraction Service

20,000 feet
[Figure: high-level view. Labels: Provenance and Data browser (Haystack or Portal); Taverna Workbench; Semantic Discovery & Registration; View Service; LSID Authority; UDDI; mIR data; mIR metadata; Store Service; Event Notification Service; Freefluo Workflow Engine; Web services, local tools; User interaction etc.]

e-Science Mediator
1. Application-oriented: directly supports the e-Scientist by:
  • Providing pre-configured e-Science process templates (i.e. system-level workflows)
  • Helping to capture and maintain context information (via the information model) that is relevant to interpreting and sharing the results of e-Science experiments
  • Facilitating personalisation and collaboration
2. Middleware-oriented: contributes to the synergy between myGrid services by:
  • Acting as a sink for e-Science events initiated by myGrid components
  • Interpreting the intercepted events and triggering the interactions with other related components that the semantics of those events entail
  • Compensating for possible impedance mismatches with other services, in terms of both data types and interaction protocols

Supporting the e-scientist
• Recurring use-cases can be captured
• Then corresponding process templates can be authored
• e-Science mediator makes processes available to the user
Find Workflow use-case:
  – Find an interesting workflow for experiment
  – Examine and modify if necessary
Find Workflow process:
  – Create experiment context for this user
  – Launch semantic search facility
  – Launch workflow editor for selected WF
  – Store to personal repository for later re-use
  – Enable MIR browser for storage with context

• E-Science process templates maintained by the mediator can drive GUI generation and interaction with the user
[Figure: GUI connected to E-Science Mediator]

Mediating between services
Example: mediation during a workflow execution
Participants: WF Enactor, E-Science Mediator, MIR, Notification Service
1: Execution started
2: Establish experiment/user context
[*]3: Intermediate process completed
[*]4: Link process trace to context
[*]5: Store intermediate process trace
6: Workflow completed
7: Get WF results
8: Store WF results
9: Notify WF completion to subscribers

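The nine-step exchange above can be sketched as a small event sink. This is a minimal illustration under stated assumptions, not the myGrid mediator: the `Mediator` class, its method names, the in-memory `mir` list standing in for the MIR, and the `notify` and `get_results` callables are all hypothetical. The `[*]` marker is read here as "repeats once per intermediate process", which is an interpretation of the slide rather than something it states.

```python
from typing import Any, Callable, Dict, List, Tuple


class Mediator:
    """Hypothetical event sink that links enactor events to user context."""

    def __init__(self, mir: List[Tuple[str, str, Any]],
                 notify: Callable[[str], None]) -> None:
        self.mir = mir          # stand-in for the information repository
        self.notify = notify    # stand-in for the notification service
        self.contexts: Dict[str, Dict[str, str]] = {}

    def on_execution_started(self, run_id: str, user: str,
                             experiment: str) -> None:
        # Steps 1-2: the enactor reports a start; establish the context.
        self.contexts[run_id] = {"user": user, "experiment": experiment}

    def on_intermediate_completed(self, run_id: str, trace: str) -> None:
        # Steps 3-5: link the process trace to the context, then store it.
        linked = {"context": self.contexts[run_id], "trace": trace}
        self.mir.append(("trace", run_id, linked))

    def on_workflow_completed(self, run_id: str,
                              get_results: Callable[[str], Any]) -> None:
        # Steps 6-9: fetch results, store them, notify subscribers.
        results = get_results(run_id)
        self.mir.append(("results", run_id, results))
        self.notify(run_id)


# Usage with in-memory stand-ins for the MIR and the notification service.
mir_store: List[Tuple[str, str, Any]] = []
notified: List[str] = []
mediator = Mediator(mir_store, notified.append)

mediator.on_execution_started("run-1", user="alice", experiment="wbs")
mediator.on_intermediate_completed("run-1", "blast finished")
mediator.on_intermediate_completed("run-1", "clustering finished")
mediator.on_workflow_completed("run-1", lambda run_id: {"hits": 3})
```

Because the mediator is the only component that talks to both the enactor and the repository, neither needs to know the other's data types or protocol, which is the impedance-matching role the slides describe.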
Simplified Architecture
Context preserved via myGrid Information Model
[Figure: simplified architecture. Client side: GUI (e-science workbench), client-side e-science process logic, E-Science Mediator client-stubs. The Grid: server-side e-science process logic, E-Science Mediator Service, Service Registry, Notification Service, MIR, WF Enactor]

Event Notification Service
• Publish/subscribe model
  – Topic based (cf. JMS topics, CORBA channels)
  – Hierarchic topics
  – Persistent event storage
  – Subscription leases
  – Federation for scalability & reliability
  – Event filtering
http://cvs.mygrid.org.uk/notification-stable/downloads

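Several of the features listed above (hierarchic topics, event storage, leases and filtering) can be illustrated with a toy in-process broker. This is a sketch under assumptions, not the myGrid notification service API: the `TopicBroker` and `Subscription` names, the `/` topic separator and the explicit `now` parameter (used instead of a real clock so the behaviour is deterministic) are all invented for the example. Storage is an in-memory list rather than truly persistent, and federation is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]


@dataclass
class Subscription:
    topic: str
    callback: Callable[[str, Event], None]
    expires_at: float                      # end of the subscription lease
    event_filter: Optional[Callable[[Event], bool]] = None


class TopicBroker:
    """Toy topic-based publish/subscribe broker (hypothetical)."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []
        self.event_log: List[Tuple[float, str, Event]] = []  # stored events

    def subscribe(self, topic: str, callback: Callable[[str, Event], None],
                  now: float, lease_seconds: float,
                  event_filter: Optional[Callable[[Event], bool]] = None
                  ) -> Subscription:
        sub = Subscription(topic, callback, now + lease_seconds, event_filter)
        self.subscriptions.append(sub)
        return sub

    def renew(self, sub: Subscription, now: float,
              lease_seconds: float) -> None:
        sub.expires_at = now + lease_seconds

    @staticmethod
    def _covers(sub_topic: str, event_topic: str) -> bool:
        # A subscription to "workflow" also covers "workflow/completed".
        return (event_topic == sub_topic
                or event_topic.startswith(sub_topic + "/"))

    def publish(self, topic: str, event: Event, now: float) -> int:
        self.event_log.append((now, topic, event))
        # Drop subscriptions whose lease has run out.
        self.subscriptions = [s for s in self.subscriptions
                              if s.expires_at > now]
        delivered = 0
        for sub in self.subscriptions:
            if not self._covers(sub.topic, topic):
                continue
            if sub.event_filter is not None and not sub.event_filter(event):
                continue
            sub.callback(topic, event)
            delivered += 1
        return delivered


broker = TopicBroker()
received: List[str] = []
broker.subscribe("workflow", lambda t, e: received.append(t),
                 now=0, lease_seconds=10)
```

Leases mean a subscriber that disappears without unsubscribing stops costing the broker anything once its lease lapses; a live subscriber calls `renew` before expiry.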
Portal toolkit for bioinformaticians
• Target application
  – Williams-Beuren Syndrome
  – Fixed set of workflows
• Extra myGrid portlets
  – Configurable workflow enactment
  – Workflow scheduling
  – Completion notification
  – Results browsing
• Based on CHEF & Jetspeed-1
  – Portlets for team collaboration
[Figure labels: Portlet Container, Interface, Portlet]

Text Services
[Figure: text-mining workflow. Labels: User Client; XScufl workflow definition + parameters; Workflow Server; Workflow Enactment; Initial Workflow; Swissprot/Blast record; Extract PubMed Id; PubMed Ids; Get Related Abstracts; Get Medline Abstracts; Cluster Abstracts; Clustered PubMed Ids + titles; Term-annotated Medline abstracts; Medline Server (Sheffield)]
Medline: pre-processed offline to extract biomedical terms, then indexed

History
• Pre-Prototype
  – Experimental
  – Web-based
  – Requirements gathering
• Prototype 1
  – Architectural workout
  – All services represented
  – NetBeans workbench
  – API-based integration
  – Info Repository oriented
  – XML-based process provenance
  – Workflow enactment engine
• Prototype 2
  – Second generation services
  – Reworked information model
  – Open information management
  – Life Science Identifiers
  – RDF based provenance
  – Taverna workbench
  – Web-based portal
• Milestones
  – Demo at ISMB 2003
  – Full paper and demo at ISMB 2004
  – GSK deployment
  – Real biology

Two+ Paths
Slide headings: Innovative work; Core functionality; In between
[The two item columns were interleaved in extraction; which heading belongs to which column is not recoverable from the text]
First column:
• Services – Soaplab and Gowlab
• Workflow enactment engine – Freefluo
• Workflow workbench – Taverna
• Data integration – OGSA-DQP
Second column:
• Service and workflow registration
• Semantic discovery
• Provenance management
• Text mining
• Information model & management
• Mediator
Listed after the "In between" heading:
• Event notification

myGrid People
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker

Collaboration
http://www.accessgrid.org

Publications
• R. Stevens, H. J. Tipney, C. Wroe, T. Oinn, M. Senger, P. Lord, C. A. Goble, A. Brass and M. Tassabehji. Exploring Williams-Beuren Syndrome Using myGrid. To appear in Proceedings of the 12th International Conference on Intelligent Systems in Molecular Biology, 31st Jul-4th Aug 2004, Glasgow, UK.
• C. A. Goble, S. Pettifer, R. Stevens and C. Greenhalgh. Knowledge Integration: In silico Experiments in Bioinformatics. In The Grid: Blueprint for a New Computing Infrastructure, Second Edition, eds. Ian Foster and Carl Kesselman, Morgan Kaufmann, November 2003.
• R. Stevens, A. Robinson and C. A. Goble. myGrid: Personalised Bioinformatics on the Information Grid. In Proceedings of the 11th International Conference on Intelligent Systems in Molecular Biology, 29th June-3rd July 2003, Brisbane, Australia. Published in Bioinformatics Vol. 19 Suppl. 1, 2003, pp. 302-304.

http://www.mygrid.org.uk