Controlling data flow in Workflows
Taverna allows you to iterate automatically through large data sets. This section introduces some of the more advanced configuration options for the Taverna engine, such as setting iteration strategies and adding loops to your workflows.

8. Iteration
As you have already seen, Taverna can automatically iterate over sets of data. When two sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have:
• A cross product: every item from list 1 is combined with every item from list 2 (all against all)
• A dot product: item 1 from list 1 is combined only with item 1 from list 2, and so on (line against line)
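
The two strategies can be sketched in a few lines of Python. This only illustrates the pairing rules and is not how the Taverna engine is implemented; the list contents are made-up example values, and the cross product is shown as a flat list for brevity.

```python
from itertools import product

# Illustrative inputs only; these values are not taken from the workflow.
colours = ["red", "green", "blue"]
animals = ["cat", "donkey", "koala"]

# Cross product: every item of list 1 with every item of list 2 (all against all).
cross = [f"{c} {a}" for c, a in product(colours, animals)]

# Dot product: item 1 with item 1, item 2 with item 2, and so on (line against line).
dot = [f"{c} {a}" for c, a in zip(colours, animals)]

print(len(cross), "cross-product results:", cross)
print(len(dot), "dot-product results:", dot)
```

With three items in each list, the cross product gives nine combinations while the dot product gives three.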

8. Iteration
• Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment
• Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’)
• Select the ‘ColourAnimals’ service, then select ‘Details’ in the workflow explorer and ‘configure list handling’
• Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product

8. Iteration
• Run the workflow twice: once with ‘dot product’ and once with ‘cross product’
• Save the first results so you can compare them. What is the difference?
• What does it mean to specify dot or cross product?

9. Looping
• From myExperiment, find the workflow ‘InterproScan_Example’ by Katy Wolstencroft
• This workflow is asynchronous. This means that when you submit data to the ‘runInterproScan’ service, it returns a jobID and places your job in a queue (this is very useful if your job will take a long time!)
• The ‘Status’ nested workflow queries your job ID to find out whether the job is complete

9. Looping
• The default behaviour in a workflow is to call each service only once for each item of data. So what happens if your job has not finished when the ‘Status’ workflow asks?
• Run the workflow
• Almost every time, the workflow will fail because the results have not been returned before the workflow reaches the ‘get_results’ service

9. Looping
This is where looping is useful. Taverna can keep running the ‘Status’ service until it reports that the job is done.
• Select the ‘Status’ nested workflow and click on the ‘details’ tab in the workflow explorer
• Select ‘advanced’ and click on ‘add looping’
• Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING
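
The loop condition amounts to "call the status service again until its output is not equal to RUNNING". A minimal Python sketch of that idea follows. The function name, the polling interval and the `max_polls` safety limit are assumptions made for illustration, and the fake status source stands in for the real ‘Status’ nested workflow.

```python
import time

def poll_until_finished(get_status, interval_seconds=0.0, max_polls=100):
    """Call get_status() repeatedly until it returns something other than RUNNING.

    max_polls is a safety limit added for this sketch so the loop cannot run forever.
    """
    for _ in range(max_polls):
        status = get_status()
        if status != "RUNNING":  # mirrors the 'is_not_equal_to' RUNNING condition
            return status
        time.sleep(interval_seconds)
    raise TimeoutError("job still RUNNING after max_polls checks")

# Stand-in for the 'Status' nested workflow: reports RUNNING twice, then DONE.
fake_replies = iter(["RUNNING", "RUNNING", "DONE"])
final_status = poll_until_finished(lambda: next(fake_replies))
print(final_status)
```

Note that the loop stops on any value other than RUNNING, so it ends on an error status as well as on DONE.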

9. Looping
• Save the workflow and run it again
• This time, the workflow will run until the ‘Status’ nested workflow reports either that the job is DONE or that it has an ERROR
• You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’. This is because there is one more configuration to change: we also need ‘Control Links’

10. Control Links
• A control link specifies that one service depends on another even though no data flows between them
• A control link is drawn as a line with a white circle at the end, connecting two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’)

10. Control Links
We will add control links to the other two output types.
• Right-click on ‘getResult_graphical_input’ and select ‘Run after’ from the drop-down menu. Set it to ‘Run after’ -> ‘Status’
• Save and run the workflow
• Now you will see each result returned
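
A ‘Run after’ link is an ordering constraint without any data attached. The following Python sketch shows how such constraints determine a valid execution order. It uses the standard-library `graphlib` module (Python 3.9 or later); the service names are simplified, hypothetical stand-ins for the ones in the workflow, and this is not how Taverna schedules services internally.

```python
from graphlib import TopologicalSorter

# Each key must run after every service in its set.
# Names are simplified, hypothetical stand-ins for the workflow's services.
run_after = {
    "Status": {"runInterproScan"},
    "get_text_results": {"Status"},
    "get_graphical_results": {"Status"},
}

# static_order() yields every service only after all services it depends on.
order = list(TopologicalSorter(run_after).static_order())
print(order)
```

Whatever order is produced, both result services come after ‘Status’, which is the guarantee the control links provide.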

11. Retries: Making your Workflow Robust
• Web services can sometimes fail due to network connectivity problems
• If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow
• Upload the ‘Retry-Example’ workflow from the myExperiment Next Generation Sequencing Tutorial group. This workflow is designed to fail sometimes
• Run the workflow as it is and count the number of failed iterations

11. Retries: Making your Workflow Robust
• Now select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel
• Click on ‘advanced’ and then ‘configure’ for retries
• In the pop-up box, change the setting so that each service iteration is retried 2 times
• Run the workflow again. How many failures do you get this time?
• Change the workflow to retry 5 times. Does it work every time now?
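
The effect of the retry setting can be sketched in Python as follows. Both function names are hypothetical, and the stand-in service fails a fixed number of times rather than randomly so that the outcome is predictable; the real ‘sometimes_fails’ service may behave differently.

```python
def call_with_retries(service, max_retries):
    """Call service(); if it fails, try again up to max_retries more times.

    Returns the result together with the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return service(), attempts
        except RuntimeError:
            if attempts > max_retries:
                raise  # retries exhausted: report the failure

def make_flaky_service(failures_before_success):
    """Stand-in for 'sometimes_fails': fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def service():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("temporary failure")
        return "ok"

    return service

# Two temporary failures are absorbed by two retries (three attempts in total).
result, attempts = call_with_retries(make_flaky_service(2), max_retries=2)
print(result, attempts)
```

A service that fails more often than the configured number of retries still fails, which is why raising the retry count from 2 to 5 can change the outcome.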

Exploring Shims
• A shim is a service that doesn’t perform an experimental function, but acts as a connector, or glue, when two experimental services have incompatible outputs and inputs
• A shim can be any type of service: WSDL, Soaplab, etc. Many are simple Beanshell scripts
• We have already used many shims in these exercises
• For the origin of the word, see http://en.wikipedia.org/wiki/Shim
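
As an illustration of the idea, suppose one service outputs a list of identifiers while the next expects a single comma-separated string. The Python sketch below shows the kind of glue a shim provides in that case. The scenario and the function name are hypothetical, and in Taverna such a step would typically be written as a Beanshell script rather than in Python.

```python
def list_to_csv_shim(identifiers):
    """Shim: turn a list of identifiers into one comma-separated string.

    Surrounding whitespace is trimmed and empty entries are dropped.
    """
    cleaned = (item.strip() for item in identifiers)
    return ",".join(item for item in cleaned if item)

# Illustrative identifiers only.
query = list_to_csv_shim(["Q99102", " P12345 ", ""])
print(query)
```

The shim does no analysis of its own; it only reshapes the data so that the two services can be connected.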

Writing your own shim services
Many shims are Beanshell scripts, which allow you to add simple data transformation steps to your workflow in an easy way. The next exercise gives a brief introduction to writing Beanshells and shows the most common example of when they are used.

Writing your Own Beanshell
• Create a new workflow by selecting ‘File’ and ‘New Workflow’
• Search for ‘beanshell’ in the service panel and drag in a Beanshell service. A configuration window will pop up
• In the Beanshell configuration window, select the ‘Ports’ tab and create 2 input ports named myName and mySurname
• Create 1 output port named myFullname

Writing your Own Beanshell
• Select the ‘Script’ tab and paste the following script:
  myFullname = myName + "\t" + mySurname;
• Save and apply the Beanshell script
• In the workflow diagram, create 2 workflow inputs and 1 workflow output and connect them to the configured Beanshell service
• Run the workflow
• You should get your full name printed in the output
This is a very simple example of using helper services (shims) to format results from your workflow. You would use the same type of script to combine BioMart outputs, for example.

REST Services
Many data sources and tools on the Web are exposed in a RESTful manner. Taverna provides a custom processor for accessing such services.

REST stands for REpresentational State Transfer
Web services with this type of interface typically expose some of the following four types of operations:
• GET: to get a resource
• POST: to make a new resource or to perform a request (such as a search)
• PUT: to update a resource
• DELETE: to delete a resource

Adding a REST web service
• Expand the Service templates folder under Available services in the Service Panel
• Select the REST service template and drag and drop it into the Workflow Diagram

Configuring a REST Service
In the dialog box that pops up, configure the REST service by adding:
• The URL of the service, e.g. http://www.uniprot.org/uniprot/{id}
• The type of operation you want to perform on the service, e.g. GET
• The expected MIME type of the data returned by the service. Select text/plain in this case

Configuring a REST Service
• The URL of the service you enter is actually a template that can take configurable parameters
• In the current example, the name of the parameter is ‘id’, which is enclosed within a pair of braces
• Parameters are used when you do not know their values in advance, i.e. at the time of adding the service to the workflow diagram, and the values depend on some of the previous services in the workflow
• The name in braces is replaced with the actual value when the workflow is executed
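
The substitution can be sketched in Python as follows. The function names are hypothetical, and URL-quoting the substituted value is an assumption of this sketch rather than a statement about what Taverna does.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

def parameter_names(template):
    """Return the names that appear between braces in the template."""
    return PLACEHOLDER.findall(template)

def expand_template(template, values):
    """Replace each {name} placeholder with the URL-quoted value for that name."""
    def substitute(match):
        # Quoting the value is an assumption made for this sketch.
        return quote(str(values[match.group(1)]), safe="")
    return PLACEHOLDER.sub(substitute, template)

template = "http://www.uniprot.org/uniprot/{id}"
print(parameter_names(template))
url = expand_template(template, {"id": "Q99102.fasta"})
print(url)
```

Each name found between braces corresponds to an input port of the REST service, which is why the example service has an input port called ‘id’.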

Building a simple workflow using a REST service
To complete a simple workflow using the UniProt REST web service, add:
• a workflow input for the REST service input port named ‘id’
• workflow outputs for the responseBody and status output ports of the REST service activity
The workflow should now look as follows:

Building a simple workflow using a REST service
• Now try running the workflow using an example UniProt identifier such as Q99102.fasta
• Note that the .fasta suffix causes the protein sequence to be returned as text
• What happens when you don’t use the .fasta suffix?
• Try using other suffixes, e.g. xml, txt, rdf and gff
• Further information about the UniProt REST service is available from http://www.uniprot.org/faq/28

Advanced configuration of REST services
HTTP Expect request-header
This option allows Taverna to set a special "Expect" header when sending a request to the REST service. Client requests using the POST method will then expect to receive a 100 Continue or a redirect response from the service before sending the POST data. This mechanism allows clients to avoid sending large amounts of data over the network twice when the service could reject or redirect the request. Selecting this option may significantly improve performance when large volumes of data are to be sent to the service and authentication, or a redirect from the original URL to one specified by the service, is likely.

Advanced configuration of REST services
Redirection output port
The Show "Redirection" output port option makes the service's redirection output port visible; this port is hidden by default. The port will contain the URL of the final redirect that yielded the output data on the responseBody port.