Designing, Executing and Sharing Workflows with Taverna 2.2

Katy Wolstencroft, myGrid, University of Manchester

Exercise 1: Exploring the Workbench
• Taverna can be downloaded from http://www.taverna.org.uk/
• Go to the page and click on download Taverna 2.2.0
• Download the correct version for your operating system
• Follow the instructions in the Taverna installer
The following page shows a screenshot of Taverna and the different panels that make up the workbench.

Taverna Workbench (screenshot): Services Panel, Workflow Explorer, Workflow Diagram

1. Workflow Explorer
• The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow
• Details of workflow validation can also be found here. Before a workflow is run, Taverna checks whether it is connected correctly and whether its services are available
• The Workflow Explorer is also where you find configuration details of services and advanced options like iteration and looping. We will come back to these later

1. Workflow Diagram
• The visual representation of the workflow
• Shows inputs/outputs, services and control flows
• Allows editing of the workflow by dragging and dropping and connecting services together
• Enables saving of workflow diagrams for publishing and sharing

1. Available Services Panel
Lists services available by default in Taverna:
• Local Java services
• Simple web services
• Soaplab services (legacy command-line applications)
• R Processor
• BioMart database services
• BioMoby services
• Beanshell processor
Allows the user to add new services or workflows from the web or from file systems. There are loads more available!

Exercise 2: Adding New Services
• New services can be gathered from anywhere on the web
• We will find a new service and add it to the workbench
• You can find more services in the BioCatalogue
• The BioCatalogue is a public curated catalogue of Life Science web services from Manchester and the EBI

• Go to http://www.biocatalogue.org and explore. Through the BioCatalogue you can find, register, or annotate web services

• Type ‘blast’ into the Search box in the BioCatalogue
• Select the Blast service from the DDBJ (hint: it is from Japan)
There it is!

• Clicking on the Blast service brings you to the page describing the service and its operations
• Copy the service WSDL location. This is what Taverna needs…

• Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service
• Select ‘WSDL service…’
• A window will pop up asking for a web address

• Enter the Blast web service address you just copied
• Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included
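A WSDL address points at an XML document that lists the operations a web service offers, which is why it is enough for Taverna to add the service. The following Python sketch shows one way to read operation names out of such a document. The WSDL fragment is a made-up, heavily trimmed stand-in (not the real DDBJ file), and the operation names in it are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace used by WSDL 1.1 documents.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, trimmed-down WSDL; a real document is much longer.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Blast">
  <portType name="BlastPortType">
    <operation name="searchSimple"/>
    <operation name="searchParam"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return the operation names declared under the portType elements."""
    root = ET.fromstring(wsdl_text)
    path = f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation"
    return [op.get("name") for op in root.findall(path)]


operations = list_operations(SAMPLE_WSDL)
```

Each operation found this way corresponds to one entry that appears under the new service in the Services list.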

Exercise 3: Building a Simple Workflow
• Go to the Services Panel
• Type ‘Fasta’ into the ‘search’ box at the top of the panel. You will see several services in the search results
• Select ‘Get Protein FASTA’. This service returns a protein sequence in FASTA format from a database if you supply it with a sequence ID
• Drag this service across to the workflow explorer panel

• In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”
• Type in a name for this input (e.g. ID) and click “ok”
• Do the same to create a new workflow output. Call this output “sequence”

You now have 3 boxes in the diagram and we need to connect them up.
• Click on the input box, drag towards “Get Protein FASTA” and let go. An arrow will connect the two boxes

• Click on the output box, drag towards “Get Protein FASTA”, and let go. An arrow will connect the two boxes
You have now built your first workflow! It should look something like this

• Run the workflow by selecting “File -> Run workflow”, or by clicking on the play button at the top of the workbench

An input window will appear. As you can see, we have not yet added a description of the workflow or of the input.
• Click on ‘New Value’ in the input window and add a GenBank gene identifier (e.g. 215422388) where it says “some input data goes here”

• Click “run workflow”
• In the bottom left of the results window, click on the results. You will now see a protein sequence from GenBank
• In the services panel, search for “blast”
• Find the result “SearchSimple – Execute Blast” and drag that across to the workflow panel (this is the service we added at the beginning)

Now we have 2 services to connect into a workflow.
• Connect “Get_protein_fasta” to “SearchSimple” by right-clicking “Get_protein_fasta” and selecting “link from output_text”
• You will get an arrow. Drag the arrow to “searchSimple”. A box will appear asking which port you want to connect to; select “query”
Now the services are connected.

If you show the service ports, you can connect directly between an output port on one service and an input port on another.
• Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to abc)

We need to finish building the workflow by adding inputs and outputs.
• Right-click on “SearchSimple -> Result” and select “connect as input to.. New Workflow Output Port”

• Taverna will suggest a name for the output. If this is OK, select “ok”
• Add two new workflow inputs (called ‘database’ and ‘program’) and connect these to ‘database’ and ‘program’ in SearchSimple
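The wiring built in this exercise can be read as a small data-flow program: workflow inputs feed service input ports, and service outputs feed either other services or workflow outputs. The Python sketch below mirrors that wiring under loud assumptions: both functions are local stubs that return canned text (no web service is called), and the sequence and result strings are invented.

```python
def get_protein_fasta(seq_id):
    """Stub for the 'Get Protein FASTA' service; returns canned data."""
    return f">gi|{seq_id}\nMKTAYIAKQR"  # invented sequence, illustration only


def search_simple(query, database, program):
    """Stub for the BLAST operation; just echoes what it was given."""
    header = query.splitlines()[0]
    return f"{program} search of {database} for {header}"


def run_workflow(ID, database, program):
    """Follow the links in the diagram, one line per link."""
    sequence = get_protein_fasta(ID)                     # ID -> Get Protein FASTA
    result = search_simple(sequence, database, program)  # output_text -> query
    return {"Result": result}                            # Result -> workflow output


outputs = run_workflow("215422388", "SWISS", "blastp")
```

The three arguments of `run_workflow` play the role of the three workflow input ports, and the returned dictionary plays the role of the workflow output port.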

3: The Finished Workflow!
Your workflow should look something like this

3. Validate your Workflow
Taverna can check that everything is connected properly and that all the services in your workflow are available.
• Go to the workflow explorer and click on ‘validation report’
• See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run
• If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab
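To make the idea of a connectivity check concrete, here is a minimal Python sketch that reports service ports without links. It is not Taverna's validation logic: the rule that an unconnected input is an error and an unused output is a warning is an assumption made for this illustration, as is the input port name `id`.

```python
def validate(services, links):
    """Return (level, message) pairs for ports that have no link.

    services: {service_name: {"inputs": [...], "outputs": [...]}}
    links: (source, sink) pairs; service ports are written "service.port".
    """
    sinks = {sink for _, sink in links}
    sources = {source for source, _ in links}
    report = []
    for name, ports in services.items():
        for port in ports["inputs"]:
            if f"{name}.{port}" not in sinks:
                report.append(("error", f"{name}.{port} has no incoming link"))
        for port in ports["outputs"]:
            if f"{name}.{port}" not in sources:
                report.append(("warning", f"{name}.{port} is not used"))
    return report


services = {
    "Get_protein_fasta": {"inputs": ["id"], "outputs": ["output_text"]},
    "searchSimple": {"inputs": ["query", "database", "program"], "outputs": ["Result"]},
}
# The 'program' input has deliberately been left unconnected.
links = [
    ("ID", "Get_protein_fasta.id"),
    ("Get_protein_fasta.output_text", "searchSimple.query"),
    ("database", "searchSimple.database"),
    ("searchSimple.Result", "Result"),
]
report = validate(services, links)
```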

3: Adding a Workflow Description
• Right-click on a blank part of the workflow diagram and select “show details”
• In the workflow explorer panel, the details page will open up. Add some details about the workflow, e.g. who the author is and what it does
• You can also add examples and descriptions for the workflow inputs by selecting them and selecting “details”. An example for database is ‘SWISS’, for program ‘blastp’, and for ID ‘215422388’
• Save the workflow by going to “File -> Save workflow”

4. Running the Workflow
• Go to “File -> Run workflow”. A workflow input window will appear like before. This time, each input has its own tab with descriptions and examples as well as a panel to enter data
• In the fasta_id input, select “New value” and add a GenBank GI number (e.g. 215422388)
• In the database, add “SWISS”
• In the program, add “blastp”
• Select “run workflow” at the bottom of the panel to set the workflow going

4. Setting String Constants
For parameters that do not change often, you will not want to type them in as input every time. In this example, the database and BLAST program may only change occasionally, so there is an alternative way of defining them.
• Go back to the workflow diagram and remove the ‘database’ and ‘program’ inputs by right-clicking and selecting ‘Delete workflow input port’

• In a blank space in the workflow diagram, right-click and select ‘string constant’
• In the pop-up box add ‘SWISS’ as a value and change the name of the string constant to ‘database’
• Connect this to the database port on the BLAST service
• Create another string constant with the value ‘blastp’ and the name ‘program’
• Connect this to the program port on the BLAST service
• Save the workflow and run it again. This time you will only be asked for one input
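A string constant behaves much like fixing some arguments of a function in advance so that the caller only supplies the rest. The Python sketch below uses `functools.partial` as an analogy. The `search_simple` function is a local stub invented for this example, not the real service.

```python
from functools import partial


def search_simple(query, database, program):
    """Stub for the BLAST service; returns a string describing the call."""
    return f"{program}:{database}:{query}"


# Fix 'database' and 'program', as the two string constants do in the diagram.
blast = partial(search_simple, database="SWISS", program="blastp")

# Only the remaining input has to be supplied at run time.
result = blast(">gi|215422388")
```

Changing a constant later means editing one value in one place, rather than retyping it for every run.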

4. Checkpoint Exercise
Now modify your workflow so that BLAST searches across all protein databases and you only get back the top 5 hits in a tabular format.
HINT: you will need to swap SearchSimple for another service from the same set.

Exercise 5: Sharing Workflows
• Go to http://www.myexperiment.org
• myExperiment is a social networking site for sharing workflows and workflow expertise and experiences
• Browse around the site and see what it contains
• Create yourself an account and join the group called Bonn (this is a place where you can find many resources for this week’s exercises)

Explore myExperiment:
• Which is the most downloaded workflow?
• Which is the most viewed workflow? Is it the same?
• Explore the workflow packs. How many packs feature workflows for microarray analysis?
• Find all the items relating to Systems Biology. How did you find them? How many are there?
• Can all the workflows be downloaded?

6. Using Workflows from myExperiment
You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna.
• Go back to Taverna and click on the myExperiment icon at the top of the workbench
• In the search box, type ‘Kegg’. We are going to find all the workflows that explore KEGG pathways
• In the results, find the workflow called “NCBI GI to Kegg Pathways” (by Paul Fisher)

• We will add this workflow to our own BLAST workflow by clicking ‘import’ and selecting ‘Add as nested workflow’ in the pop-up window
NOTE: If you add a workflow as a nested workflow, it continues to be a separate module (a workflow within a workflow). We recommend this modular approach because it is easier to combine and reuse these functional modules. You need to connect up the workflow as if it were any other kind of service.

7. Reusing and connecting Workflows
The nested workflow has 1 input and 4 outputs.
• Connect the outer workflow input ‘ID’ to the nested workflow input

• Create 2 new outputs (by right-clicking on the blank canvas) and call them ‘pathways’ and ‘pathway_descriptions’
• Connect the nested workflow output ‘pathway_by_gene’ to the ‘pathways’ output and connect ‘pathway_descriptions’ to ‘pathway_descriptions’

• Save the workflow and run it
• As the workflow runs, track its progress by looking at the graphical view and the progress report in the results panel. As services finish, they turn grey
• You can pause and resume the workflow if you wish (this is more useful with longer-running workflows!)
• Look at the results. This time, you will have BLAST results and KEGG pathway results

7. Looking at Intermediate Results
You can also track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from.
• On the diagram, click the service called ‘btit’ and look at its inputs and outputs in the results. This gives you the gene names plus a short description
• You can save the workflow back onto myExperiment if you wish, but make sure you give credit to the nested workflow author!
We will come back to combining workflows later.

Controlling data flow in Workflows
Taverna allows you to automatically iterate through large data sets. This section introduces you to some of the more advanced configuration options, such as setting iteration strategies and adding loops to your workflows.

8. Iteration
As you have already seen, Taverna can automatically iterate over sets of data. When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have:
• A cross product: combining every item from list 1 with every item from list 2 (all against all)
• A dot product: only combining item 1 from list 1 with item 1 from list 2, and so on (line against line)
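The two strategies correspond to two familiar ways of pairing lists. The Python sketch below shows the difference using `itertools.product` for the cross product and `zip` for the dot product. The lists of colours and animals are invented sample data, and the flat result list is a simplification of how a workflow engine might structure its output.

```python
from itertools import product

colours = ["red", "green", "blue"]
animals = ["cat", "dog", "rabbit"]

# Cross product: every colour with every animal (all against all).
cross = [f"{colour} {animal}" for colour, animal in product(colours, animals)]

# Dot product: first with first, second with second (line against line).
dot = [f"{colour} {animal}" for colour, animal in zip(colours, animals)]
```

With two lists of 3 items, the cross product gives 9 combinations and the dot product gives 3. Note that `zip` silently stops at the shorter list, so this analogy only holds cleanly when both lists have the same length.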

• Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment
• Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’)
• Select the ‘ColourAnimals’ service, then select ‘Details’ in the workflow explorer and ‘configure list handling’
• Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product

• Run the workflow twice, once with ‘dot product’ and once with ‘cross product’. Save the first results so you can compare them. What is the difference? What does it mean to specify dot or cross product?

9. Looping
• From the Bonn group in myExperiment, load the workflow ‘InterproScan_Example’ by Katy Wolstencroft
This workflow is asynchronous. This means that when you submit data to the ‘runInterproScan’ service, it will return a jobID and place your job in a queue (this is very useful if your job will take a long time!). The ‘Status’ nested workflow will query your job ID to find out if it is complete.

The default behaviour in a workflow is to call each service only once for each item of data. So what if your job has not finished when the ‘Status’ workflow asks?
• Run the workflow
Almost every time, the workflow will fail because the results have not been returned before the workflow reaches the ‘get_results’ service.

This is where looping is useful. Taverna can keep running the ‘Status’ service until it reports that the job is done.
• Select the ‘Status’ nested workflow and click on the ‘details’ tab in the workflow explorer
• Select ‘advanced’ and click on ‘add looping’
• Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING
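The loop condition configured here amounts to polling: keep asking for the job status until the answer is no longer RUNNING. The Python sketch below illustrates that pattern. The status service is faked with a fixed list of replies, and the `delay` and `max_polls` parameters are additions for this example rather than options described in the slides.

```python
import time


def make_fake_status(replies):
    """Build a stand-in status service that returns the given replies in order."""
    remaining = iter(replies)

    def get_status(job_id):
        return next(remaining)

    return get_status


def wait_until_finished(get_status, job_id, delay=0.0, max_polls=100):
    """Call get_status until its output is not equal to RUNNING."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status != "RUNNING":
            return status  # for example DONE or ERROR
        time.sleep(delay)
    raise TimeoutError(f"job {job_id} still RUNNING after {max_polls} polls")


final_status = wait_until_finished(
    make_fake_status(["RUNNING", "RUNNING", "DONE"]), "job-1"
)
```

Because the loop stops on anything other than RUNNING, it also ends when the job reports an error, so downstream steps still need to handle that case.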

• Save the workflow and run it again
This time, the workflow will run until the ‘Status’ nested workflow reports that it is either DONE or has an ERROR. You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’. This is because there is one more configuration to change: we also need ‘Control Links’.

10. Control Links
A control link specifies that one service depends on another even though no data flows between them. A control link is drawn as a line with a white circle at the end that connects two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’).

We will add control links to the other two output types.
• Right-click on getResult_graphical_input and select ‘Run after’ from the drop-down menu. Set it to ‘Run after’ -> ‘Status’
• Save and run the workflow
Now you will see each result returned.
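One way to picture a control link is as an extra ordering constraint added on top of the data links. The Python sketch below merges both kinds of link into one dependency graph and asks the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) for a valid order. The service names are simplified stand-ins, and this is only an illustration of the dependency idea, not a description of how Taverna schedules services.

```python
from graphlib import TopologicalSorter

# Each entry maps a service to the services that must finish before it.
data_links = {
    "Status": {"runInterproScan"},
    "getResult_text": {"runInterproScan"},
    "getResult_graphical": {"runInterproScan"},
}
# 'Run after' constraints: no data flows along these links.
control_links = {
    "getResult_text": {"Status"},
    "getResult_graphical": {"Status"},
}


def execution_order(data_links, control_links):
    """Merge both kinds of link and return one valid execution order."""
    graph = {}
    for links in (data_links, control_links):
        for service, predecessors in links.items():
            graph.setdefault(service, set()).update(predecessors)
    return list(TopologicalSorter(graph).static_order())


order = execution_order(data_links, control_links)
```

Without the control links, the two result services would be free to run as soon as the job ID exists. With them, both are held back until ‘Status’ has finished.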

11. Retries: Making your Workflow Robust
Web services can sometimes fail due to network connectivity problems. If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow.
• Load the ‘Retry-Example’ workflow from the myExperiment Bonn group. This workflow is designed to fail sometimes
• Run the workflow as it is and count the number of failed iterations

• Now select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel
• Click on ‘advanced’ and ‘configure’ for retries
• In the pop-up box, change it so that it retries each service iteration 2 times
• Run the workflow again. How many failures do you get this time?
• Change the workflow to retry 5 times. Does it work every time now?
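The effect of the retry setting can be sketched as a wrapper that calls a service again when it fails, up to a fixed number of extra attempts. In the Python sketch below, the flaky service is a deterministic stand-in that fails a set number of times per item (the real ‘sometimes_fails’ service presumably fails at random), and treating `RuntimeError` as the temporary failure is an assumption for illustration.

```python
def make_flaky(failures_before_success):
    """Build a stand-in service that fails a fixed number of times per item."""
    calls = {}

    def flaky(item):
        calls[item] = calls.get(item, 0) + 1
        if calls[item] <= failures_before_success:
            raise RuntimeError(f"temporary failure on {item}")
        return item.upper()

    return flaky


def with_retries(service, max_retries):
    """Wrap a service so that each call is retried up to max_retries times."""

    def wrapped(item):
        last_error = None
        for _ in range(max_retries + 1):  # first attempt plus the retries
            try:
                return service(item)
            except RuntimeError as error:
                last_error = error
        raise last_error

    return wrapped


robust = with_retries(make_flaky(failures_before_success=2), max_retries=2)
results = [robust(item) for item in ["a", "b", "c"]]
```

Retries only help with temporary problems. If a service fails more times in a row than the retry count allows, the iteration still fails, which is why raising the count from 2 to 5 can change the outcome.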