The 10 Best Practices for Workflow Design

BioVeL M6 Workshop, Göteborg, May 10-11, 2012
Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid)
Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)

Our specialty: Knowledge Discovery (http://biosemantics.org)
Disambiguation*, Text Mining
Substrates for Knowledge Discovery
Methods for Knowledge Discovery
Applications
• Predict protein-protein and protein-disease associations; gene prioritization
• Genotype-phenotype studies, e.g. Huntington's Disease, Metabolic Syndrome
• Yours?
* Global disambiguation initiative: http://snipurl.com/conceptweballiance

Introduction: Why build good workflows?
Good workflow design = good science!

Introduction: Best practices for workflow design
Best practices for workflow design = best practices in experimental science + best practices in software engineering

1 Make a sketch workflow

Best practice 1: Sketch an abstract workflow
(PowerPoint courtesy of Eleni Mina)
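An abstract sketch can also be written down as a dependency graph before any concrete service is chosen, which makes missing inputs and accidental loops visible early. The Python sketch below assumes Python 3.9+ (for the standard-library graphlib module); the step names are hypothetical placeholders, not the steps of any published workflow.

```python
from graphlib import TopologicalSorter

# Hypothetical abstract workflow: each step lists the steps whose output it needs.
# The names describe *what* happens; concrete services are chosen later.
sketch = {
    "get_gene_list": set(),
    "retrieve_abstracts": {"get_gene_list"},
    "extract_concepts": {"retrieve_abstracts"},
    "match_gene_lists": {"extract_concepts", "get_gene_list"},
    "report": {"match_gene_lists"},
}

def execution_order(steps):
    """Return one valid run order; raises graphlib.CycleError if the sketch loops."""
    return list(TopologicalSorter(steps).static_order())

print(execution_order(sketch))
```

Keeping the sketch free of service names is the point: each abstract step can later be filled by whichever service or sub-workflow fits best.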

2 Use modules

http://www.myexperiment.org/workflows/74.html

3 Think about the output (and the data in your workflow in general)

Best practice 3: Think about the output
? http://...

4 Provide example inputs and outputs

Recipe: add example values to a workflow input or output
Taverna 2.3: select the input/output, select the 'Details' tab, select 'Annotation', add an example
Taverna 2.4: right-click the input/output, click 'Annotation', add an example
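Published example inputs and outputs also work as a simple regression test: rerun the workflow on the example input and compare the result with the stored example output. A minimal Python sketch, assuming the workflow run can be wrapped as a function; check_against_example and toy_workflow are hypothetical names, not part of Taverna.

```python
def check_against_example(run_workflow, example_input, expected_output):
    """Run the workflow on its example input and compare with the example output."""
    actual = run_workflow(example_input)
    if actual == expected_output:
        return True, "output matches the example"
    return False, f"expected {expected_output!r}, got {actual!r}"

# Hypothetical stand-in for a real workflow run: normalise and sort gene symbols.
def toy_workflow(gene_symbols):
    return sorted(symbol.strip().upper() for symbol in gene_symbols)

ok, message = check_against_example(toy_workflow, [" tp53", "brca1 "], ["BRCA1", "TP53"])
print(ok, message)  # prints: True output matches the example
```

Exact equality suits deterministic outputs; results from services that change over time (e.g. literature counts) may need a looser comparison.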

5 Annotate

Best practice 5: Annotate
Each component in Taverna can be annotated

Best practice 5: Annotate and help your users

6 Make workflow executable from outside the local environment

Best practice 6: Make the workflow executable by others
How can you check that others can execute your workflow?
» Try it! Proof of executability
› Ask a colleague
› Use an external t2 web runner
» Tips
› Use Web Services
› If you use local command-line tools:
• Install the tools on a publicly accessible server (this applies to e.g. Rserve)
• Use a system that your users can set up themselves (e.g. BioLinux)
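Before asking a colleague to try the workflow, a quick first check is whether every Web Service endpoint it calls answers from outside your machine. A minimal Python sketch using only the standard library; the function names are hypothetical, and a reachable endpoint only shows the service is up, not that it returns correct results.

```python
import http.client
import urllib.error
import urllib.request

def service_is_reachable(url, timeout=10.0):
    """Return True if a GET request to url answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, http.client.HTTPException, ValueError):
        # URLError covers DNS failures and HTTP 4xx/5xx (HTTPError is a subclass),
        # HTTPException covers non-HTTP replies, ValueError covers malformed URLs.
        return False

def unreachable_services(urls, probe=service_is_reachable):
    """List the endpoints that fail the probe (the probe can be swapped for testing)."""
    return [url for url in urls if not probe(url)]
```

For example, unreachable_services(list_of_endpoint_urls) returns the endpoints to fix before sharing. Running the check from a different network than your own is what makes it meaningful, since localhost addresses always pass on the author's machine.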

7 Choose services carefully

Best practice 7: Choose services carefully


8 Reuse existing workflows

Best practice 8: The reuse workflow
» Check workflows on myExperiment
› Negative: contact the authors, then retry
› Positive: check the services on BioCatalogue
» Check services on BioCatalogue
› Negative: contact the authors, then retry
› Otherwise: use scripts from colleagues, search the internet, or invent a new wheel
› Positive: reuse, attribute, respect licences
» Not a best practice, but a tip: know-how is important for reuse
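The reuse steps can be read as an ordered search that ends in a fallback ("invent a new wheel") and always records attribution. The Python sketch below is one possible reading of that flow, not a myExperiment or BioCatalogue API: Candidate, find_reusable, attribution and the source lists are hypothetical, and the works callable stands in for trying a candidate (including contacting its authors and retrying).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass
class Candidate:
    name: str
    author: str
    licence: str

# A source is a name plus a search function returning candidates (hypothetical).
Source = Tuple[str, Callable[[], Iterable[Candidate]]]

def find_reusable(sources: List[Source],
                  works: Callable[[Candidate], bool]) -> Optional[Tuple[str, Candidate]]:
    """Try sources in order; return the first candidate that works, with its source."""
    for source_name, search in sources:
        for candidate in search():
            if works(candidate):
                return source_name, candidate
    return None  # nothing reusable: time to invent a new wheel

def attribution(source_name: str, candidate: Candidate) -> str:
    """Build an attribution line that also records the licence to respect."""
    return (f"Reused '{candidate.name}' by {candidate.author} "
            f"from {source_name} (licence: {candidate.licence})")

# Hypothetical example: the myExperiment hit fails, a colleague's script works.
sources = [
    ("myExperiment", lambda: [Candidate("old pathway workflow", "A. Author", "CC-BY")]),
    ("BioCatalogue", lambda: []),
    ("colleagues' scripts", lambda: [Candidate("gene list matcher", "B. Colleague", "BSD")]),
]
found = find_reusable(sources, works=lambda c: c.name != "old pathway workflow")
if found:
    print(attribution(*found))
```

Keeping the source order explicit mirrors the advice to look for existing work first and to write new code only as a last resort.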

9 Advertise

Best practice 9: Advertise
A unique reference to use in your papers and for others to cite

10 Maintain

Best practice 10: Maintain
Best practices to support maintenance
» Regularly check your workflow
› Ask colleagues
» Enable support for maintenance
› Register your workflow on myExperiment
› Register Web Services on
» Enable peers to repair: annotate!
» Note about versioning
› No need to register all edits on myExperiment: use Subversion
› Register important updates on myExperiment

Bonus tip: Use common sense as a scientist

Workflow Forever: preservation of good workflows for future applications
• Workflow 74 "Protein Discovery" (2005)
• Workflow 2876 "Match gene lists by literature" (2012)
• Workflow 2805 "Get Pathway genes" (2012)

Wf4Ever outcomes for BioVeL
• myExperiment 2.0
• BioCatalogue
• Taverna
• Research Objects
• Linked Data
• Methods and protocols for preservation and conservation

The 10 Best Practices of Workflow Design
Thank you for your attention
More information: http://snipurl.com/workflowbestpractices
1. Make a sketch workflow
2. Use modules
3. Think about the output
4. Provide example inputs and outputs
5. Annotate
6. Make it executable from outside the local environment
7. Choose services carefully
8. Reuse existing workflows
9. Advertise
10. Maintain