Programming Languages for Biology Bor-Yuh Evan Chang November 25, 2003 OSQ Group Meeting
Biological Perspective
[Figures from http://www.nocturnalvisions.freeservers.com/page6.html and Matsudaira et al., Molecular Cell Biology 4.0, Freeman, 2000]
Traditional Biological Research
• Experiments must focus on a small, specific piece of a system
  – isolate the variable
  – feasibility
• Have led to an enormous wealth of (detailed) knowledge, but in a fragmented form
[Figure labels: Virus Expert; Cell Receptor Expert]
Systems Biology
• Emerging area of biology
  – study of the relationships and interactions between biological components
  – many thousands of molecules interact in a complex series of reactions to perform some function (called a pathway)
    • e.g., lactose interacting with a receptor triggers a series of actions that create the enzyme capable of breaking it down into a usable form
  – “pathways” may overlap
Approaching Systems Biology
• Need a common language for describing/modeling all components of a system
  – must be modular, compositional, and provide varying levels of abstraction
• Abstraction is an absolute necessity
  – 1 ribosome (eukaryotic) ≈ 82 proteins + rRNA
    • 1 protein ≈ hundreds/thousands of amino acids
  – 1 membrane ≈ thousands of molecules (lipids, proteins, carbohydrates)
The Biologist’s View
• How do biologists think about or view biological entities (e.g., proteins)?
  – an entity can interact with certain other types of entities
  – an entity can be in a certain “state”
  – interaction causes some action or state change
• Analogous to a system of thousands of concurrent computational processes
  – Walter Fontana, a theoretical biologist, examined the λ-calculus and linear logic for describing biological systems (≈ 1995)
Example “Textbook” Description
http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite/lacOperon/
Our Role
• Finding suitable abstractions for describing computation is our specialty!
• Discovering/proving/checking properties of such descriptions (i.e., programs) is also our specialty!
• Goal:
  – Find a mathematical abstraction convenient for describing, reasoning about, and simulating biological systems
    • DNA → string over the alphabet {A, C, G, T}
      – enables the use of string comparison algorithms
    • Cellular Pathways → ?
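
To make the DNA example concrete, here is a minimal sketch of one string comparison algorithm (Levenshtein edit distance) applied to sequences over {A, C, G, T}. The function names and the sample sequences are illustrative assumptions, not part of the talk.

```python
DNA_ALPHABET = set("ACGT")

def is_dna(seq: str) -> bool:
    """Check that seq is a string over the alphabet {A, C, G, T}."""
    return set(seq) <= DNA_ALPHABET

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Once the abstraction is a plain string, any such off-the-shelf algorithm applies directly; the open question on the slide is what plays the same role for pathways.
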
Outline
• Why is PL related to biology at all?
• Previous Abstractions in Biology
• Possible Directions of Work
• PML
• Conclusion
Previous Abstractions
• Chemical kinetic models
  – can derive differential equations
  – well-studied, with considerable theoretical basis
  – variables do not directly correspond with biological entities
  – may become difficult to see how multiple equations relate to each other
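
As an illustration of the kinetic-model style, the sketch below writes down mass-action differential equations for a single enzyme reaction, E + S ⇌ ES → E + P, and integrates them with forward Euler. The reaction scheme, the parameter names (k_on, k_off, k_cat), and the numbers are assumptions chosen for the example, not taken from the talk.

```python
def simulate_enzyme(k_on, k_off, k_cat, e0, s0, dt=1e-3, steps=20000):
    """Forward-Euler integration of mass-action kinetics for
    E + S <-> ES -> E + P.

    d[E]/dt  = -k_on[E][S] + k_off[ES] + k_cat[ES]
    d[S]/dt  = -k_on[E][S] + k_off[ES]
    d[ES]/dt =  k_on[E][S] - k_off[ES] - k_cat[ES]
    d[P]/dt  =  k_cat[ES]
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(steps):
        bind = k_on * e * s      # rate of E + S -> ES
        unbind = k_off * es      # rate of ES -> E + S
        cat = k_cat * es         # rate of ES -> E + P
        e += dt * (-bind + unbind + cat)
        s += dt * (-bind + unbind)
        es += dt * (bind - unbind - cat)
        p += dt * cat
    return e, s, es, p
```

Even in this tiny case the variables are concentrations rather than entities, which hints at the readability concern raised above: with many coupled reactions it becomes hard to see which terms belong to which interaction.
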
Previous Abstractions
• Pathway Databases (e.g., EcoCyc, KEGG)
  – store information in a symbolic form and provide ways to query the database
  – behavior of biological entities not directly described
• Petri nets
  – directed bipartite multigraph (P, T, E) of places, transitions, and edges; places contain tokens
  – place = molecular species, token = molecule, transition = reaction
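
The Petri net reading (place = molecular species, token = molecule, transition = reaction) can be sketched in a few lines. This is a minimal illustration, not a full Petri net library; the class name, method names, and the example reactions are assumptions. Edge multiplicities of the multigraph are represented as integer weights.

```python
from collections import Counter

class PetriNet:
    """Places hold token counts; transitions consume and produce tokens."""

    def __init__(self, marking):
        self.marking = Counter(marking)   # place -> number of tokens
        self.transitions = {}             # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        # inputs/outputs map a place to the edge multiplicity.
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[place] >= n for place, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)     # consume reactant molecules
        self.marking.update(outputs)      # produce product molecules
```

For example, with one enzyme token and two substrate tokens, a `bind` transition (E + S → ES) can fire once, after which it is disabled until a `catalyze` transition (ES → E + P) returns the enzyme token.
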
Previous Abstractions
• Concurrent computational processes
  – each biological entity is a process that may carry some state and interacts with other processes
  – each process described by a “program”
  – prior proposals based on process algebras, such as the π-calculus [Regev et al. ’01]
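
The process view can be sketched as small state machines that synchronize on shared channel names. This is a deliberately simplified illustration: it has no channel passing or restriction, so it is not the π-calculus itself, and every name in it (Process, step, the channels "bind" and "release") is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    state: str
    rules: dict  # (state, "send" | "recv", channel) -> next state

    def offers(self):
        """Actions available in the current state: (action, channel) -> next state."""
        return {(action, chan): nxt
                for (st, action, chan), nxt in self.rules.items()
                if st == self.state}

def step(procs):
    """Perform one synchronization between a sender and a receiver on the
    same channel; return (sender, receiver, channel), or None if stuck."""
    for p in procs:
        for (action, chan), p_next in p.offers().items():
            if action != "send":
                continue
            for q in procs:
                if q is p:
                    continue
                q_next = q.offers().get(("recv", chan))
                if q_next is not None:
                    p.state, q.state = p_next, q_next  # both change state together
                    return (p.name, q.name, chan)
    return None
```

Each entity's behavior lives in its own `rules` table, which is the sense in which each process is described by its own "program"; interaction is the only way state changes.
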
Possible Directions of Work
• Biologically-motivated “process calculi”
  – finding a suitable machine model to serve as a common basis for describing biological systems
  – Cardelli, Danos, Laneve, …
• High-level languages
  – find suitable high-level languages to make descriptions closer to informal ones
  – [Chang and Sridharan ’03]
• Program analyses, simulation, and other tools
  – simulation will likely be insufficient
• Creating models for obtaining results in biology
Outline
• Why is PL related to biology at all?
• Previous Abstractions in Biology
• Possible Directions of Work
• PML
• Conclusion
Modeling in the π-calculus
• The π-calculus is concise and compact, yet powerful [Milner ’90]
  – take this as the underlying machine model
  – not looking for another machine model
• However, it is far too low-level for direct modeling (ad-hoc structuring)
Informal Graphical Diagrams
[Figure: informal diagram of a Protein/Enzyme reaction; surviving labels: rate constants k, k-1, kcat; sites, rules, domains]
PML: Enzyme
[Code figure; surviving annotations: parameterized; bind_substrate; Enzyme declared in outer scope; interactions within the complex]
PML: Protein
[Code figure; surviving labels: bind_substrate, Protein, bind_product]
PML: A Simple System
Larger Models
• Modeled a general description of ER cotranslational translocation
  – unclearly or incompletely specified aspects became apparent
    • e.g., can the signal sequence and translocon bind without SRP? Yes [Herskovits and Bibi ’00]
• Extended to model targeting to the ER membrane with minor modifications
PML: Summary
• Domains
  – set of mutually dependent binding sites
  – defines, at the lowest level, the reactions a biological entity can undergo
• Groups
  – static structure for controlling namespace
  – may represent a large biological entity
    • large complex, a system, etc.
• [Compartments]
  – special groups that define boundaries
• Semantics defined via a translation to the π-calculus
PML: Summary
• Benefits
  – easier to write and understand because of a more direct biological metaphor
  – block structure for controlling namespace and modularity
• Future Work
  – naming?
  – proximity of molecules
  – integrating quantitative information (reaction rates, etc.)
  – type-checking PML specifications
  – exceptional / higher-level specifications
  – graphical and simulation tools
Conclusion
• Systems biology needs a mathematical foundation
  – languages for describing concurrent computation seem like a step in the right direction
• Status: all very preliminary
  – biologically-motivated process calculi
    • BioSPI, BioAmbients, Brane Calculus, …
  – high-level languages
    • PML
  – analyses and tools (emerging)
  – creating models for results in biology (emerging)
Conclusion
• Abundance of new challenges for PL
  – language design: biologically-motivated operators
  – analysis and simulation: dealing with the scale
  – …
• How much biology does one need to learn to begin?
Bonus Slides
Compartments
Compartments
• Critical part of biological pathways
  – prevents interactions that would otherwise occur
• Description of the behavior of a molecule should not depend on the compartment
• Regev et al. use “private” channels in the π-calculus for both complexing and compartmentalization
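
A rough sketch of the separation argued for here: a molecule's behavior is described once, with no mention of where it lives, and a separate location map decides whether two molecules can meet. This is not PML syntax or semantics; `Molecule`, `can_interact`, and the `bridges` parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Molecule:
    name: str
    binds: frozenset  # channels the molecule can bind on; no compartment info

def can_interact(a, b, location, bridges=frozenset()):
    """Two molecules can interact if they share a binding channel and are in
    the same compartment, or their compartments are explicitly connected.
    `bridges` is a hypothetical set of frozenset({compartment, compartment})."""
    if not (a.binds & b.binds):
        return False
    ca, cb = location[a.name], location[b.name]
    return ca == cb or frozenset((ca, cb)) in bridges
```

Moving a molecule to another compartment changes only the `location` map, so the same molecule description can be reused unchanged, and a compartment boundary blocks an interaction that would otherwise occur.
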
PML: Simple Compartments Example
[Figure labels: MolA, MolB, bind_a]
PML: Simple Compartments Example
[Figure labels: ER, Cytosol, MolA, MolB, CytERBridge]
Semantics of PML
Semantics of PML
• Defined in terms of the π-calculus via two translations
  – from PML to CorePML
    • “flattens” compartments, removes bridges
  – from CorePML to the π-calculus
Syntax of PML
Example: Cotranslational Translocation
Example: Cotranslational Translocation
• Ribosome translates mRNA, exposing a signal sequence
• Signal sequence attracts SRP, stopping translation
• SRP receptor (on ER membrane) attracts SRP
• Signal sequence interacts with translocon; SRP dissociates, resuming translation
• Signal peptidase cleaves the signal sequence in the ER lumen; Hsc70 chaperones aid in protein folding