Computational Methods in Systems Biology
Nir Friedman, Maya Schuldiner
What is Biology?
• “A branch of knowledge that deals with living organisms and vital processes”
• The hottest scientific frontier of our times
  · Many great processes have been figured out
  · Much is still unknown
• Tremendous impact on medicine
  · Diagnosis, prognosis, and treatment
Baker's Yeast: Saccharomyces cerevisiae
• Used to make bread and beer
• The simplest cell that still resembles human cells
Biological Systems are Complex
• The system is NOT just a sum of its parts
What is Systems Biology?
“Systems biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behavior of that system”
  · The last decades have led to a revolution in how we can examine and understand biological systems
Characterized by
  · High-throughput assays
  · Integration of multiple forms of experiments & knowledge
  · Mathematical modeling
The Age of Genomes
• 404 complete microbial genomes (thousands in progress)
• 31 complete eukaryotic genomes (315 in progress!)
• 3 complete plant genomes (6 in progress)
[Timeline figure, 1995 to 2010: bacteria (1.6 Mb, 1,600 genes); eukaryote (13 Mb, ~6,000 genes); animal (100 Mb, ~20,000 genes); human (3 Gb, ~30,000 genes?); individual genomes?]
Ask Not What Systems Biology Can Do for You…
Why Biology for the NIPS Crowd?
• Quantity
  · Data-intense discipline: too vast for manual interpretation
• Systematic
  · Collection of data on all genes/proteins/…
• Multi-faceted
  · Measurements of complementary aspects of cellular function, development, and disease states
  · Challenge of integration and fusion of multiple data types
Has the potential to be medically applicable!
Flow of Information in Biology
• DNA: the recipe (in a safe)
• RNA: the working copy
• Protein: the resulting dish
• Phenotype: the review
The “Post-Genomic Era”: Systematic is Not Just More Assays
DNA
• Genomic sequences
• Variations within a population
• …
RNA
• Quantity
• Structure
• Degradation rate
• …
Protein
• Quantity
• Location
• Modifications
• Interactions
• …
Phenotype
• Genetic interventions
• Environmental interventions
• …
Outline: DNA, RNA, Protein, Phenotype
DNA
• Stores genetically inherited information
• Sequence of four nucleotide types (A, C, G, T)
• Two complementary strands creating base pairs (bp)
• 10^5 bp in bacteria, 3 x 10^9 in humans, 6 x 10^13 in wheat
Understanding Genome Sequences ~3, 289, 000 characters: aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat attttgggccagtgaatttttttctaagctaatatagttatttggacttt tgacatgactttgtgtttaaaacaaaaaaagaaattgcagaagtgt tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt cttccactcagtatcatttttgtaccttatcagaaatgtttctatg tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc cagcactttgggagatcgaggaggatcacctgaggtcaggagttac agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg cactccagcctgggcgacagagcgaaactgtctcaaacaaaa aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga tttaggttgtttccagtttttactggcacagatacggcaatgaatataat tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca. . . Goal: Identify components encoded in the DNA sequence 14
Open Reading Frame
ATG CTC AGC GTG ACC TCA ... CAG CGT TAA
 M   L   S   V   T   S  ...  Q   R  STP
• Protein-encoding DNA sequence consists of a sequence of 3-letter codons
• Starts with the START codon (ATG)
• Ends with a STOP codon (TAA, TAG, or TGA)
Finding Open Reading Frames
ATG CTC AGC GTG ACC TCA ... CAG CGT TAA
 M   L   S   V   T   S  ...  Q   R  STP
Try all possible starting points
• 3 possible offsets
• 2 possible strands
Simple algorithm finds all ORFs in a genome
• Many of these are spurious (not real genes)
• How do we focus on the real ones?
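The scan over 3 offsets and 2 strands can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in any of the cited papers: the names `find_orfs` and `reverse_complement` and the `min_codons` cutoff are assumptions made for the example, and only uppercase A/C/G/T input is handled.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, start, orf) for every ATG...STOP stretch.

    Scans the 3 reading-frame offsets on each of the 2 strands. Within a
    frame, an ORF opens at the first ATG seen and closes at the next
    in-frame stop codon. `start` is the index on the scanned strand.
    `min_codons` is a hypothetical length cutoff (stop codon included).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) // 3 >= min_codons:
                        yield strand, start, orf
                    start = None


# The slide's example with the elided middle left out
example = "ATGCTCAGCGTGACCTCACAGCGTTAA"
print(list(find_orfs(example)))
```

On a real genome this reports many stretches that are not genes, which is the problem the following slides address by comparing related genomes.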
Using Additional Genomes
Basic premise: “What is important is conserved”
Evolution = Variation + Selection
  · Variation is random
  · Selection reflects function
Idea:
• Instead of studying a single genome, compare related genomes
• A real open reading frame will be conserved
Phylogenetic Tree of Yeasts [Kellis et al., Nature 2003]
[Tree figure with a ~10 M years scale annotation; species shown: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, C. glabrata, S. castellii, K. lactis, A. gossypii, K. waltii, D. hansenii, C. albicans, Y. lipolytica, N. crassa, M. graminearum, M. grisea, A. nidulans, S. pombe]
Evolution of Open Reading Frame
S. cerevisiae  ATGCTCAGCGTGACCTCA
S. paradoxus   ATGCTCAGCGTGACATCA
S. mikatae     ATGCTCAGGGTGACA--A
S. bayanus     ATGCTCAGG---ACA--A
[Figure annotations: conserved positions; variable positions; a deletion; a frame shift changes the interpretation of the downstream sequence]
Examples [Kellis et al., Nature 2003]
[Figure: a confirmed ORF and a spurious ORF, annotated with conserved positions, variable positions, frame shifts, a non-conserved ATG, and a sequencing error]
• A greedy algorithm to find conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data
Defining Conservation
Naïve approach
• Consensus between all species
Problem:
• Coarse-grained
• Ignores distances between species
• Ignores the tree topology
[Figure: example alignment columns placed on a tree and labeled conserved or variable, with % conservation values 100, 33, 55, 55]
Goal:
• More sensitive and robust methods
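To make the naïve approach concrete, here is a small sketch that scores each alignment column by the share of species carrying the most common base. The function names are hypothetical, gaps are counted as disagreeing (an assumption; the slide does not say how gaps are treated), and the score deliberately ignores the tree, which is exactly the weakness listed above.

```python
from collections import Counter
from typing import List


def percent_consensus(column: str) -> float:
    """Percent of species carrying the most common base in one column.

    Gap characters ('-') never count as the consensus base, but they do
    count in the denominator (assumption made for this sketch).
    """
    counts = Counter(base for base in column.upper() if base != "-")
    if not counts:
        return 0.0
    return 100.0 * max(counts.values()) / len(column)


def column_scores(alignment: List[str]) -> List[float]:
    """Score every column of an alignment given as equal-length rows."""
    return [percent_consensus("".join(col)) for col in zip(*alignment)]


# The four-species alignment from the "Evolution of Open Reading Frame" slide
alignment = [
    "ATGCTCAGCGTGACCTCA",  # S. cerevisiae
    "ATGCTCAGCGTGACATCA",  # S. paradoxus
    "ATGCTCAGGGTGACA--A",  # S. mikatae
    "ATGCTCAGG---ACA--A",  # S. bayanus
]
scores = column_scores(alignment)
print(scores[0], scores[8], scores[14])  # 100.0 50.0 75.0
```

Two columns with the same score can have very different evolutionary histories depending on which species disagree, which motivates the tree-based model on the next slides.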
Probabilistic Model of Evolution
[Figure: phylogenetic tree over Aardvark, Bison, Chimp, Dog, Elephant]
• Random variables: the sequence at present-day taxa or at ancestors
• Potentials/conditional distributions: represent the probability of evolutionary changes along each branch
Parameterization of Phylogenies
Assumptions:
• Positions (columns) are independent of each other
• Each branch is a reversible, continuous-time, discrete-state Markov process governed by a rate matrix Q
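For a branch of length t, the substitution probabilities are the entries of exp(Qt). The slide does not specify Q, so the sketch below assumes the simplest reversible choice, the Jukes-Cantor (JC69) model, where every substitution has the same rate `alpha` and the matrix exponential has a closed form. The function name is hypothetical.

```python
import math


def jc69_transition_prob(same: bool, t: float, alpha: float = 1.0) -> float:
    """P(base at end of branch | base at start) under Jukes-Cantor.

    Assumed rate matrix: off-diagonal entries alpha, diagonal -3*alpha.
    Then exp(Q t) has entries
        1/4 + 3/4 * exp(-4*alpha*t)  on the diagonal (base unchanged)
        1/4 - 1/4 * exp(-4*alpha*t)  elsewhere (a specific other base)
    """
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


for t in (0.0, 0.1, 1.0, 10.0):
    stay = jc69_transition_prob(True, t)
    change = jc69_transition_prob(False, t)
    # Each row of the transition matrix sums to one: stay + 3 * change
    print(f"t={t:5.1f}  P(same)={stay:.4f}  P(specific change)={change:.4f}")
```

Short branches keep P(same) close to 1, while long branches drift toward the uniform 1/4; this is the contrast between the conserved and unconserved hypotheses on the next slide.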
Conserved vs. Unconserved [Boffelli et al., Science 2003]
Two hypotheses:
• Conserved: short branches (fewer mutations)
• Unconserved: long branches (more mutations)
[Figure: the same four-leaf tree (leaves 1 to 4) drawn once with short and once with long branches]
Use …
Genes Are Better Conserved [Boffelli et al., Science 2003]
[Plot: % conserved vs. log Fast/Slow]
Challenges
Other types of genomic elements
• Small polypeptides (peptohormones, neuropeptides)
• RNA-coding genes
  · rRNA, tRNA, snoRNA…
  · miRNA
• Regulatory regions
Regulatory Elements
[Figure from Essential Cell Biology, p. 268]
Transcription Factor Binding Sites
• Relatively short words (6-20 bp)
• Recognition is not perfect
  · Binding sites allow variations
• Often conserved
Challenges
Other types of genomic elements
• Small polypeptides (peptohormones, neuropeptides)
• RNA-coding genes
  · rRNA, tRNA, snoRNA…
  · miRNA
• Regulatory regions
Recognition of elements without comparisons
• Clearly, the sequence contains enough information to “parse” it within the living cell
Outline: DNA, RNA, Protein, Phenotype
RNA
• Copied from DNA template
• Conveys information (mRNA)
• Can also perform function (tRNA, rRNA, …)
• Single stranded, four nucleotide types (A, C, G, U)
• For each expressed gene there can be as few as 1 molecule and up to 10,000 molecules per cell
Gene Expression
• Same DNA content
• Very different phenotype
• Difference is in regulation of expression of genes
High-Throughput Gene Expression
[Diagram labels: transcription, translation, extract, microarray]
• RNA expression levels of 10,000s of genes in one experiment
Dynamic Measurements [Gasch et al., Mol. Cell 2001]
[Heat map: genes x conditions]
• Time courses
• Different perturbations (genetic & environmental)
• Biopsies from different patient populations
• …
Expression: Supervised Approaches [Segman et al., Mol. Psych. 2005]
Labeled samples: feature selection + classification
• Potential diagnosis/prognosis tool
• Characterizes the disease state: insights about underlying processes
[Plot: classifier confidence; P-value ≤ 0.027]
Expression: Unsupervised
• PCA [Alter et al., PNAS 2000]
• Clustering [Eisen et al., PNAS 1998]
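Clustering of expression profiles rests on a similarity measure between genes; Pearson correlation across conditions is a common choice. The sketch below computes it by hand and picks the most similar pair, which is the first merge an agglomerative clustering would make. It is an illustration only, not the pipeline of the cited papers; gene names and values are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def most_similar_pair(profiles: Dict[str, List[float]]) -> Tuple[str, str]:
    """Return the pair of genes with the highest correlation."""
    return max(
        combinations(sorted(profiles), 2),
        key=lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]),
    )


# Hypothetical log-expression values over four conditions
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(most_similar_pair(profiles))  # ('geneA', 'geneB')
```

Correlation ignores the absolute expression level, so two genes that rise and fall together are grouped even when one is far more abundant than the other.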
Papers Compendia [Segal et al., Nat. Gen. 2004]
26 datasets from Whitehead and Stanford:
Various tumors, Stimulated PBMC, Viral infection, B lymphoma, Breast cancer, Stimulated immune, Fibroblast EWS/FLI, Prostate cancer, Fibroblast infection, Neuro tumors, Fibroblast serum, NCI60, Gliomas, HeLa cell cycle, Lung cancer, Leukemia, Liver cancer
Modules span a wide range of phenomena [Segal et al., Nat. Gen. 2004]
• Tumor type specific
• Tissue specific
• Generic across many tumors
[Heat map: modules x cancer types, cell lines, and tissues. Module labels include apoptosis; DNA damage / nucleotide metabolism; immune; MMPs; signaling & growth regulation; muscle; cytoskeleton & ECM; adhesion & signaling; synapse & signaling; metabolism; chromatin; protein biosynthesis; translation, degradation & folding; cell cycle; IF & keratins; metabolism, detox & immune; signaling & CNS. Sample groups include breast, liver, lung, hemato, leukemia, CNS, and cell lines.]
Goal: Reconstruct Cellular Networks
[Pathway figure: Biocarta, http://www.biocarta.com/]
First Attempt: Bayesian Networks [Friedman et al., JCB 2000]
[Graph over Gene A, Gene B, Gene C, Gene D, Gene E]
• One gene: one variable
• An instance: microarray sample
Use standard approaches for learning networks
Second Attempt: Module Networks [Segal et al., Nature Genetics 2003]
[Figure: MAPK of the cell wall integrity pathway; SLT2, RLM1, CRH1, YPS3, PTP2 shown with separate regulation functions vs. one common regulation function]
Idea: enforce a common regulatory program
• Statistical robustness: regulation programs are estimated from m*k samples
• Organization of genes into regulatory modules: concise biological description
Learned Network (fragment)
• Gasch et al. 2001: Yeast Response to Environmental Stress
• 173 yeast arrays
• 2355 genes
• 50 modules
[Network figure: Module 1 (55 genes), Module 2 (64 genes), Module 4 (42 genes), Module 25 (59 genes); nodes shown include Tpk2, Tpk1, Kin82, Msn4, Usv1, Nrg1, Hap4, Atp1, Atp3, Atp16, Mth1]
Validation
How do we evaluate ourselves?
• Statistical validation
  · Ability to generalize (cross-validation test)
[Plot: test-data log-likelihood (gain per instance, -150 to 150) vs. number of modules (0 to 500), with Bayesian network performance as reference]
Validation
How do we evaluate ourselves?
• Statistical validation
• Biological interpretation
  · Annotation database
  · Literature reports
  · Other experiments, potentially different experiment types
Visualization & Interpretation
[Diagram: expression profiles, cis-regulatory motifs, functional annotations (GO), and molecular pathways (KEGG, GenMAPP) feed visualization and interpretation]
Hypotheses
• Function
• Dynamics
• Regulation
[Figure: module regulated by Hap4 and Msn4; promoter positions -500 to -100]
Gene set coherence (GO, MIPS, KEGG)
  · Oxidative phosphorylation (26, 5 x 10^-35)
  · Mitochondrion (31, 7 x 10^-32)
  · Aerobic respiration (12, 2 x 10^-13)
Motifs
  · HAP4 motif: 29/55; p < 2 x 10^-13
  · STRE (Msn2/4): 32/55; p < 10^-3
  · HAP4 + STRE: 17/29; p < 7 x 10^-10
Checks
• Gene set coherence (GO, MIPS, KEGG)
• Match between regulator and targets
• Match between regulator and cis-reg motif
• Match between regulator and condition/logic
p-values using the hypergeometric distribution; corrected for multiple hypotheses
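A hypergeometric enrichment p-value asks: if a module of n genes were drawn at random from a genome of N genes, K of which carry an annotation, how likely is it to see at least k annotated genes? The sketch below computes that tail with exact integer arithmetic. The function names are hypothetical, the numbers are toy values rather than the ones behind the slide, and Bonferroni is used only as one simple example of a multiple-hypothesis correction; the slide does not say which correction was applied.

```python
from math import comb


def hypergeom_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    k: annotated genes observed in the module
    n: module size
    K: annotated genes in the whole genome
    N: genome size
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def bonferroni(p_value: float, num_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1 (illustrative correction)."""
    return min(1.0, p_value * num_tests)


# Toy example: 10 genes, 4 annotated, module of 3 genes with 2 annotated
p = hypergeom_enrichment_pvalue(k=2, n=3, K=4, N=10)
print(p)                   # 40/120, i.e. one third
print(bonferroni(p, 5))    # corrected for 5 hypothetical annotation tests
```

Because many annotations are tested against every module, the raw p-value overstates significance; the correction step is what keeps the reported enrichments honest.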
Validation [Segal et al., Nature Genetics 2003]
How do we evaluate ourselves?
• Statistical validation
• Biological interpretation
• Experiments
  · Test causal predictions in the real system
  · Lead to additional understanding beyond the prediction
  · Experimental validation of three regulators
    ¨ 3/3 successful results
Challenges
• New methodologies for the huge amount of existing RNA profiles
  · Meta-analysis
  · Better mechanistic models
  · Contrasting new profiles with existing databases
  · Visualization
• Other measurements
  · Degradation rates
  · Localization
Outline: DNA, RNA, Protein, Phenotype
Protein
• Proteins are the main executors of cellular function
• Building blocks are 20 different amino acids
• Synthesized from an mRNA template
• Acquire a sequence-dependent 3-D conformation
• Proteomics: systematic study of proteins
Why Measure Proteins?
• RNA level ≠ protein level
  Protein quantity is not a direct function of RNA levels
• Protein level ≠ activity level
  Activity of proteins is regulated by many additional mechanisms
  · Cellular localization
  · Post-translational modifications
  · Co-factors (protein, RNA, …)
Challenges in Proteomics
• Problematic recognition: no generic mechanism to detect different protein forms
• Thousands of different proteins in the typical cell
• Protein abundances vary over several orders of magnitude
Making a Protein Generic: TAG
• Tags make a protein generic
• Underlying assumption is that the tag does not change the protein
• All proteins have the same tag
  1. Inability to pool strains
  2. Each experiment is done on a “different” strain
TAP-Tag Libraries for Abundance
~4500 yeast strains have been TAP tagged
• How much is each protein expressed?
• What is the proteome under different conditions?
Why Study Protein Complexes?
• Most proteins in the cell work in protein complexes or through protein/protein interactions
• To understand how proteins function we must know:
  - whom they interact with
  - when they interact
  - where they interact
  - what the outcome of that interaction is
Using TAP-Tag to Find Complexes
Large-Scale Pull-Downs Provide Information on Protein Complexes [Gavin et al., Nature 2006; Krogan et al., Nature 2006]
• Both labs used the same proteins as bait
• Each lab got slightly different results
• The results depended dramatically on the analysis method
[Figure: Gavin et al., Nature 2006; Krogan et al., Nature 2006]
We can now define a yeast “interactome”
• Not a full use of the data
• Static picture
Making a Protein Generic
1. Fluorescent proteins allow us to visualize the proteins within the cell.
2. They allow us to measure individual cells and the variation/noise within a population.
Cellular Localization Using GFP Tags
What can it teach us?
• A library of yeast GFP fusion strains has been used to localize nearly all yeast proteins [Huh et al., Nature 2003]
• A collection of cloned C. elegans promoters is being created for similar purposes [Genome Research 14:2169-2175, 2004]
Challenges in Fluorescence-based Approaches
• Better vision processing will allow this to be done in high throughput and answer questions such as:
  · Changes in localization in response to cellular cues
  · Changes in localization in response to environmental cues
  · Changes in localization in various genetic backgrounds
  · Dynamics of localization changes
THROUGHPUT: THE MAJOR BOTTLENECK
Single Cell Measurements: Flow Cytometry
• Cells pass through a flow cell one at a time
• Lasers focused on the flow cell excite fluorescent protein fusions
• Allows multiple measurements (cell size, shape, DNA content)
Applications:
• Protein abundance
• Protein-protein interactions
• Single-cell measurements
High-Throughput Flow Cytometer
• 7 seconds/sample
• ~50,000 counts per sample
Comparison of mRNA to Protein Levels Allows Identification of Post-transcriptional Regulation [Newman et al., Nature 2006]
Compare
• Rich media
• Poor media
Observed behaviors
• No change in either
• Coordinated change
• Change in protein, but not mRNA
[Scatter plot: log2 Poor/Rich protein vs. log2 Poor/Rich mRNA]
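The three observed behaviors can be read off a pair of log2 poor/rich ratios. The sketch below is a toy classifier: the function names, the category labels, and the two-fold `threshold` are all assumptions made for illustration, not the criteria used by Newman et al.

```python
import math


def log2_fold_change(poor: float, rich: float) -> float:
    """log2 of the poor-media level over the rich-media level."""
    return math.log2(poor / rich)


def classify_change(mrna_lfc: float, protein_lfc: float,
                    threshold: float = 1.0) -> str:
    """Assign a gene to one of the behaviors listed on the slide.

    `threshold` is a hypothetical cutoff on |log2 ratio| (1.0 = two-fold).
    """
    mrna_changed = abs(mrna_lfc) >= threshold
    protein_changed = abs(protein_lfc) >= threshold
    if not mrna_changed and not protein_changed:
        return "no change"
    if mrna_changed and protein_changed and (mrna_lfc > 0) == (protein_lfc > 0):
        return "coordinated change"
    if protein_changed and not mrna_changed:
        # Candidate for post-transcriptional regulation
        return "protein only"
    return "other"


print(classify_change(log2_fold_change(8.0, 2.0), 1.5))  # coordinated change
print(classify_change(0.1, 2.0))                         # protein only
```

Genes in the "protein only" group are the interesting ones here: their abundance changes without a matching change in mRNA, which points to regulation after transcription.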
Noise in Biological Systems [Nature 441, 840-846 (15 June 2006)]
• Measurement of 10,000 individual cells allows measurement of variation (noise) in a biological context
• Factors that affect levels of noise in gene expression:
  · Abundance, mode of transcriptional regulation, subcellular localization
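One common way to put a number on cell-to-cell variation is the coefficient of variation (standard deviation divided by mean) of per-cell fluorescence. The slide does not state which noise measure was used, so treat this as an assumed, illustrative definition; the function name and the sample values are made up.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean.

    Dimensionless, so proteins of very different abundance can be compared.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return pstdev(values) / m


# Hypothetical per-cell fluorescence readings for two tagged proteins
tight = [98.0, 101.0, 100.0, 102.0, 99.0]
noisy = [40.0, 160.0, 90.0, 130.0, 80.0]
print(coefficient_of_variation(tight))
print(coefficient_of_variation(noisy))
```

Both samples have the same mean of 100, yet the second is far noisier, which is the kind of difference that only single-cell measurements can reveal.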
Challenges
Proteomics is in its infancy: easier to make an impact
• Integrating these data with other proteomic/genomic data to better predict protein function
• Higher-throughput methods such as flow cytometry will allow generation of varied data: different growth conditions, cell cycle, stress, mating
• Tagging in mammalian cells is becoming more feasible; the near future should bring proteomic data on human cells
Outline: DNA, RNA, Protein, Phenotype
Phenotype
• Traits that selection can apply to, the observable characteristics
• Mutations in the DNA can cause a change in a phenotype
  · Shape and size
  · Growth rate
  · How many years your liver can survive alcohol damage…
Single Gene KO
• Phenotypic screen [Giaever et al., 2002]
Starting to Probe the Cellular Network
Genetic Interaction
• The effect of a mutation in one gene on the phenotype of a mutation in a second gene
• Different type of interaction - not physical
What is a Genetic Interaction (Epistasis)?
The effect of a mutation in one gene on the phenotype of a mutation in a second gene.
Genotype           Growth rate
WT                 1
Δ geneA            x (x <= 1)
Δ geneB            y (y <= 1)
Δ geneA Δ geneB    xy (product)
DIFFERENT TYPE OF INTERACTION - NOT PHYSICAL
What is a Genetic Interaction?
Genetic interaction    Double-mutant growth rate
None                   xy
Aggravating            less than xy
Alleviating            greater than xy
[Pathway diagrams with nodes A, B, C, X, Y]
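The table translates directly into a comparison between the observed double-mutant growth rate and the product of the single-mutant rates. In the sketch below the function name and the tolerance `tol` are assumptions added so that measurement noise is not called an interaction; the slides do not give a cutoff.

```python
def classify_interaction(x: float, y: float, observed_double: float,
                         tol: float = 0.05) -> str:
    """Compare the double-mutant growth rate with the expected product x*y.

    x, y: growth rates of the single mutants relative to wild type (WT = 1)
    observed_double: measured growth rate of the double mutant
    tol: hypothetical tolerance around the expectation
    """
    expected = x * y
    if observed_double < expected - tol:
        return "aggravating"
    if observed_double > expected + tol:
        return "alleviating"
    return "none"


# Single mutants grow at 0.8 and 0.5 of wild type; expected double = 0.4
print(classify_interaction(0.8, 0.5, 0.40))  # none
print(classify_interaction(0.8, 0.5, 0.10))  # aggravating
print(classify_interaction(0.8, 0.5, 0.50))  # alleviating
```

In the last case the double mutant grows no worse than the sicker single mutant, the pattern expected when both genes act in the same complex or pathway.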
Systematic Method of Analyzing Double Mutants [Tong et al., 2001]
[Diagram: ∆X:NAT crossed with ∆Y:KAN; panels labeled WT, ∆X:NAT, ∆Y:KAN]
• Double deletion mutants are made systematically
• Colony sizes are measured in high throughput
E-MAPs: Epistasis Mini Array Profiles [Schuldiner et al., Cell 2005]
[Heat map; color scale from aggravating to alleviating]
Defining Protein Complexes
Co-complex proteins have
• similar interaction patterns
• alleviating interactions
[Figure: proteins A, B, C in on and off states; color scale from aggravating to alleviating]
Challenges for the Future
• Only a small fraction of the information has been utilized in the E-MAPs made so far
• E-MAPs covering all yeast cellular processes are expected by the end of 2007
• Extending this to human cells is now feasible using gene silencing techniques
• Amount of data scales steeply with genome size (the number of gene pairs grows quadratically): higher organisms, more genes
Outline: DNA, RNA, Protein, Phenotype, Combined Insights
Combined Insights
• Model-free approach
• Model-based approach
Why Integrate Data?
[Background: excerpt of raw genomic sequence]
High-throughput assays:
• Observations about one aspect of the system
• Often noisy and less reliable than traditional assays
• Provide a partial account of the system
Model-Free Approach
[Table: one row per gene (YAL001C, YAL002W, YAL003W, YAR040W, YAR041C); column groups Location, Expression, Phenotype, Binding sites; column labels Nuc, Cyto, Mito, Rich, Poor, Salt, Kan, RAP1, HSF1, GCN4]
• Treat different observations about elements as multivariate data
  · Clustering
  · Statistical tests
Model-Free Approach [Tanay et al., PNAS 2004]
Finding bi-clusters in a large compendium of functional data
Model-Free Approach
Pros:
• No assumptions about data
  · Unbiased
  · Can be applied to many data types
• Can use existing tools to analyze combined data
Cons:
• No assumptions about data
  · Interpretation is post-analysis
  · No sanity check
• Cannot deal with data from different modalities (interactions, other types of genetic elements)
Model-Based Approach
What is a model?
“A description of a process that could have generated the observed data”
[Background: excerpt of raw genomic sequence]
• Idealized, simplified, cartoonish
• Describes the system & how it generates observations
Explaining Expression
[Diagram: DNA-binding proteins (activator, repressor) at binding sites in the non-coding region; gene coding region; RNA transcript]
Key question:
• Can we explain changes in expression?
General concept:
• Transcription factor binding sites in the promoter region should “explain” changes in transcription
Explaining Expression
Relevant data:
• Expression under environmental perturbations
• Expression under transcription factor KOs
• Predicted binding sites of transcription factors
• Protein-DNA interactions of transcription factors
• Protein levels/location of transcription factors
• …
A Stab at Model-Based Analysis
[Figure: promoter sequences of genes annotated with occurrences of the motifs TCGACTGC, GATAC, CCAAT, and GCAGTT; motif profiles (TCGACTGC + GATAC, GCAGTT + CCAAT) shown alongside expression profiles]
Unified Probabilistic Model [Segal et al., RECOMB 2002, ISMB 2003]
[Model diagram: sequence variables S1 to S4; motif variables R1 to R3 (motif profiles); experiment, gene, and expression variables (expression profiles)]
Unified Probabilistic Model [Segal et al., RECOMB 2002, ISMB 2003]
[Same diagram with a module variable added]
Unified Probabilistic Model [Segal et al., RECOMB 2002, ISMB 2003]
[Same diagram with module ID, gene, and level variables; sequence and expression are marked as observed]
Probabilistic Model: Regulatory Modules [Segal et al., RECOMB 2002, ISMB 2003]
[Same diagram shown next to genes with their sequence, motif profile, and expression profile]
Model-Based Approach
Pros:
• Incorporates biological principles
  · Suggests mechanisms
  · Incorporates diverse data modalities
• Declarative semantics: easy to extend
Cons:
• Reconstruction depends on the model
• Biological principles
  · Bias
Physical Interactions
Physical Interactions
Interaction between two proteins makes it more probable that they
  · share a function
  · reside in the same cellular localization
  · have coordinated expression
  · have similar genetic interactions
  · …
Can we exploit this to make better inferences about the properties of proteins?
Relational Markov Network
• Probabilistic patterns hold for all groups of objects
• Represent local probabilistic dependencies
[Figure: two Protein objects, each with attributes Nucleus, Cytoplasm, and Mitochondria, linked by an Interaction object with attribute Exists; example potential tables over (P1.N, P2.M) and over (P1.N, P2.N, I.E)]
Relational Markov Network
• Compact
• Allows the model to infer protein attributes by combining
  · Interaction network topology (observed)
  · Observations about neighboring proteins
Adding Noisy Observations
• Add a class for each experimental assay
• View the assay result as a stochastic function (CPD) of the underlying biology
[Figure: a GFP image object (Nucleus, Cytoplasm, Mitochondria) attached by a directed CPD to a Protein object (Nucleus, Cytoplasm, Mitochondria); proteins linked by an Interaction object with attribute Exists]
Uncertainty About Interactions
• Add interaction assays as noisy sensors for interactions
[Figure: as before, with an Assay object (attribute Interact) attached to the Interaction object (attribute Exists)]
Design Plan
Simultaneous prediction
[Figure: relational Markov network over the proteins Taf1, Tbf1, Med17, Cln5, Srb1, Mcm1, Pre7, Pre9, Pup3, Pre5, Med5, Cdk8, Med1, Taf10]
Relational Markov Network
• Add potentials over interactions
[Figure: three Protein objects (attribute Nucleus) joined by Interaction objects (attribute Exists); label: Potential over …]
Relational Markov Models
Combine
• (Noisy) interaction assays
• (Noisy) protein attribute assays
• Preferences over network structures
to find a coherent prediction of the interaction network
Discussion
• Every day, papers are published with high-throughput data that are not analyzed completely or not used in all possible ways
• The bottlenecks right now are the time and ideas to analyze the data
The Need for Computational Methods
[Diagram: experiment; low-level analysis; high-level analysis; modeling & simulation]
What are the Options?
• Analyze published data
  · Abundant, easy to obtain
  · Method oriented
  · Don’t have to bump into biologists
  · Two million other groups have that data too
• Collaborate with an experimental group
  · Be involved in all stages of the project
  · Understand the system and the data better
  · Have priority on the data
  · Involved in generating & testing biological hypotheses
  · Goal oriented
• Start your own experimental group… (yeah, sure)
Questions to Keep in Mind
Crucial questions to ask about biological problems
• What quantities are measured?
  Which aspects of the biological system are probed?
• How are they measured?
  How does this measurement represent the underlying system?
  Bias and noise characteristics of the data
• Why are these measurements interesting?
• Which conclusions will make the biggest impact?
Acknowledgements
Slides:
The Computational Bunch
  · Yoseph Barash
  · Ariel Jaimovich
  · Tommy Kaplan
  · Daphne Koller
  · Noa Novershtern
  · Dana Pe’er
  · Itsik Pe’er
  · Aviv Regev
  · Eran Segal
The Biologist Crowd
  · David Breslow
  · Sean Collins
  · Jan Ihmels
  · Nevan Krogan
  · Jonathan Weissman
Special thanks: Gal Elidan, Ariel Jaimovich