Recent Trends in Computational Proteomics
Dr G P S Raghava
Institute of Microbial Technology, Chandigarh, India
Bioinformatics | Drug Informatics | Chemoinformatics | Vaccine Informatics
Email: raghava@imtech.res.in
http://crdd.osdd.net/
http://www.imtech.res.in/raghava/

Studying Central Dogma with “Omics”
  • Genomics: DNA
  • Transcriptomics: mature mRNA
  • Proteomics: protein
  • Metabolomics: all the endogenous metabolites produced

Complexity of the system

NEXT GENERATION SEQUENCING
  • Sequence the full genome of an organism in a few days at a very low cost.
  • Produce high-throughput data in the form of short reads.
  • Platforms: Illumina, ABI SOLiD, Roche 454 FLX, Ion Torrent

Genome assembly and annotation done at IMTECH
  • Burkholderia sp. SJ98 (Kumar et al. 2012).
  • Debaryomyces hansenii MTCC 234 (Kumar et al. 2012).
  • Imtechella halotolerans K1T (Kumar et al. 2012).
  • Marinilabilia salmonicolor JCM 21150T (Kumar et al. 2012).
  • Rhodococcus imtechensis sp. RKJ300 (Vikram et al. 2012).
  • Rhodosporidium toruloides MTCC 457 (Kumar et al. 2012).

RNA sequencing
RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies to measure levels of transcripts and their isoforms.
  • Samples of interest: condition (colon tumor)
  • Isolate RNAs
  • Generate cDNA, fragment, size select, add linkers
  • Sequence ends
  • Map to genome, transcriptome, and predicted exon junctions
  • Downstream analysis
Scale: 100s of millions of paired reads, 10s of billions of bases of sequence

Methods for Protein Analysis
  • Gel electrophoresis, northern/western blot (fluorescence/radioactive label)
  • X-ray crystallography
  • Protein microarrays
  • Mass spectrometry

Protein arrays
  • High-throughput analysis of hundreds of thousands of proteins.
  • Proteins are immobilized on a glass chip.
  • Various probes (proteins, lipids, DNA, peptides, etc.) are used.
Cons
  • Require a priori knowledge of the proteins of interest.
  • Depend on the availability of suitable antibodies.
  • Measure only a small fraction of the proteome.

Mass Spectrometry
  • Ionization: find a way to “charge” an atom or molecule.
  • Mass analyzer: place the charged atom or molecule in a magnetic or electric field and measure its speed or radius of curvature relative to its mass-to-charge ratio.
  • Detection: detect ions using a microchannel plate or photomultiplier tube.
Workflow
  • Sample
  • Ion source: makes ions
  • Mass analyzer: separates ions
  • Mass spectrum: presents information

Protein Identification by MS
Experimental spectra
  • Spot removed from gel
  • Fragmented using trypsin
  • Spectrum of fragments generated
Library
  • Database of sequences (e.g. SwissProt)
  • Artificially trypsinated
  • Theoretical spectra built
MATCH: experimental spectra are compared against the library of theoretical spectra.
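
The "artificially trypsinated" step can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular search engine: it assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P, and the function name `tryptic_digest` and its `missed_cleavages` parameter are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return in-silico tryptic peptides of a protein sequence.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    # First pass: cut the sequence at every allowed cleavage site.
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder

    # Second pass: join neighbouring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKRPAK"))
    print(tryptic_digest("MKRPAK", missed_cleavages=1))
```

Each resulting peptide would then be turned into a theoretical spectrum and compared with the experimental one.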

Instrumentation
High Vacuum System
  • Inlet: HPLC, flow injection, sample plate
  • Ion Source: MALDI, ESI
  • Mass Analyzer: time of flight (TOF), quadrupole, ion trap, magnetic sector, FTMS
  • Detector: microchannel plate, electron multiplier, hybrid with photomultiplier
  • Data System

Ion Sources make ions from sample molecules (ions are easier to detect than neutral molecules).
  • MALDI [figure labels: sample plate, laser, MH+, grid (0 V), +/- 20 kV]
  • Electrospray ionization [figure labels: sample inlet nozzle (lower voltage), pressure = 1 atm, inner tube diam. = 100 µm, sample in solution, N2 gas, high voltage applied to metal sheath (~4 kV), charged droplets, MH2+, MH3+, partial vacuum]

Mass analyzers separate ions based on their mass-to-charge ratio (m/z)
  • Operate under high vacuum (keeps ions from bumping into gas molecules).
  • Actually measure the mass-to-charge ratio of ions (m/z).
  • Key specifications are resolution, mass measurement accuracy, and sensitivity.
  • Several kinds exist: for bioanalysis, quadrupole, time-of-flight, and ion traps are most used.
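
Two of the key specifications can be made concrete with a short sketch. It assumes the usual conventions: mass measurement accuracy expressed as an error in parts per million (ppm), and resolution expressed as resolving power m/Δm with Δm taken as the peak width at half maximum. The function names are illustrative only.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz, peak_width_fwhm):
    """Resolving power m / delta-m, with delta-m as full width at half maximum."""
    return mz / peak_width_fwhm


if __name__ == "__main__":
    # A peak measured 0.005 m/z units away from a theoretical value of 1000.
    print(ppm_error(1000.005, 1000.0))
    # A peak at m/z 500 that is 0.05 m/z units wide at half height.
    print(resolving_power(500.0, 0.05))
```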

Tandem Mass Spectrometry (MS/MS)
[Figure: LC, ion source, MS-1 (parent ions), collision cell, MS-2 (fragment ions)]

What’s in a Mass Spectrum?
  • Ion abundance (as a % of base peak).
  • Mass, as m/z. z is the charge, and for doubly charged ions (often seen in macromolecules), masses show up at half their proper value.
  • Fragment ions: derived from the molecular ion or higher-weight fragments.
  • “Molecular ion” at high mass: molecular ions, adduct ions, [M+reagent gas]+.
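
The effect of charge on where a peak appears can be checked with a small calculation. The sketch below assumes positive-mode ions formed by adding one proton per charge, so the observed value is (M + z × proton mass) / z; the "half their proper value" rule of thumb then holds up to the small proton contribution. The function name `mz` is illustrative.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """Observed m/z of a molecule carrying `charge` extra protons (assumed)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # The same 2000 Da molecule appears at very different m/z values
    # depending on how many protons it carries.
    for z in (1, 2, 3):
        print(z, round(mz(2000.0, z), 4))
```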

Peptide Fragmentation
Backbone (N-terminus to C-terminus): H...-HN-CH-CO-NH-CH-CO-…OH, with side chains R(i-1), R(i), R(i+1) on AA residues i-1, i, i+1.
Collision Induced Dissociation (H+) yields:
  • Prefix fragment: H...-HN-CH-CO (side chain R(i-1))
  • Suffix fragment: ...NH-CH-CO-…OH (side chains R(i), R(i+1))
Notes
  • Peptides tend to fragment along the backbone.
  • Fragments can also lose neutral chemical groups like NH3 and H2O.

B ions and Y ions
http://www.molgen.mpg.de/101151/Proteomics
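
As a rough illustration of b ions (prefix fragments) and y ions (suffix fragments), the sketch below computes singly charged b and y m/z values for a short peptide. It assumes standard monoisotopic residue masses and the usual definitions b = sum of prefix residues + proton and y = sum of suffix residues + water + proton. Only a few residues are included to keep the table short, and the function name `by_ions` is hypothetical.

```python
# Monoisotopic residue masses in Da (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), read from the N-terminus
        prefix += m
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), read from the C-terminus
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = by_ions("GAK")
    print("b ions:", [round(x, 4) for x in b])
    print("y ions:", [round(x, 4) for x in y])
```

A useful sanity check on any such table is that complementary ions (b1 with y2, b2 with y1, and so on) always sum to the neutral peptide mass plus two protons.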

Mass spectra searching techniques

Commercial Software
  • SEQUEST (Yates et al., 1995)
  • MASCOT (Perkins, Pappin, Creasy, & Cottrell, 1999)
Open Database search tools
  • MyriMatch
  • X!Tandem
  • MSGF
  • OMSSA
More accurate than Mascot and SEQUEST (Kim & Pevzner, 2014)

Target-Decoy Search Strategy for Mass Spectrometry-Based Proteomics
  • Incorrect “decoy” sequences added to the search space will correspond with incorrect search results that might otherwise be deemed to be correct.
  • Mass spectra are searched against a target database and a reversed decoy database.
  • The proportion of matches in the decoy database represents false matches.
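
The strategy can be sketched as two small steps: building reversed decoys and estimating the false discovery rate (FDR) from decoy hits. This is a simplified illustration under stated assumptions: decoys are plain sequence reversals, each peptide-spectrum match is a `(score, is_decoy)` pair with higher scores being better, and the FDR is estimated as decoy hits divided by target hits above a threshold (other estimators exist). The names `reversed_decoys`, `estimate_fdr`, and the `DECOY_` prefix are hypothetical.

```python
def reversed_decoys(targets):
    """Build a decoy entry for every target protein by reversing its sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in targets.items()}


def estimate_fdr(psms, score_threshold):
    """Estimate FDR among matches scoring at or above the threshold.

    psms: iterable of (score, is_decoy) pairs from a search against the
    combined target + decoy database.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= score_threshold]
    decoy_hits = sum(accepted)
    target_hits = len(accepted) - decoy_hits
    # Guard against division by zero when no target match passes.
    return decoy_hits / max(target_hits, 1)


if __name__ == "__main__":
    database = {"P1": "MKTAYIAK"}
    database.update(reversed_decoys(database))
    print(database)

    psms = [(50, False), (45, False), (40, True),
            (35, False), (30, False), (10, True)]
    print(estimate_fdr(psms, 30))
```

In practice the threshold is lowered until the estimated FDR reaches the level the analysis is willing to tolerate.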

Applications
  • Analyzing Protein Modifications
      • Finding all modifications on a single protein
      • Proteome-wide scanning of modifications
  • Protein Profiling
      • Generate large-scale proteome maps
      • Annotate and correct genomic sequences
      • Analyze protein expression as a function of cellular state
  • Detection of amino acid substitutions
  • Protein sample identification/confirmation
  • Protein sample purity determination

Major Proteomics Repositories

Major Challenge
  • Large number of unidentified spectra.
  • Peptides may be missing from the database searched.
  • Are all the reference databases complete?

Proteogenomics
  • Term coined in the literature in 2004.
  • Genomic data used for generating customized databases.
  • Identify novel peptides.
  • Disease biomarkers based on novel mutations.

[Figure: proteomics compared with proteogenomics]
  • Proteomics: spectra (intensity vs. mass/charge) from tumor and non-tumor samples are searched against RefSeq and UniProt.
  • Proteogenomics: spectra (intensity vs. mass/charge) from tumor and non-tumor samples are searched against a tumor-specific protein DB built from genome sequencing of the tumor sample (germline variants) and RNA-Seq (alternative splicing, somatic variants, expression).
  • Labels: aberrant proteins, variant proteins, fusion proteins.

TYPES OF PEPTIDES IDENTIFIED IN PROTEOGENOMICS
  • Novel peptides mapping on non-coding region 5’UTR
  • Novel peptides mapping on non-coding region 3’UTR
  • Novel junctions
  • Intergenic peptides
  • Alternative reading frame

Methods of generation of customized databases
  • 6-frame translation of reference database: Perl or Python scripts
  • Ab initio gene prediction: Perl and Python scripts
  • RNA-seq data: CustomProDB, Galaxy-P system, sapFinder
  • Whole genome sequencing: Peppy
  • Other databases: OMIM, neXtProt, ECgene, ChimerDB, COSMIC
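
A six-frame translation script of the kind mentioned above can be sketched as follows. This is a minimal illustration, not one of the listed tools: it assumes the standard genetic code, marks stop codons with `*`, writes `X` for codons containing ambiguous bases, and ignores trailing bases that do not fill a codon. The function and frame names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

For a search database, each frame would typically be split at stop codons and the resulting open stretches written out as separate entries.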

What does proteogenomics offer?
Combining proteomics, genomics, and transcriptomics:
  • Novel N termini
  • Novel signal peptide
  • Novel exons
  • Novel junctions
  • Variant peptides
  • Novel ORFs
  • Novel signal peptide cleavage sites

Open source Web Services

Chemoinformatics and Pharmacoinformatics

Molecular Interactions

Biological Databases

Genome Annotations

Immunoinformatics or Vaccine Informatics

Functional Annotation of Proteins

Proteins Structure Prediction

GPSR: A Resource for Genomics, Proteomics and Systems Biology
  • A journey from simple computer programs to drug/vaccine informatics
  • Limitations of existing web services
  • History repeats (Web to Standalone)
  • Graphics vs command mode
  • General purpose programs
  • Small programs as building unit
  • Integration of methods in GPSR

Types of Prediction Methods

Customized operating environment for drug discovery pipeline
[Figure labels: BIOINFORMATICS, VACCINE INFORMATICS, CHEMINFORMATICS; Live Server, Live CD, Installation, Pkg Repository; Webserver, Standalone, Galaxy platform; All in ONE]

Osddlinux desktop is ready for use.
  • Password for sudo: osddlinux
  • Password for root: osddlinux
Osddlinux installation on system hard drive

Operating System for Drug Discovery
