Peptide Mass Fingerprinting Manimalha Balasubramani Genomics and Proteomics Core Laboratories
Genomics and Proteomics Core Lab website: www.genetics.pitt.edu
GPCL Inventory
• ABI Voyager DE PRO, user operated
• ABI 4700 Proteomics Analyzer
• Thermoelectron LCQ Deca with Surveyor HPLC
• ABI Qstar Elite with Ultimate 3000 HPLC
• Bruker micrOTOF with Ultimate 3000 HPLC
• Bruker 12 Tesla FTMS with Ultimate 3000 HPLC
4700 Proteomics Analyzer (ABI); Voyager DE PRO (ABI); micrOTOF (Bruker)
LCQ Deca XP (Thermofisher); Qstar Elite (ABI); 12 T FT MS (Bruker)
Peptide mass fingerprinting (PMF) is a technique for protein and peptide identification
Outline
• PMF Workflow:
  – Sample preparation
  – Mass spectra: MS and MS/MS
  – Database searches
• Examples, hands-on exercises
• Contaminants, post-translational modifications, enzyme digestions
• Evaluating PMF analysis
PMF: Sample preparation Peptide fingerprint
Mass spectra are acquired with:
• MALDI TOF MS (Voyager DE PRO, ABI)
• MALDI TOF/TOF MS (4700 Proteomics Analyzer, ABI)
MALDI: Matrix Assisted Laser Desorption Ionization
TOF: Time Of Flight
MS: Mass Spectrometry
Mass Spectrum: MS (figure: intensity plotted against mass-to-charge ratio, m/z)
FWHM: full width at half maximum of a peak. Source: wiki
Resolution and mass accuracy
Δm measured at 50% peak height is the Full Width at Half Maximum (FWHM).
$$R = \frac{M}{\Delta m}$$
R = resolution; M = mass of the peak of interest; Δm = width in daltons of the peak
Ubiquitin ESI Spectra on 12 T FT-ICR: Mass Error > 0.56 ppm
Ubiquitin ESI Spectra on 12 T FT-ICR: Mass Error < 0.56 ppm
Ubiquitin ESI Spectra on 12 T FT-ICR: Resolution > 175,000
Mass accuracy is measured as a parts per million (ppm) value:
$$\text{ppm} = \frac{10^6\,\Delta m}{M} = \frac{10^6}{R}$$
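The two definitions above can be tried out with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the example numbers (a peak at m/z 1000 with a 0.1 Da FWHM) are made up rather than taken from an instrument. Note that feeding the peak width into the ppm formula gives 10^6/R, while feeding in the difference between a measured and a theoretical mass gives the mass error.

```python
def resolution(mass, fwhm):
    """Resolution R = M / delta_m, with delta_m the peak width (FWHM) in Da."""
    return mass / fwhm


def ppm(delta_m, mass):
    """Express a mass difference delta_m (Da) relative to mass, in ppm."""
    return 1e6 * delta_m / mass


def ppm_error(measured, theoretical):
    """Signed mass error of a measured peak against its theoretical mass."""
    return ppm(measured - theoretical, theoretical)


# Illustrative values only: a peak at m/z 1000 that is 0.1 Da wide at half height.
r = resolution(1000.0, 0.1)
width_ppm = ppm(0.1, 1000.0)          # same number as 1e6 / r
error_ppm = ppm_error(1000.05, 1000.0)
print(r, width_ppm, error_ppm)
```

A search tolerance such as the 25 ppm or 100 ppm used in the hands-on exercise is a limit on the absolute value of `ppm_error`.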
Peptide Mass Fingerprint
Mass spectrum processing, calibration
• External calibration
• Internal calibration
  – Trypsin autodigestion peaks
  – Keratin peaks
  – Spiking with an internal standard
Peak List
• Spectrum viewer
• Compiled from the mass spectra
  – Mass list and intensity
• Peak list is submitted for database searching
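Internal calibration can be illustrated with a small Python sketch that fits a straight line through calibrant peaks and applies it to the rest of the peak list. Several things here are assumptions: the reference values 842.5094 and 2211.1040 are commonly quoted [M+H]+ masses for porcine trypsin autolysis peptides and should be checked against the trypsin actually used, the "observed" values are invented, and instrument software normally calibrates in the time-of-flight domain rather than with a linear correction on m/z.

```python
def fit_linear_calibration(observed, reference):
    """Least-squares line reference ~ a * observed + b through the calibrants."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def apply_calibration(masses, a, b):
    """Correct every mass in the peak list with the fitted line."""
    return [a * m + b for m in masses]


# Assumed reference masses for two trypsin autolysis peaks (verify before use).
reference = [842.5094, 2211.1040]
# Invented observed positions of the same two peaks in an uncalibrated spectrum.
observed = [842.5500, 2211.2000]

a, b = fit_linear_calibration(observed, reference)
corrected = apply_calibration([842.5500, 1500.0000, 2211.2000], a, b)
print(corrected)
```

With only two calibrants the line passes exactly through both; with more calibrants (for example keratin peaks or a spiked standard) the same fit averages out their individual errors.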
Database searching
Description of database searching using the Mascot program
- At GPCL, 4800 Proteomics analyzer data is presented to the Mascot webserver through ProteinPilot
- Mascot can be accessed through the web: http://www.matrixscience.com
Mascot scoring
A frequency factor matrix, F, is created, in which each row represents an interval of 100 Da in peptide mass, and each column an interval of 10 kDa in intact protein mass. As each sequence entry is processed, the appropriate matrix elements $f_{i,j}$ are incremented so as to accumulate statistics on the size distribution of peptide masses as a function of protein mass.
The elements of F are then normalised by dividing the elements of each 10 kDa column by the largest value in that column to give the Mowse factor matrix M:
$$m_{i,j} = \frac{f_{i,j}}{\max_i f_{i,j}}$$
After searching the experimental mass values against a calculated peptide mass database, the score for each entry is calculated according to:
$$\text{Score} = \frac{50{,}000}{M_{Prot} \times \prod_n m_{i,j}}$$
where $M_{Prot}$ is the molecular weight of the entry and the product term is calculated from the Mowse factor elements for each match between the experimental data and peptide masses calculated from the entry.
Source: http://www.matrixscience.com/
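The Mowse factor calculation described above can be sketched in Python on a toy database. Everything concrete here is an assumption for illustration: the two "proteins", their peptide masses, the 0.5 Da matching window, and the constant 50,000, which follows the published Mowse score (score = 50,000 / (M_Prot × product of matched Mowse factors)) rather than anything shown on this slide. It is not the probability-based scoring that Mascot itself reports.

```python
from collections import defaultdict
from math import prod

PEPTIDE_BIN = 100.0      # row width: 100 Da of peptide mass
PROTEIN_BIN = 10_000.0   # column width: 10 kDa of intact protein mass


def build_mowse_matrix(database):
    """Count peptides per (peptide bin, protein bin), then normalise each column."""
    freq = defaultdict(int)
    for protein_mass, peptide_masses in database.values():
        j = int(protein_mass // PROTEIN_BIN)
        for pm in peptide_masses:
            freq[(int(pm // PEPTIDE_BIN), j)] += 1
    col_max = defaultdict(int)
    for (i, j), count in freq.items():
        col_max[j] = max(col_max[j], count)
    return {(i, j): count / col_max[j] for (i, j), count in freq.items()}


def mowse_score(observed, protein_mass, peptide_masses, mowse, tol_da=0.5):
    """50,000 / (protein mass x product of Mowse factors of matched peptides)."""
    j = int(protein_mass // PROTEIN_BIN)
    factors = [
        mowse[(int(pm // PEPTIDE_BIN), j)]
        for pm in peptide_masses
        if any(abs(pm - obs) <= tol_da for obs in observed)
    ]
    if not factors:
        return 0.0
    return 50_000.0 / (protein_mass * prod(factors))


# Hypothetical entries: name -> (protein mass in Da, calculated peptide masses).
toy_db = {
    "protA": (42_000.0, [1075.5, 1232.6, 1530.8]),
    "protB": (45_000.0, [1075.9, 1090.5, 2211.1]),
}
mowse = build_mowse_matrix(toy_db)
observed = [1075.51, 1232.61]
scores = {name: mowse_score(observed, mass, peps, mowse)
          for name, (mass, peps) in toy_db.items()}
print(scores)
```

In this toy case protA scores higher because it matches two observed peaks, one of them in a sparsely populated (low Mowse factor) bin, which is the behaviour the frequency matrix is meant to produce.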
PMF search page
Parameters used in database searching
• Database searched
• Taxonomy
• Enzyme
• Missed cleavages
• Fixed versus variable modifications (PTMs)
• MW and pI
• Mass tolerance
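Three of these parameters (enzyme, missed cleavages, mass tolerance) can be made concrete with a short Python sketch of an in-silico trypsin digest. The cleavage rule used here (after K or R, but not before P) and the rounded monoisotopic residue masses are standard textbook values; the function names and the test peptide are illustrative only, and the sketch ignores modifications and the taxonomy, MW and pI filters.

```python
import re

# Monoisotopic residue masses in Da, rounded to 5 decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence, missed_cleavages=1):
    """Cleave after K or R unless the next residue is P; allow missed cleavages."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if start + extra < len(fragments):
                peptides.append("".join(fragments[start:start + extra + 1]))
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def within_ppm(observed, theoretical, tol_ppm):
    """True when the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


print(tryptic_peptides("AKRPGK", missed_cleavages=1))
print(round(mh_mass("GG"), 4))
```

Raising `missed_cleavages` or `tol_ppm` enlarges the set of theoretical peptides that can match a peak, which increases both the chance of finding the right protein and the chance of random matches.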
Oxidation of methionine in proteins and peptides: +16 Da (sulfoxide), +32 Da (sulfone). From IonSource.com
S-carboxymethylation of the amino acid residue cysteine with the alkylating agent iodoacetic acid (+58 Da), or S-carbamidomethylation with iodoacetamide (+57 Da). From IonSource.com
Databases: NCBI nr.*tar.gz, a non-redundant protein sequence database with entries from GenPept, Swissprot, PIR, PDF, PDB, and NCBI RefSeq
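How fixed and variable modifications change the masses a search engine looks for can be sketched as follows. The monoisotopic mass shifts are standard values for these modifications (the nominal +16, +32, +57 and +58 Da above); the function name, the choice of carbamidomethylation as the fixed modification, and the example peptide and base mass are assumptions for illustration.

```python
# Monoisotopic mass shifts in Da for the modifications named above.
MOD_DELTA = {
    "oxidation": 15.99491,        # Met sulfoxide, nominal +16
    "dioxidation": 31.98983,      # Met sulfone, nominal +32
    "carbamidomethyl": 57.02146,  # Cys + iodoacetamide, nominal +57
    "carboxymethyl": 58.00548,    # Cys + iodoacetic acid, nominal +58
}


def candidate_masses(base_mass, sequence, fixed_cys="carbamidomethyl"):
    """Masses to search for one peptide.

    The fixed modification is applied to every Cys; oxidation is variable,
    so each Met may or may not carry it (0..n oxidised Met).
    """
    fixed = sequence.count("C") * MOD_DELTA[fixed_cys]
    n_met = sequence.count("M")
    return [base_mass + fixed + k * MOD_DELTA["oxidation"]
            for k in range(n_met + 1)]


# Hypothetical peptide with one Cys and one Met and an invented base mass.
masses = candidate_masses(1000.0, "ACMK")
print(masses)
```

A fixed modification shifts the single expected mass, while each variable modification adds further candidates, which is why long lists of variable modifications slow a search and raise the number of chance matches.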
Swiss-Prot, IPI, others
Submit a peak list to Mascot
1075.513062 1086.581177 1090.547241 1092.517822 1100.630249
1103.572754 1106.553223 1107.529663 1118.498779 1119.519531
1121.509644 1129.604492 1141.572388 1156.586792 1166.537231
1170.607422 1172.612183 1179.590332 1194.604126 1217.567749
1232.610474 1252.583740 1308.654297 1312.705811 1314.744385
1337.672485 1401.651245 1424.745728 1427.830566 1435.718872
1475.762695 1479.710327 1493.734131 1502.774780 1530.834717
1575.850952 1607.807007 1629.868408 1639.935425 1752.863892
1753.904663 1754.915161 1791.744507 1792.805054 1794.820801
1816.801392 1875.976196 1902.006104 1940.941650 1960.053345
1962.928955 2211.118652 2225.130371 2233.105225 2249.076660
http://matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=PMF
Mascot PMF report
Hands-on exercise
• Go to Desktop and open the txt file
• Copy and paste into the Mascot search page
  – Specify search parameters
    » Allow 100 ppm error for PMFal_100.txt
    » Allow 25 ppm error for PMFgd_25.txt
Not all peaks are matched. Why?
• Theoretical peptide list
  – Peptide lengths vs. MS range
  – Enzyme: missed/non-specific cleavage
  – Incorrect ORF
  – Amino acid substitutions
  – Ion suppression/efficiency
Not all peaks are matched. Why?
• Experimental peptide list
  – Contaminants
    • Trypsin autolysis peptides
    • Hair, skin keratins
    • Matrix molecules, clusters
    • Unknown contaminants
  – Modifications
    • PTMs: known and unknown, biological origin
    • Oxidized methionines: gel-induced artifacts
    • Chemical: cysteine carbamidomethylation, introduced by sample handling
    • Adducts
    • Amino acid substitutions
    • Splice variant
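One way to deal with known contaminants before a search is to flag peaks that fall within the mass tolerance of a contaminant list. The sketch below is an illustration under stated assumptions: the contaminant list holds only two commonly quoted porcine trypsin autolysis [M+H]+ values (842.5094 and 2211.1040, to be verified for the trypsin in use), keratin and matrix peaks are left out, and the function name and 25 ppm window are hypothetical choices. The three test peaks are taken from the peak list shown earlier in these slides.

```python
# Assumed contaminant masses: two porcine trypsin autolysis peaks (verify before use).
KNOWN_CONTAMINANTS = [842.5094, 2211.1040]


def split_contaminants(peaks, contaminants=KNOWN_CONTAMINANTS, tol_ppm=25.0):
    """Return (kept, flagged): peaks within tol_ppm of a contaminant are flagged."""
    kept, flagged = [], []
    for peak in peaks:
        is_contaminant = any(
            abs(peak - ref) / ref * 1e6 <= tol_ppm for ref in contaminants
        )
        (flagged if is_contaminant else kept).append(peak)
    return kept, flagged


# Three peaks copied from the example peak list.
kept, flagged = split_contaminants([1075.513062, 2211.118652, 2225.130371])
print(kept, flagged)
```

The peak at 2211.118652 lies about 7 ppm from the assumed autolysis mass, so it is flagged; the same peak could instead be used as an internal calibrant.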
Database search takes into account contaminants and modifications, e.g.
Evaluating PMF analysis
• Acceptable hit
  – High score
  – Major peaks accounted for
• No hit
  – Insufficient data: low intensity MS
  – Single gel band contains >2-3 proteins
  – Protein not represented in database: ORF/genome
• Further analysis
  – MS/MS confirmation of a few major peaks and unaccounted peaks: ideal
  – Low score, good spectrum: LC MS/MS
  – Low score, low intensity spectrum: concentrate sample, reacquire
  – High score, some unaccounted peaks: MS/MS
MS/MS
• Plot of m/z versus intensity
• At GPCL:
  – MALDI TOF/TOF MS
  – ESI QqTOF MS
  – ESI IT MS
  – MALDI/ESI FT ICR MS
Tandem MS 4700 Proteomics Analyzer, Applied Biosystems
MS MS, followed by precursor ion selection
Fragment ion spectrum Tandem MS
Tandem mass spectrum. http://qbab.aber.ac.uk
Tandem mass spectra (MS/MS) can be used for peptide sequencing
Database searching
• Peptide Mass Fingerprinting
• Sequence tag approach
De novo sequencing: inspect raw data
http://qbab.aber.ac.uk
Mascot Search Results
Search title: SampleSetID: 362, AnalysisID: 567, MaldiWellID: 15790, SpectrumID: 17225, Path=Mani102004New Analysis 1
Database: NCBInr 20040606 (1846720 sequences; 611532004 residues)
Timestamp: 20 Oct 2004 at 14:52:50 GMT
Top Score: 681 for gi|180570, creatine kinase [Homo sapiens]
Probability Based Mowse Score is -10*Log(P), where P is the probability that the observed match is a random event. Protein scores greater than 75 are significant (p<0.05).
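The relation between the reported score and the probability P can be checked numerically. This sketch only applies the stated definition, score = -10*Log(P) with a base-10 logarithm; the function names are illustrative, and it does not attempt to reproduce how Mascot arrives at the significance threshold of 75 for this search.

```python
import math


def score_from_p(p):
    """Probability based Mowse score: -10 * log10(P)."""
    return -10.0 * math.log10(p)


def p_from_score(score):
    """Invert the score: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


top_hit_p = p_from_score(681)    # probability that the top hit is a random event
threshold_p = p_from_score(75)   # P corresponding to the reported threshold
print(top_hit_p, threshold_p)
```

Every 10 points of score correspond to a tenfold drop in P, so a top score of 681 against a threshold of 75 leaves a very wide margin.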
Top hits from Mascot Search – there are multiple accession numbers for the same protein
Search returns a cluster of proteins with the same matching peptides
Creatine kinase B is the highest scoring protein
Match to: gi|21536286; Score: 681
Creatine kinase - B [Homo sapiens]
Nominal mass (Mr): 42591; Calculated pI value: 5.34
Observed Mass & pI: 43 kDa, 6.2-6.27
Sequence Coverage: 46%
  1 MPFSNSHNAL KLRFPAEDEF PDLSAHNNHM AKVLTPELYA ELRAKSTPSG
 51 FTLDDVIQTG VDNPGHPYIM TVGCVAGDEE SYEVFKDLFD PIIEDRHGGY
101 KPSDEHKTDL NPDNLQGGDD LDPNYVLSSR VRTGRSIRGF CLPPHCSRGE
151 RRAIEKLAVE ALSSLDGDLA GRYYALKSMT EAEQQQLIDD HFLFDKPVSP
201 LLSASGMARD WPDARGIWHN DNKTFLVWVN EEDHLRVISM QKGGNMKEVF
251 TRFCTGLTQI ETLFKSKDYE FMWNPHLGYI LTCPSNLGTG LRAGVHIKLP
301 NLGKHEKFSE VLKRLRLQKR GTGGVDTAAV GGVFDVSNAD RLGFSEVELV
351 QMVVDGVKLL IEMEQRLEQG QAIDDLMPAQ K
GPCL resources for Bioinformatic analysis
• Mascot version 2.1.0, Matrix Science Ltd
  – Mascot Daemon
• ProteinPilot software 2.0, Applied Biosystems/MDS Sciex
  – Paragon algorithm
  – And Mascot algorithm
• Sequest, Thermoelectron
Selected list
Resources: http://www.hsls.pitt.edu/guides/genetics/obrc/proteomics
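Sequence coverage is the fraction of residues in the protein that lie inside at least one matched peptide. A minimal Python sketch follows; the function name is illustrative and the matched ranges in the example are hypothetical, since the slide does not list which peptides of creatine kinase B were matched.

```python
def sequence_coverage(length, matched_ranges):
    """Percent of residues covered by matched peptides.

    matched_ranges holds (start, end) pairs, 1-based and inclusive, as in the
    numbered sequence listing; overlapping peptides are counted once.
    """
    covered = [False] * length
    for start, end in matched_ranges:
        for pos in range(start, end + 1):
            covered[pos - 1] = True
    return 100.0 * sum(covered) / length


# Hypothetical matches on a 100-residue protein: two overlapping, one separate.
coverage = sequence_coverage(100, [(1, 10), (5, 20), (91, 100)])
print(coverage)
```

For the 381-residue sequence above, the reported 46% coverage corresponds to roughly 175 residues falling inside matched peptides.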
... its high-throughput…
• 1st Dimension: Isoelectric focussing
• 2nd Dimension: SDS PAGE
• Spot picking
• Trypsin gel digest
Sample separation...
• In-solution isoelectric focussing
• HPLC
• 1D or 2D LC MALDI
GPCL services...
• Fee for service model
• Support investigators
  – Scientific expertise
  – Technical expertise
  – Grant submission
Genomics and Proteomics Core Laboratories Paul Wood Director Janette Lamb Assistant Director Proteomics Lab Chris Bolcato John Cardamone Emanuel M Schreiber Guy Ueichi James Porter Robert Wolfe Jason Sun Billy W. Day Scientific Director
A mass spectrum
• Plot of m/z versus intensity
• MALDI TOF (/TOF) MS
• ESI TOF MS
• ESI QqTOF MS
• ESI IT MS
• MALDI/ESI FT ICR MS
Mass analyzers – several designs Aebersold and Mann, Nature review, 422, p 198, 2003
QqTOF MS/MS
Each search engine scores differently
Each search engine identifies about the same number of spectra, but the overlap is surprisingly small. Different search engines match different spectra.
(Figure: Venn diagram of spectra matched by SEQUEST, X!tandem and Mascot; region labels 9%, 22%, 4%, 34%, 19%, 7%, 5%.)
Courtesy: Proteome Software Inc.
James Lyons-Weiler, Scientific Director, Bioinformatics Analysis Core. (412) 393-2087 (office), (412) 728-8743 (cell), Fax: 412-648-1891