Protein Sequence Analysis Overview NIH Proteomics Workshop 2006
Darren Natale, Team Lead, Protein Science, PIR; Research Assistant Professor, Georgetown University Medical Center
Major Topics
- Proteomics and protein bioinformatics (protein sequence analysis)
- Why do protein sequence analysis?
- Searching sequence databases
- Post-processing search results
- Detecting remote homologs
Clinical Proteomics (figure from Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695)
Single protein and shotgun analysis (figure adapted from McDonald et al. (2002), Disease Markers 18: 99-105)
- Single protein analysis: gel-based separation of the protein mixture, spot excision and digestion, peptides from a single protein, MS analysis
- Shotgun analysis: digestion of the protein mixture, peptides from many proteins, LC or LC/LC separation, MS/MS analysis
- Both routes feed into protein bioinformatics
Protein Bioinformatics: Protein Sequence Analysis
- Helps characterize protein sequences in silico and allows prediction of protein structure and function
- Statistically significant BLAST hits usually signify sequence homology
- Homologous sequences may or may not share the same function, but they almost always (with very few exceptions) share the same structural fold
- Protein sequence analysis allows protein classification
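Whether a BLAST hit is "statistically significant" is judged by its expected value (E-value). A minimal sketch of the Karlin-Altschul relation E = K * m * n * exp(-lambda * S) and the matching bit score follows. The parameters lambda = 0.267 and K = 0.041 are assumed illustrative values (commonly reported for BLOSUM62 with gap costs 11/1), the query and database sizes are hypothetical, and the length corrections BLAST applies are ignored, so the numbers will not reproduce a real BLAST report.

```python
import math

# Assumed Karlin-Altschul parameters (commonly reported for BLOSUM62, gap open 11 / extend 1).
LAMBDA = 0.267
K = 0.041

def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring >= raw_score: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

# Hypothetical search: a 300-residue query against a database of 10**8 residues.
for s in (40, 80, 120):
    print(s, round(bit_score(s), 1), f"{e_value(s, 300, 10**8):.2e}")
```

The E-value falls exponentially as the raw score rises, so a modest gain in alignment score can move a hit from chance level to clearly significant. Equivalently, E = m * n * 2^(-S'), which is why bit scores are comparable across searches.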
Development of protein sequence databases
- Atlas of Protein Sequence and Structure: Dayhoff (1966), the first sequence database (pre-bioinformatics); its successor is the Protein Information Resource (PIR)
- Protein Data Bank (PDB): structural database (1972), still the most widely used database of structures
- UniProt: the Universal Protein Resource (2003) is a central database of protein sequence and function, created by joining the forces of the Swiss-Prot, TrEMBL, and PIR protein database activities
Comparative protein sequence analysis and evolution
- Patterns of conservation in sequences allow us to determine which residues are under selective constraint (and are thus likely important for protein function)
- Comparing protein sequences is more sensitive than comparing DNA sequences for detecting distant relationships
- Homologous proteins have a common ancestor
- Different proteins evolve at different rates
- Protein classification systems based on evolution: PIRSF and COG
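One common way to quantify a "pattern of conservation" is the Shannon entropy of each column in a multiple sequence alignment: an invariant column scores 0 bits, and more variable columns score higher. The sketch below assumes gaps are written as "-" and simply ignores them; the toy alignment is hypothetical, and this is only one of several conservation scores in use.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    total = len(residues)
    # Sum p * log2(1/p) over residue types; 0.0 means the column is invariant.
    return sum((c / total) * math.log2(total / c) for c in Counter(residues).values())

def conservation_profile(alignment, gap="-"):
    """Entropy of every column in a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([seq[i] for seq in alignment], gap) for i in range(length)]

# Hypothetical toy alignment of four short sequences.
toy = ["GCAW", "GCSF", "GCAY", "GTAL"]
for i, h in enumerate(conservation_profile(toy)):
    print(f"column {i}: {h:.2f} bits")
```

Here column 0 (all G) scores 0 bits, while column 3 (four different residues) scores the maximum possible for four sequences, 2 bits; low-entropy columns are the candidates for residues under selective constraint.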
PIRSF and large-scale annotation of proteins
- PIRSF is a protein classification system based on the evolutionary relationships of whole proteins
- As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation
Comparing proteins
- The amino acid sequence of a protein is generated from a proteomics experiment, e.g., the protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT
- The amino acids of two sequences can be aligned, and we can easily count the number of identical residues (or use an index of similarity) as a measure of relatedness
- Protein structures can be compared by superimposition
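Counting identical residues in an alignment gives percent identity. A minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap character and that identity is measured over columns where neither sequence has a gap (other conventions divide by the alignment length or the shorter sequence). The variant sequence below is hypothetical: the first 20 residues of the fragment above with two substitutions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)

fragment = "DTIKDLLPNVCAFPMEKGPC"  # first 20 residues of the fragment above
variant = "DSIKDLLPNVCAYPMEKGPC"   # hypothetical variant with two substitutions
print(percent_identity(fragment, variant))  # → 90.0
```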
Protein sequence alignment
- Pairwise alignment:
    abacd
    ab_cd
- Multiple sequence alignment provides more information:
    abacd
    ab_cd
    xbace
- MSA is difficult to do for distantly related proteins
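A pairwise alignment such as abacd / ab_cd can be computed by global dynamic programming (the Needleman-Wunsch algorithm). The sketch below uses assumed toy scores (match +1, mismatch -1, gap -1) and writes gaps as "_" to match the slide; real protein alignments use a substitution matrix such as BLOSUM62 and affine gap penalties instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1, gap_char="_"):
    """Global alignment of a and b by dynamic programming; returns (aligned_a, aligned_b, score)."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # residue vs residue
                score[i - 1][j] + gap,                                 # gap in b
                score[i][j - 1] + gap,                                 # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(gap_char)
            i -= 1
        else:
            out_a.append(gap_char)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

print(needleman_wunsch("abacd", "abcd"))  # → ('abacd', 'ab_cd', 3)
```

With these scores the slide's alignment is the unique optimum: four matches and one gap give 4 - 1 = 3.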
Protein sequence analysis overview
- Protein databases: PIR and UniProt
- Searching databases: peptide search, BLAST search, text search
- Information retrieval and analysis: protein records at UniProt and PIR, multiple sequence alignment, secondary structure prediction, homology modeling
Universal Protein Resource (http://www.uniprot.org/). Diagram components:
- UniParc (archive), drawing on Swiss-Prot, TrEMBL, PIR-PSD, RefSeq, GenBank/EMBL/DDBJ, Ensembl, PDB, patent data, and other data
- UniProtKB (knowledgebase), combining literature-based annotation (Swiss-Prot) and automated annotation (TrEMBL), with automated merging of sequences
- UniRef100, UniRef90, and UniRef50, clustering at 100, 90, and 50% identity (NREF also shown)
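The idea behind clustering at 100, 90, and 50% identity can be illustrated with a simplified greedy procedure: each sequence joins the first cluster whose representative it matches at or above the threshold, and otherwise starts a new cluster. This is a hypothetical sketch, not the procedure UniProt uses to build UniRef; to stay short it assumes equal-length, ungapped toy sequences so identity is a simple position-by-position comparison, and the result depends on input order.

```python
def identity(a, b):
    """Fraction of identical positions; toy assumption: equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Put each sequence in the first cluster whose representative it matches at >= threshold."""
    clusters = []  # each entry: (representative, list of members)
    for s in seqs:
        for rep, members in clusters:
            if identity(rep, s) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))  # no match: s starts a new cluster
    return clusters

toy = ["MKVLAAGIVG", "MKVLAAGIVA", "MKILSAGLVG", "QRSTWYPHND"]  # hypothetical sequences
for t in (1.0, 0.9, 0.5):
    print(t, [rep for rep, _ in greedy_cluster(toy, t)])
```

Lowering the threshold merges more sequences under fewer representatives (four, three, then two clusters here), which is why a 50% clustering is much smaller than a 100% one.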
Peptide Search
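At its simplest, a peptide search reports every protein that contains a query peptide as an exact substring, together with the match position. The sketch below assumes a hypothetical in-memory database (the accessions are placeholders, not real UniProt entries) and reports 1-based start positions, including overlapping matches.

```python
def peptide_search(peptide, database):
    """Return (accession, 1-based start) for every exact occurrence of peptide in each sequence."""
    hits = []
    for accession, seq in database.items():
        start = seq.find(peptide)
        while start != -1:
            hits.append((accession, start + 1))
            start = seq.find(peptide, start + 1)  # step by one so overlapping matches are found
    return hits

# Hypothetical mini-database; SEQ_A reuses the fragment from the "Comparing proteins" slide.
db = {
    "SEQ_A": "DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT",
    "SEQ_B": "MKTAYIAKQRQISFVKSHFSRQ",
}
print(peptide_search("GGCGG", db))
```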
ID mapping
Query Sequence
- The unknown sequence is Q9I7I7
- BLAST Q9I7I7 against the UniProt Knowledgebase (http://www.uniprot.org/search/blast.shtml)
- Analyze results
BLAST results
Text Search
Text search results: display options (moving PubMed ID and PDB ID into “Columns in Display”)
Text search results: add input box
Text Search Result with NULL/NOT NULL
UniProtKB Protein Record
SIR2_HUMAN Protein Record
Are Q9I7I7 and SIR2_HUMAN homologs?
- Check BLAST results
- Check pairwise alignment
Protein structure prediction
- Programs can predict secondary structure with about 70% accuracy
- Homology modeling: prediction of a ‘target’ structure from the structure of a closely related ‘template’
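Secondary structure accuracy figures are usually reported per residue over three states (helix H, strand E, coil C), a measure known as Q3. A minimal sketch, assuming predicted and observed assignments are given as equal-length strings; the 20-residue example is hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percent of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must cover the same residues")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical 20-residue example (H = helix, E = strand, C = coil).
observed = "CCHHHHHHHCCCEEEEECCC"
predicted = "CCCHHHHHHHCCEEEECCCC"
print(q3_accuracy(predicted, observed))  # → 85.0
```

Most errors in such predictions fall at the ends of helices and strands, as in this example, where the three mismatches are all at element boundaries.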
Secondary structure prediction (http://bioinf.cs.ucl.ac.uk/psipred/)
Secondary structure prediction results
Sir2 structure
Homology modeling (http://www.expasy.org/swissmod/SWISS-MODEL.html)
Homology model of Q9I7I7
- Model quality: blue, excellent; green, so-so; red, not good
- Secondary structure: yellow, beta sheet; red, alpha helix; grey, loop
Sequence features: SIR2_HUMAN
Multiple sequence alignment
Multiple sequence alignment: Q9I7I7, Q82QG9, SIR2_HUMAN
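A quick way to summarize a multiple sequence alignment is a majority-rule consensus: for each column, keep the most common residue if enough sequences share it, otherwise mark the position as ambiguous. This is a minimal sketch; the 0.5 threshold and the "X" symbol for ambiguous columns are assumptions, and the input reuses the toy alignment from the "Protein sequence alignment" slide (gaps written as "_").

```python
from collections import Counter

def consensus(alignment, gap="-", min_fraction=0.5, ambiguous="X"):
    """Majority-rule consensus of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            out.append(gap)  # column is all gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Keep the most common residue only if a large enough share of all sequences has it.
        out.append(residue if count / len(alignment) >= min_fraction else ambiguous)
    return "".join(out)

print(consensus(["abacd", "ab_cd", "xbace"], gap="_"))  # → abacd
```

Raising `min_fraction` makes the consensus stricter: at 0.9 only the fully conserved columns survive, giving "XbXcX".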
Sequence features: CRAA_RABIT
Identifying Remote Homologs