Duncan Legge, EMBL-EBI
Introduction to Protein Signatures & InterPro
http://www.ebi.ac.uk/interpro
Protein Signatures
Protein signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro
• Consortium of 11 major signature databases
What value are signatures?
• Better at finding proteins with common function
  - Find more distant homologues than BLAST
• Classification of proteins
  - Associate proteins that share: function, domains, sequence, structure
• Annotation of protein sequences
  - Define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
Protein Signature Methods
How are protein signatures made?
Protein family/domain → multiple sequence alignment → build model → search → significant matches → refine → protein signature
Example significant matches:
ITWKGPVCGLDGKTYRNECALL   E-value 1e-49
AVPRSPVCGSDDVTYANECELK   E-value 3e-42
SVPRSPVCGSDGVTYGTECDLK   E-value 5e-39
HPPPGPVCGTDGLTYDNRCELR   E-value 6e-10
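The role of the alignment step can be illustrated with a minimal Python sketch. It assumes the four example matches on this slide are already aligned column by column (they all have the same length) and simply reports which columns are identical in every sequence. The function name `conserved_columns` is hypothetical, and real signature building uses far richer models than exact identity.

```python
def conserved_columns(alignment):
    """Return the shared residue at fully conserved columns and '-' elsewhere."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else "-"
        for column in zip(*alignment)  # iterate over alignment columns
    )


# The four example matches from the slide, treated as an ungapped alignment.
matches = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]

print(conserved_columns(matches))  # -----PVCG-D--TY---C-L-
```

The conserved residues (P, V, C, G, D, T, Y, C, L) are the kind of positions a signature would be expected to capture.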
Types of protein signatures (sequence based)
• Multiple protein alignment
Types of protein signatures (sequence based)
• Single motif methods: regular expression patterns
  C - {P} - x(2) - C - [STDNEKPI] - C
  - C (a plain letter) = must be this residue
  - x = any amino acid
  - ( ) = number of amino acids
  - { } = cannot be any of these
  - [ ] = any of these
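The pattern notation on this slide maps closely onto ordinary regular expressions. The following Python sketch translates the subset of the syntax shown here (single residues, `x`, `[ ]`, `{ }` and repeat counts in parentheses) into a Python regex. The function name `prosite_to_regex` is hypothetical, and other features of the full pattern syntax, such as sequence anchors, are deliberately not handled.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # anything except these residues
        else:
            regex = core                      # literal residue or [any of]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(regex)
    return "".join(pieces)


regex = prosite_to_regex("C - {P} - x(2) - C - [STDNEKPI] - C")
print(regex)                                   # C[^P].{2}C[STDNEKPI]C
print(bool(re.search(regex, "AACAGGCSCAA")))   # True
print(bool(re.search(regex, "CPGGCSC")))       # False: P is forbidden at position 2
```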
Types of protein signatures (sequence based)
• Single motif methods: regular expression patterns
• Multiple motif methods: identity matrices, fingerprints
• Full domain alignment methods: profiles (profile library), hidden Markov models (mathematical model of amino acid probability)
[Figure: hidden Markov model state diagram with match (M1-M4), insert (I1-I3) and delete (D2-D3) states]
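The idea of a "mathematical model of amino acid probability" can be sketched with a simple position-specific profile in Python. This is a deliberately reduced illustration, not a hidden Markov model: it has no insert or delete states and only handles ungapped sequences of fixed length. The function names, the toy alignment, the pseudocount and the uniform background frequency are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from an ungapped alignment."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, sequence, background=1 / 20):
    """Sum of log2(column probability / background) over all positions."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, sequence)
    )


# Hypothetical toy alignment, used only to exercise the functions.
toy_alignment = ["PVCGSD", "PVCGTD", "PVCGLD"]
profile = build_profile(toy_alignment)

print(log_odds_score(profile, "PVCGSD") > 0)   # True: resembles the family
print(log_odds_score(profile, "AAAAAA") < 0)   # True: unrelated sequence
```

Conserved columns contribute most to the score, which is why such models can find more distant homologues than a single-sequence comparison.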
Contributing member databases (MDBs)
• Models built on either sequence or structural alignments
• Each MDB has its own focus
• Methods: hidden Markov models, fingerprints, profiles, patterns, sequence clusters
• Focus: structural domains, protein features (active sites…), functional annotation of families/domains, prediction of conserved domains
Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial functional family classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
A closer look at InterPro
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro Curation Principles
- To represent MDB signatures as closely as possible to what their authors intended
- To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
- To provide the end user with as much information as possible about each signature, by annotating signatures and providing links to other databases
InterPro Entry
• Links related signatures
• Groups similar signatures together
• Adds extensive annotation
• Linked to other databases
• Structural information and viewers
Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
• Parent: Protein kinase (100)
• Children: Serine kinase (75), Tyrosine kinase (25); no proteins in common
• Signatures in the example come from SMART, Pfam and PROSITE
• Applies to domains and families
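The parent-child idea can be expressed as a set relationship, as in the following Python sketch. It assumes the numbers in the example are counts of matched proteins and uses made-up protein identifiers; the function name `is_child_of` is hypothetical. Actual InterPro relationships are assigned through manual curation, which may apply different criteria than a strict subset test.

```python
def is_child_of(child_proteins, parent_proteins):
    """A child matches a proper subset of the proteins matched by its parent."""
    return child_proteins < parent_proteins


# Hypothetical protein identifiers mirroring the 100 / 75 / 25 example.
protein_kinase = {f"P{i:03d}" for i in range(100)}
serine_kinase = {f"P{i:03d}" for i in range(75)}
tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}

print(is_child_of(serine_kinase, protein_kinase))      # True
print(is_child_of(tyrosine_kinase, protein_kinase))    # True
print(serine_kinase.isdisjoint(tyrosine_kinase))       # True: no proteins in common
```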
The InterPro entry types
• Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
• Domain: biological units with defined boundaries
• Repeat: short sequences typically repeated within a protein
• Sites: PTM, active site, binding site, conserved site
Searching InterPro
• Protein ID
• Paste in unknown sequence
InterPro Search Results
• Family
• Domains and sites
• Unintegrated signatures
• Structural data
• Link to PDBe
• Link to InterPro entry
• Links to signature databases
https://www.ebi.ac.uk/Tools/pfa/iprscan/
• Select member databases
Caveats
• InterPro entries are based on signatures supplied to us by our member databases
  - This means no signature, no entry!
• We need your feedback!
  - Missing/additional references
  - Reporting problems
  - Requests
Acknowledgements
InterPro Team: Sarah Hunter, Phil Jones, Siew-Yit Yong, Alex Mitchell, Amaia Sangrador, Craig McAnulla, Matthew Fraser, Maxim Scheremetjew, Sebastien Pesseat