DNA motif and protein domain discovery. Presented by: Deeter Neumann, Peter St. Andre. Images: PDB, zinc finger 224; PDB, human enhancer binding protein.
Outline: what DNA motifs and protein domains are; their importance and function; motif algorithms; locating domains and motifs experimentally; available programs (Pfam and SMART). Image taken from wikimedia.org.
What are DNA sequence motifs? “Sequence motifs are short recurring patterns in DNA that are presumed to have biological function.” D’haeseleer, P. Nature Biotechnology 24, 423-425 (2006). Image taken from bio.miami.edu.
Why are DNA sequence motifs important to know? They indicate common structural protein domains; they identify similar function; they point to other possible biological functions, e.g. transcription factors, mRNA processing.
What is the function of DNA domains? Specific and non-specific interactions; permits binding of a transcription factor to its target gene; sequence-specific recognition. Source: Human Molecular Genetics 3; Strachan & Read.
What are protein domains? Protein sequences and structures that evolve, function, and exist independently from the rest of the protein. They often form functional units, like metal-binding domains. Image: human zinc finger domain, taken from ionchannels.org.
Why are protein domains important? They bind to other molecules in the cell; signal transduction pathways; genetic engineering of novel proteins; pharmaceutical importance.
Algorithmic approaches for both DNA motif and protein domain searches. Three general approaches are used: enumeration, deterministic optimization, and probabilistic optimization.
Enumeration. Employs the broadest approach: looks at all possible motifs, with few limitations imposed on the search.
Enumeration, cont. Key point: covers all possible sequence motifs with few limitations. Pros: does not get stuck in a local optimum. Cons: may overlook subtle patterns. Programs like WeederWeb and YMF use this type of algorithm.
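The enumerative idea can be sketched in a few lines of Python. This is a minimal illustration, not how WeederWeb or YMF are implemented: it only counts exact k-mers and ranks them by how many input sequences contain them, with no mismatches and no background statistics. The function name `enumerate_kmers` and the four toy sequences (each carrying a planted `TGACTC`) are assumptions made for the example.

```python
from collections import Counter


def enumerate_kmers(seqs, k):
    """Count, for every k-mer, how many sequences contain it at least once."""
    support = Counter()
    for seq in seqs:
        # A set makes each k-mer count once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return support


# Hypothetical toy input: four short sequences sharing one planted 6-mer.
seqs = ["ACGTTGACTCAG", "TGACTCGGATAC", "GGCATTGACTCA", "CATGACTCTTAG"]
support = enumerate_kmers(seqs, 6)
best_kmer, n_seqs = support.most_common(1)[0]
print(best_kmer, n_seqs)
```

Because every k-mer is visited, there is no starting point to choose and no local optimum to get stuck in; the cost is that exact counting like this can miss degenerate motifs.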
WeederWeb
WeederWeb results
Deterministic optimization. Uses an expectation maximization (EM) model together with a position weight matrix. MEME is one program that uses this approach. What does this mean?
Deterministic optimization, cont.
Deterministic optimization, cont. Taken from ws.nbcr.net/app1234127263839/meme.html
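To make the combination of expectation maximization and a position weight matrix concrete, here is a minimal Python sketch. It is not MEME's implementation. It assumes exactly one site per sequence, a uniform background (which then cancels out of the E-step), and a starting matrix built from a seed k-mer; the names `em_motif`, `seed_weight`, and `pseudo`, and the toy sequences, are assumptions for the example.

```python
def em_motif(seqs, seed, iters=20, seed_weight=0.7, pseudo=0.1):
    """EM for one motif occurrence per sequence, started from a seed k-mer."""
    k = len(seed)
    other = (1.0 - seed_weight) / 3
    # Initial position weight matrix: one dict of base probabilities per column.
    pwm = [{b: (seed_weight if b == s else other) for b in "ACGT"} for s in seed]
    posteriors = []
    for _ in range(iters):
        counts = [{b: pseudo for b in "ACGT"} for _ in range(k)]
        posteriors = []
        for seq in seqs:
            # E-step: probability that the site starts at each position.
            likes = []
            for i in range(len(seq) - k + 1):
                p = 1.0
                for j in range(k):
                    p *= pwm[j][seq[i + j]]
                likes.append(p)
            total = sum(likes)
            z = [p / total for p in likes]
            posteriors.append(z)
            # Accumulate expected base counts for the M-step.
            for i, w in enumerate(z):
                for j in range(k):
                    counts[j][seq[i + j]] += w
        # M-step: re-estimate the matrix from the expected counts.
        pwm = [{b: col[b] / sum(col.values()) for b in "ACGT"} for col in counts]
    return pwm, posteriors


# Hypothetical toy input: TGACTC planted once per sequence, seed off by one base.
seqs = ["ACGTTGACTCAG", "TGACTCGGATAC", "GGCATTGACTCA", "CATGACTCTTAG"]
pwm, posteriors = em_motif(seqs, "TGACTA")
consensus = "".join(max(col, key=col.get) for col in pwm)
sites = [z.index(max(z)) for z in posteriors]
print(consensus, sites)
```

The procedure is deterministic: the same seed always gives the same matrix, which is also why the result depends on the starting point and can settle in a local optimum.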
Probabilistic optimization. Uses a Gibbs sampling approach: a randomized implementation of the expectation maximization model. How is this applied?
Probabilistic optimization, cont. Selects random sites, and each is weighted against known motifs. Allows the program to add or remove sequences and continuously update motifs.
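A bare-bones Gibbs site sampler shows how the random selection and continuous updating fit together. This is a sketch under simplifying assumptions (one site per sequence, fixed motif width, uniform background), not the algorithm of AlignACE or MotifSampler; `gibbs_motif`, `pseudo`, and the toy sequences are names and data introduced for the example.

```python
import random


def gibbs_motif(seqs, k, iters=200, pseudo=0.5, rng=None):
    """Return one sampled site start per sequence after `iters` sweeps."""
    rng = rng or random.Random(0)
    # Start from one randomly chosen site in each sequence.
    sites = [rng.randrange(len(s) - k + 1) for s in seqs]
    for _ in range(iters):
        for held in range(len(seqs)):
            # Build base counts from every sequence except the held-out one.
            counts = [{b: pseudo for b in "ACGT"} for _ in range(k)]
            for n, (s, pos) in enumerate(zip(seqs, sites)):
                if n != held:
                    for j in range(k):
                        counts[j][s[pos + j]] += 1
            total = len(seqs) - 1 + 4 * pseudo
            # Weight every window of the held-out sequence by the current profile.
            seq = seqs[held]
            weights = []
            for i in range(len(seq) - k + 1):
                w = 1.0
                for j in range(k):
                    w *= counts[j][seq[i + j]] / total
                weights.append(w)
            # Sample the new site in proportion to its weight (not the maximum).
            sites[held] = rng.choices(range(len(weights)), weights=weights)[0]
    return sites


# Hypothetical toy input with TGACTC planted once per sequence.
seqs = ["ACGTTGACTCAG", "TGACTCGGATAC", "GGCATTGACTCA", "CATGACTCTTAG"]
sites = gibbs_motif(seqs, 6)
print(sites)
```

Sampling rather than always taking the best window is what lets the search leave a poor starting alignment; it also means different random seeds can give different answers, which is one reason to rerun such programs.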
AlignACE 3.0
Results
Which one to use? Recent research showed that enumeration approaches worked very well. It is generally accepted that no one approach is the best. Programs that incorporate several approaches work the best. It is important to rerun programs.
Examples of programs. WeederWeb is a web-based interface with an enumerative approach. YMF is another enumerative program. MEME is an online program that uses a deterministic optimization approach. MotifSampler is a program that combines Gibbs sampling and a third-order Markov model.
YMF
YMF results
Measurements used to score sequence motifs. Three main statistics are used: information content, log likelihood, and MAP score.
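Of the three statistics, information content is the simplest to write down: for each column of the aligned sites, sum f × log2(f / p) over the bases, where f is the observed frequency and p the background frequency. The sketch below assumes a uniform background by default and no pseudocounts; the function name `information_content` and the example sites are assumptions for illustration. Log likelihood and MAP score are not shown.

```python
import math


def information_content(sites, background=None):
    """Total information content (bits) of equal-length aligned sites."""
    background = background or {b: 0.25 for b in "ACGT"}
    n = len(sites)
    total = 0.0
    # zip(*sites) walks the alignment column by column.
    for column in zip(*sites):
        for base in set(column):
            f = column.count(base) / n
            total += f * math.log2(f / background[base])
    return total


# A perfectly conserved 6-mer scores 2 bits per column under a uniform background.
print(information_content(["TGACTC"] * 4))
# One conserved column plus one fully mixed column.
print(information_content(["AA", "AC", "AG", "AT"]))
```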
Other measures of motif quality. Group specificity, or site specificity: probability of having a certain number of target sequences with the site in question. Sequence specificity: accounts for both the number of sequences with the sites in question and the number of sites per sequence. Positional bias, or uniformity: looks at how uniformly the sites in question are distributed with respect to the transcription start sites of the genes.
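One way to put a number on group specificity is a hypergeometric tail probability. The slide does not name a distribution, so the model here is an assumption, as are the function name `group_specificity` and its parameters: `N` sequences in the genome, `s1` of them in the target group, `s2` carrying the site genome-wide, and `x` carrying the site within the target group.

```python
from math import comb


def group_specificity(N, s1, s2, x):
    """P(at least x of the s1 target sequences carry the site) by chance."""
    upper = min(s1, s2)
    # Sum hypergeometric terms for overlaps of x, x+1, ..., upper.
    hits = sum(comb(s2, i) * comb(N - s2, s1 - i) for i in range(x, upper + 1))
    return hits / comb(N, s1)


# Toy numbers: 10 sequences, 3 targets, 3 sites genome-wide, all 3 in targets.
print(group_specificity(10, 3, 3, 3))
```

A small value means the site is concentrated in the target group far more than random placement would predict.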
Identification and preliminary characterization of a protein motif related to the zinc finger Lovering et al. (1993)
What is a zinc finger? Autonomously folding domain; structural motif; zinc required for folding and DNA interactions; part of a protein that is used to regulate DNA. Image: PDB; single zinc finger in solution.
Classic zinc finger. Conserved cysteines and histidines bind zinc; tetrahedral structure; antiparallel two-stranded β-sheet and an α-helix. Image from Wikipedia.
Figure 1A (Lovering et al.)
Actual RING 1 sequence MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLK NTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYP SREEYEAHQDRVLIRLSRLHNQQALSSSIEEGLRMQAMHRAQRVRRPIPGSDQTT TMSGGEGEPGEGEGDGEDVSSDSAPGPAPKRPRGGGAGGSSVGTGGG GTGGVGGGAGSEDSGDRGGTLGPPSPPGAPSPPEPGGEIELVFRPHPLL VEKGEYCQTRYVKTTGNATVDHLSKYLALRIALERRQQQEAGEPGGPGGGASDT GGPDGCGGEGGGAGGGDGPEEPALPSLEGVSEKQYTIYIAPGGGAFTTLNGSLT LELVNEKFWKVSRPLELCYAPTKDPK
RING finger: Cys1-Xaa-hydrophobic aa-Cys2-Xaa(9-27)-Cys3-Xaa(1-3)-His-Xaa-hydrophobic aa-Cys4-Xaa2-Cys5-hydrophobic aa-Xaa(5-47)-Cys6-Xaa2-Cys7
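The consensus above translates directly into a regular expression, which can then be run against the RING1 sequence shown on the previous slide. This is a sketch: the set of residues counted as hydrophobic (`AVLIMFWY`) is an assumption, since the slide does not define it, and only the first 112 residues of the sequence are included to keep the example short.

```python
import re

# Assumption: residues treated as "hydrophobic aa" in the consensus.
H = "[AVLIMFWY]"

# Cys1-Xaa-hyd-Cys2-Xaa(9-27)-Cys3-Xaa(1-3)-His-Xaa-hyd-Cys4-Xaa2-Cys5-hyd-Xaa(5-47)-Cys6-Xaa2-Cys7
RING_PATTERN = re.compile(
    "C." + H + "C" + ".{9,27}" + "C" + ".{1,3}" + "H." + H + "C"
    + ".{2}" + "C" + H + ".{5,47}" + "C" + ".{2}" + "C"
)

# First 112 residues of the RING1 sequence from the slide.
RING1_N_TERM = (
    "MTTPANAQNASKTWELSLYELHRTPQEAIMDGTEIAVSPRSLHSELMCPICLDMLK"
    "NTMTTKECLHRFCSDCIVTALRSGNKECPTCRKKLVSKRSLRPDPNFDALISKIYP"
)

match = RING_PATTERN.search(RING1_N_TERM)
# Report 1-based start and end of the matched region, then the region itself.
print(match.start() + 1, match.end(), match.group())
```

In this sequence as listed, the pattern matches residues 48 to 87, `CPICLDMLKNTMTTKECLHRFCSDCIVTALRSGNKECPTC`, with the seven cysteines and the histidine at the spacings the consensus requires.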
Figure 1B (Lovering et al.): gene expression is similar in a variety of cell lines.
Figure 2 (Lovering et al.): DNA binding, regulation, recombination, repair.
RING1 peptide: 55 aa synthetic peptide (residues 12-66 in the RING1 sequence). RING finger metal binding: prefers zinc (metals listed: zinc, cobalt, cadmium, copper).
Figure 3A (Lovering et al.). Legend: solid line, cobalt; dashed line, zinc. Labeled features: S-Co(II); Co(II) d-d transitions.
Figure 4A: zinc-dependent binding
RING1 function. 1992: no known function (not published until 1993). 2004: inhibits transactivation of recombination signal binding protein-J (RBP-J) (Hongyan et al.). Ubiquitin-protein ligases.
Pfam database. http://pfam.sanger.ac.uk/ Database that contains a large collection of protein domains and families, represented as sequence alignments and HMMs. Lists key features of each protein. New interface that combines other Pfam versions; new updates have made it more user-friendly.
Pfam search of RING1
Pfam search
Pfam search results
Pfam link out
HMM logo of sequence motif
SMART. http://smart.embl-heidelberg.de/ Multiple sequence alignments of members; >400 domains in >54,000 different proteins; searches database using HMMs.
SMART: two different modes. Normal: Swiss-Prot, SP-TrEMBL, Ensembl. Genomic: proteomes of sequenced genomes.
SMART
More motif madness
PRINTS
PROSITE
Questions?
How primitive is this RING-finger motif? The author only discusses genes containing this motif that come from eukaryotes. Is this motif found in prokaryotes as well?