
Homology and Homologs

Homology just means sequence similarity by virtue of a common evolutionary ancestor.

    >gi|24640218|ref|NP_572350.2| CG3126-PA, isoform A [Drosophila melanogaster]
    Length=1571

     Score = 427 bits (1098),  Expect = 6e-118
     Identities = 223/415 (53%), Positives = 297/415 (71%), Gaps = 19/415 (4%)
     Frame = +2

    Query  1901  SLVDHNEIMAKLTLKQEGDDGPDVRGGSGDILLVHATETDRKDLVLYFEAFLTTYRTFIT  2080
                 ++++   I   L LK+  +DGP+V+GG  D L+VHA+   +     + EAF+TT+RTFI
    Sbjct  1151  NMLEEVNITRYLILKKREEDGPEVKGGYIDALIVHASRVQKVADNAFCEAFITTFRTFIQ  1210

    Query  2081  PEELIQKLQYRYERF-CHFQDTFKQRVSKNTFFVLVRVVDELCLVEMTDEILKLLMELVF  2257
                 P ++I+KL +RY  F C  QD  KQ+ +K TF +LVRVV++L   ++T ++L LL+E V+
    Sbjct  1211  PIDVIEKLTHRYTYFFCQVQDN-KQKAAKETFALLVRVVNDLTSTDLTSQLLSLLVEFVY  1269

    Query  2258  RLVCKGELSLARILRKNILEKV---ENKRMLHHANS--ALKPLAARGVAARPG-------  2401
                 +LVC G+L LA++LR   +EKV   +  ++            +   G+A   G
    Sbjct  1270  QLVCSGQLYLAKLLRNKFVEKVTLYKEPKVYGFVGELGGAGSVGGAGIAGSGGCSGTAGG  1329

    Query  2402  ----TLHDFHSLEIAEQLTLLDAELFYKIEIPEVLLWAKEQNEEKSPNLTQFTEHFNNMS  2569
                     +L D  SLEIAEQ+TLLDAELF KIEIPEVLL+AK+Q EEKSPNL +FTEHFN MS
    Sbjct  1330  GNQPSLLDLKSLEIAEQMTLLDAELFTKIEIPEVLLFAKDQCEEKSPNLNKFTEHFNKMS  1389

    Query  2570  YWVRSIIMLQEKAQDRERLLLKFIKIMKHLRKLNNFNSYLAILSALDSAPIRRLEWQKQT  2749
                 YW RS I+  + A++RE+ + KFIKIMKHLRK+NN+NSYLA+LSALDS PIRRLEWQK
    Sbjct  1390  YWARSKILRLQDAKEREKHVNKFIKIMKHLRKMNNYNSYLALLSALDSGPIRRLEWQKGI  1449

    Query  2750  SEGLAEYCTLIDSSSSFRAYRAALAEVEPPCIPYLGLILQDLTFVHLGNPDHID-GKVNF  2926
                 +E + +C LIDSSSSFRAYR ALAE  PPCIPY+GLILQDLTFVH+GN D++  G +NF
    Sbjct  1450  TEEVRSFCALIDSSSSFRAYRQALAETNPPCIPYIGLILQDLTFVHVGNQDYLSKGVINF  1509

    Query  2927  SKRWQQFNILDSMRRFQQVHYEIRRNDEIISFFNDFSDHLAEEALWELSLKIKPR  3091
                 SKRWQQ+NI+D+M+RF++  Y  RRN+ II FF++F D + EE +W++S KIKPR
    Sbjct  1510  SKRWQQYNIIDNMKRFKKCAYPFRRNERIIRFFDNFKDFMGEEEMWQISEKIKPR  1564

These two sequences, my Xenopus query sequence and the matching Drosophila sequence, show strong (if variable) homology. Yet even if we knew the function of the Drosophila gene, it might not tell us much about the function of the Xenopus gene.
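
The Identities and Gaps figures in a report like this are simple column counts over the aligned rows. The sketch below counts identical and gapped columns in the first two blocks of the alignment above. The helper names are illustrative and are not part of BLAST. Positives are left out because they also need a substitution matrix such as BLOSUM62. Rounding the percentages down is an assumption, chosen because it reproduces the 53%, 71% and 4% in the summary line.

```python
# Minimal sketch: tally identities and gap columns in aligned rows.
# Helper names are hypothetical, not part of any BLAST API.

BLOCK1_QUERY = "SLVDHNEIMAKLTLKQEGDDGPDVRGGSGDILLVHATETDRKDLVLYFEAFLTTYRTFIT"
BLOCK1_SBJCT = "NMLEEVNITRYLILKKREEDGPEVKGGYIDALIVHASRVQKVADNAFCEAFITTFRTFIQ"
BLOCK2_QUERY = "PEELIQKLQYRYERF-CHFQDTFKQRVSKNTFFVLVRVVDELCLVEMTDEILKLLMELVF"
BLOCK2_SBJCT = "PIDVIEKLTHRYTYFFCQVQDN-KQKAAKETFALLVRVVNDLTSTDLTSQLLSLLVEFVY"


def alignment_counts(query_row, sbjct_row, gap="-"):
    """Return (identities, gap_columns, alignment_length) for two aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, sbjct_row))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gap_columns = sum(1 for q, s in pairs if q == gap or s == gap)
    return identities, gap_columns, len(pairs)


def percent_rounded_down(count, total):
    """Integer percentage, truncated (an assumption that matches the report)."""
    return 100 * count // total


if __name__ == "__main__":
    for q, s in [(BLOCK1_QUERY, BLOCK1_SBJCT), (BLOCK2_QUERY, BLOCK2_SBJCT)]:
        ident, gaps, length = alignment_counts(q, s)
        print(f"identities {ident}/{length}, gaps {gaps}/{length}")
    # Summary line of the report: 223, 297 and 19 out of 415 columns.
    print([percent_rounded_down(n, 415) for n in (223, 297, 19)])
```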

Genes and Evolution - I

Figure: gene duplication through speciation (A → A, A).

The two copies of Gene A will now evolve independently, but will continue to have the same function. They are ORTHOLOGS.

Genes and Evolution - II

Figure: gene duplication through internal genome duplication (A → A, A’).

The two copies of Gene A will now evolve independently, but will probably not continue to have exactly the same function. They are PARALOGS.
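
The two slides differ only in which event separated the copies of Gene A. The slides do not spell out a formal rule, so the following is an assumption: the usual gene-tree definition. Two genes are orthologs if the event at their last common ancestor was a speciation, and paralogs if it was a duplication. Here is a minimal sketch using a hypothetical `GeneNode` class and made-up species X and Y.

```python
class GeneNode:
    """A node in a gene tree; internal nodes record the event that split the lineage."""

    def __init__(self, name, event=None):
        self.name = name
        self.event = event  # "speciation" or "duplication" for internal nodes
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def lineage(node):
    """The node itself, followed by all its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    ancestors_of_a = lineage(a)
    for node in lineage(b):
        if any(node is x for x in ancestors_of_a):
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a, b):
    """Classify two genes by the event at their last common ancestor."""
    event = last_common_ancestor(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("need two distinct genes whose common ancestor records an event")


def example_tree():
    """Hypothetical tree: A duplicates into A and A', then species X and Y split."""
    root = GeneNode("ancestral A", "duplication")
    a = root.add(GeneNode("A", "speciation"))
    a_prime = root.add(GeneNode("A'", "speciation"))
    return {
        "A_X": a.add(GeneNode("A in species X")),
        "A_Y": a.add(GeneNode("A in species Y")),
        "A'_X": a_prime.add(GeneNode("A' in species X")),
        "A'_Y": a_prime.add(GeneNode("A' in species Y")),
    }


if __name__ == "__main__":
    genes = example_tree()
    print(relationship(genes["A_X"], genes["A_Y"]))
    print(relationship(genes["A_X"], genes["A'_X"]))
```

Note that A in species X and A’ in species Y also come out as paralogs, because the duplication is still their last common ancestor.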

Homologs, orthologs & paralogs

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html

Mutation and Evolution

Translated part of an mRNA sequence, with the protein sequence shown wherever it changes:

    Ancestral sequence
    ATGAAGGCTGCCTACGACTGCCGTGCCAGAATGCTGAGG   MKAAYDCRARMLR

    In species A
    ATGAAGGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG
    ATGAATGCTGCCTATGACTGCCGTGCCAGAATGCTAAGG   MNAAYDCRARMLR
    ATGAATGCTGCCTATGACTGCCGTG---GAATGCTAAGG   MNAAYDCR-GMLR
    ATGAATGCAGCCTATGATTGCCGTG---GAATGCTAAGG
    ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG

    In species B
    ATGAAGGCTGCCTACGACTGCCGTGCCATAATGCTGAGG   MKAAYDCRAIMLR
    ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGG
    ATGAAGGCCGCCTACGACTGTCGTGCCATAATGCTGAGA
    ATGAAGGCCGCCTACGACTGTCGTGCCATAATCCTGAGA   MKAAYDCRAIILR
    ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA

    Species A compared with species B
    ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG   MNAAYDCR-GMLR
    ||||| || || || || || || |    ||| || ||    | ||||||  +||
    ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA   MKAAYDCRAIILR
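
Whether a substitution changes the protein can be checked mechanically. The minimal sketch below does two things:

- it translates the slide's sequences with the standard genetic code;
- it reports, codon by codon, whether each change between two ungapped sequences is synonymous.

The helper names are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

ANCESTRAL = "ATGAAGGCTGCCTACGACTGCCGTGCCAGAATGCTGAGG"
FINAL_A = "ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG"
FINAL_B = "ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA"


def translate(dna):
    """Translate a coding sequence; alignment gaps ('-') are removed first."""
    dna = dna.replace("-", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codon_changes(before, after):
    """List (codon_index, old_codon, new_codon, old_aa, new_aa, synonymous)
    for two ungapped sequences of equal length."""
    if len(before) != len(after):
        raise ValueError("sequences must have equal length")
    changes = []
    for i in range(0, len(before) - 2, 3):
        old, new = before[i:i + 3], after[i:i + 3]
        if old != new:
            old_aa, new_aa = CODON_TABLE[old], CODON_TABLE[new]
            changes.append((i // 3, old, new, old_aa, new_aa, old_aa == new_aa))
    return changes


if __name__ == "__main__":
    for label, seq in [("ancestor", ANCESTRAL), ("species A", FINAL_A), ("species B", FINAL_B)]:
        print(label, translate(seq))
    # First mutation step in each lineage.
    print(codon_changes(ANCESTRAL, "ATGAAGGCTGCCTATGACTGCCGTGCCAGAATGCTGAGG"))
    print(codon_changes(ANCESTRAL, "ATGAAGGCTGCCTACGACTGCCGTGCCATAATGCTGAGG"))
```

The first change in species A (TAC to TAT) is synonymous. The first change in species B (AGA to ATA) replaces R with I.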

Searching for Similarity

DNA comparison:

    ATGAATGCAGCCTATGATTGCCGAG---GAATGCTAAGG
    ||||| || || || || || || |    ||| || ||
    ATGAAGGCCGCATACGACTGTCGTGCCATAATCCTGAGA

Amino acid comparison:

    MNAAYDCR-GMLR
    | ||||||  +||
    MKAAYDCRAIILR

The DNA sequence can change while the amino acid sequence stays the same. So when comparing protein-coding genes, always look for similarity by comparing amino acid sequences. Note also that evolution changes sequences by substitution, insertion or deletion, but not usually by small-scale re-ordering. We therefore need a tool that finds the ‘alignment’ between two sequences showing the greatest degree of similarity while introducing as few gaps as possible.
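
One textbook way to find such an alignment is Needleman-Wunsch global alignment. Dynamic programming fills a table with the best score for every pair of prefixes, and a traceback then recovers the alignment. This is a swapped-in classic technique, not necessarily what the tools in this course use; BLAST, for example, performs heuristic local alignment. For brevity, the sketch scores matches +1, mismatches -1 and gaps -2, rather than using an amino acid substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with gaps written as '-'.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair the residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner (prefers pairing, then gap in b).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("MNAAYDCRGMLR", "MKAAYDCRAIILR")
    print(best)
    print(top)
    print(bottom)
```

On the slide's protein pair, these parameters give a best score of 4. The traceback recovers the same alignment shown on the slide (MNAAYDCR-GMLR over MKAAYDCRAIILR).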

The Downside of Gaps

Take two random sequences with no ‘real’ similarity:

    GACACTAGGTCGATGCGTGGTGGCGAGA
    ACGCATCCGGATGTGCACCGTGGAACTG

and allow cost-free gaps:

    GAC--ACT----AGGTCGATGC---GTGG---TGGCGAGA
     ||  | |    |  | | |||   ||||   ||
    -ACGCA-TCCGGA--T-G-TGCACCGTGGAACTG------

Although the alignment has no mismatches, it is clearly not biologically meaningful! Ideally, the gaps introduced into an alignment should reflect biologically plausible insertions and deletions, but this is difficult to achieve. So the usual approach is to make gaps ‘expensive’: a gap is introduced only when the extra long-range matching it allows outweighs the ‘un’-matching it introduces. For example, without gaps:

    TTCCCAACTCTCCTCTTTCACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
    |||||| ||||   |       |        |   |       | | |      ||    | | |  |||
    TTCCCACCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTCCCCCAAAATCAAGCGCACCCCGTCCCAGAA

and with a few gaps (shown as =):

    TTCCCAACTCTCCTCTTT=CACCATGAAGCTCAAGGACAGATTCCACTCGCCCCAAAATCAAGCTCACCCCGTCCAAGAA
    ||||||     ||||||| |||||||||||||||||||| ||||||||| |||||||||||||| |||||||||| ||||
    TTCCCA=====CCTCTTTGCACCATGAAGCTCAAGGACAAATTCCACTC=CCCCAAAATCAAGCGCACCCCGTCCCAGAA
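
If gaps are free and mismatches are not allowed, the best ‘alignment’ simply pairs up a longest common subsequence of the two sequences. Its length therefore shows how much apparent matching free gaps can buy. The sketch below computes it for the two random sequences on the slide and for randomly generated pairs. It is a demonstration of the problem, not a scoring scheme anyone would use, and the helper names are illustrative.

```python
import random

# The two 'random' sequences from the slide.
SLIDE_SEQ_1 = "GACACTAGGTCGATGCGTGGTGGCGAGA"
SLIDE_SEQ_2 = "ACGCATCCGGATGTGCACCGTGGAACTG"


def lcs_length(a, b):
    """Length of a longest common subsequence of a and b.

    With free gaps and no mismatches allowed, this is the number of
    matched columns in the best possible 'alignment'.
    """
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def random_dna(length, rng):
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    matched = lcs_length(SLIDE_SEQ_1, SLIDE_SEQ_2)
    print(f"slide pair: {matched} of {len(SLIDE_SEQ_1)} bases matched with free gaps")

    # Unrelated random pairs show the same effect at a larger length.
    rng = random.Random(0)
    length = 200
    fractions = [lcs_length(random_dna(length, rng), random_dna(length, rng)) / length
                 for _ in range(20)]
    print(f"random pairs: mean fraction matched {sum(fractions) / len(fractions):.2f}")
```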

The Essential Task

Basically, what we are trying to do is see whether we can work out the function of an unknown gene by comparing its sequence with those of genes in other species whose function we already know. We can do this because the sequence of most genes is conserved to some extent during the evolution of different species.

The problem is this. The function of a gene's protein product probably depends both on its overall three-dimensional structure and on small regions of specific linear sequence. Yet our only serious tool for discerning similarity between proteins is based firmly on long-range linear sequence similarity. And there is no obvious requirement for genes to conserve sequence in order to conserve function – it’s just easier that way…

But it seems clear that we can only expect this approach to be effective if we are looking at true ORTHOLOGS.

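Finding Orthologs

So how do we find orthologs, and how can we know when we have? The simplest method is Reciprocal Best BLAST: a frog protein and a human protein are accepted as orthologs only if each is the other's best BLAST match. This implicitly relies on having all the protein sequences of your own organism and of the one in which you wish to find an ortholog. Otherwise, the true best match in either direction might be missing from the database.

Figure: frog protein → database of human proteins → best match: human protein → database of frog proteins → x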
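
The reciprocal check itself is simple once BLAST has been run in both directions. This minimal sketch assumes the results have already been reduced to (query, subject, bit score) triples. For instance, these could be the first, second and last columns of BLAST+ tabular output (`-outfmt 6`). The protein identifiers and scores below are hypothetical toy data.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Pairs (a, b) where b is a's best hit and a is, in turn, b's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy scores: frog proteins searched against human proteins, and back.
FROG_VS_HUMAN = [
    ("frog_1", "human_A", 420.0),
    ("frog_1", "human_B", 180.0),
    ("frog_2", "human_A", 390.0),
]
HUMAN_VS_FROG = [
    ("human_A", "frog_1", 415.0),
    ("human_A", "frog_2", 385.0),
    ("human_B", "frog_1", 175.0),
]

if __name__ == "__main__":
    print(reciprocal_best_hits(FROG_VS_HUMAN, HUMAN_VS_FROG))
```

Here frog_2's best human hit is human_A, but human_A's best frog hit is frog_1, so only frog_1 and human_A are reported. A paralog could produce exactly this one-sided pattern.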