Introduction to Phylogenies for immunologists, 2013
Dr Laura Emery
Laura.Emery@ebi.ac.uk
www.ebi.ac.uk/training

Objectives
After this tutorial you should be able to…
• Use essential phylogenetic terminology effectively
• Discuss aspects of phylogenies and their implications for phylogenetic interpretation
• Apply phylogenetic principles to interpret simple trees
This course will not:
• Provide you with an overview of phylogenetic methods
• Enable you to use tools to construct your own phylogenies
• Enable you to evaluate whether a sensible phylogenetic model or method was selected to construct a phylogeny

Outline
• Introduction
• Aspects of a tree
  1. Topology
  2. Branch lengths
  3. Nodes
  4. Confidence
• Simple phylogenetic interpretation
  • Including homology, gene duplication, co-evolution

What can I do with phylogenetics?
• Deduce relationships among species, genes or cells
• Deduce the origin of pathogens
• Identify biological processes that affect how your sequence has evolved, e.g. identify genes or residues undergoing positive selection
• Explore the evolution of traits through history
• Estimate the timing of major historical events
• Explore the impact of geography on species diversification

What is a phylogenetic tree?
[Figure: Darwin's 1837 tree sketch]
A tree is an explanation of how sequences evolved, their genealogical relationships and thus how they came to be the way they are today (or at the time of sampling).

Phylogenies explain genealogical relationships
[Figure: a family tree]

Aspects of a tree
1. Topology (branching order)
2. Branch lengths (indication of genetic change)
3. Nodes
  i. Tips (sampled sequences, known as taxa)
  ii. Internal nodes (hypothetical ancestors)
  iii. Root (oldest point on the tree)
4. Confidence (bootstraps/probabilities)

1. Topology
The topology describes the branching structure of the tree, which indicates patterns of relatedness.
[Figure: three-taxon trees (A, B, C); panel 1: "These trees display the same topology"; panel 2: "These trees display different topologies"]
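
One way to make "same topology" precise is to note that rotating the descendants of a node changes the drawing but not the branching order. The Python sketch below assumes rooted trees written as nested tuples. This representation is a hypothetical choice made for illustration. The sketch compares two trees after putting every node's children into a fixed order.

```python
def canonical(tree):
    """Return a rotation-independent form of a rooted tree given as nested tuples.

    Rotating the descendants of a node does not change the topology, so the
    children of every node are put into a fixed (sorted) order.
    """
    if isinstance(tree, str):  # a tip
        return tree
    children = [canonical(child) for child in tree]
    return tuple(sorted(children, key=repr))


def same_topology(tree1, tree2):
    """True if the two rooted trees have the same branching order."""
    return canonical(tree1) == canonical(tree2)


# Hypothetical three-taxon trees
print(same_topology(("A", ("B", "C")), (("C", "B"), "A")))  # True: B and C are sisters in both
print(same_topology(("A", ("B", "C")), ("B", ("A", "C"))))  # False: different sister pairs
```

Sorting is only a device for ignoring left/right order. Any fixed ordering rule would work equally well.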

Topology Question
Are these topologies the same? Answer: yes

Topology Question II
Which of these trees has a different topology from the others?
[Figure: several six-taxon trees (A to F) drawn in different layouts]

2. Branch lengths indicate genetic change
[Figure: tree with branch lengths labelled 0.8, 1.2, 0.5, 0.6 and 0.5]
• Longer branches indicate greater change
• Change is typically measured in substitutions per site (but check the legend)
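
Branch lengths are additive. The genetic distance between two tips (the patristic distance) is therefore the sum of the branch lengths on the path joining them. The minimal Python sketch below assumes a hypothetical tree stored as child-to-parent links. Its branch lengths reuse the values in the figure, but the topology is assumed.

```python
# Hypothetical tree: child -> (parent, branch length in substitutions/site).
# Branch lengths reuse the figure's values; the topology is assumed.
PARENT = {
    "A": ("n1", 0.8),
    "B": ("n1", 1.2),
    "n1": ("root", 0.5),
    "C": ("root", 0.6),
    "D": ("root", 0.5),
}


def distances_up(node):
    """Map node and each of its ancestors to its distance from node."""
    out, total = {node: 0.0}, 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        out[node] = total
    return out


def patristic_distance(a, b):
    """Sum of branch lengths on the path joining tips a and b."""
    up_a, up_b = distances_up(a), distances_up(b)
    # Going through any ancestor above the MRCA only adds length, so the minimum
    # over shared ancestors is the path through the MRCA.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(round(patristic_distance("A", "B"), 6))  # 2.0 = 0.8 + 1.2
print(round(patristic_distance("B", "C"), 6))  # 2.3 = 1.2 + 0.5 + 0.6
```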

A scale bar can represent branch lengths
[Figure: the same tree drawn once with branch lengths labelled (0.8, 1.2, 0.5, 0.6, 0.5) and once with a scale bar]
These are alternative representations of the same phylogeny.

Alternative representations of phylogenies
All of these representations depict the same topology.
Branch lengths are indicated in blue; the lengths of the red lines carry no information.

Not all trees include branch length data
[Figure: a cladogram (branching order only) beside a phylogram (branch lengths drawn to scale)]

Distance and substitution rate are confounded
• Branch lengths indicate the genetic change that has occurred
• We often don't know whether long branch lengths reflect:
  • A rapid evolutionary rate
  • An ancient divergence time
  • A combination of both
[Figure: tree of taxa A to E]
Genetic change = Evolutionary rate × Divergence time:
$$d = r \times t$$
where d is the genetic change (substitutions/site), r is the evolutionary rate (substitutions/site/year) and t is the divergence time (years).
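
Here is a quick numeric illustration of the confounding. The rates and times are hypothetical. A fast rate over a short time and a slow rate over a long time can produce the same branch length, so the branch length alone cannot separate the two.

```python
import math


def genetic_change(rate, time):
    """Branch length d (substitutions/site) = rate r (substitutions/site/year) x time t (years)."""
    return rate * time


# Hypothetical scenarios
fast_and_recent = genetic_change(rate=1e-3, time=100)
slow_and_ancient = genetic_change(rate=1e-6, time=100_000)

# Both give about 0.1 substitutions/site, so the branch length alone
# cannot tell the two scenarios apart.
print(math.isclose(fast_and_recent, slow_and_ancient))  # True
```

Separating rate from time needs extra information, such as sampling dates or calibration points.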

3. Nodes
[Figure: tree with tips A to E]
• Nodes occur at the ends of branches
• There are three types of nodes:
  i. Tips (sampled sequences, known as taxa)
  ii. Internal nodes (hypothetical ancestors)
  iii. Root (oldest point on the tree)
Figures: Andrew Rambaut

The root is the oldest point on the tree
[Figure: rooted tree of taxa A to E, with time running from the past (root) to the present (tips)]
• The root indicates the direction of evolution
• It is also the (hypothesised) most recent common ancestor (MRCA) of all of the samples in the tree
Figures: Andrew Rambaut

Trees can be drawn in an unrooted form
[Figure: taxa A to E drawn as a rooted tree and as an unrooted tree]
These are alternative representations of the same topology.

There are multiple rooted tree topologies for any given unrooted tree
• Most tree-building methods produce unrooted trees
• Identifying the correct root is often critical for interpretation!
Figure: Aiden Budd
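
The number of rooted trees hidden inside one unrooted tree is easy to count. A fully resolved unrooted tree with n tips has 2n - 3 branches, and placing the root on a different branch gives a different rooted tree. The Python sketch below uses a hypothetical five-taxon unrooted tree, ((A,B),C,(D,E)), stored as an edge list.

```python
from collections import Counter


def rooted_versions(n_tips):
    """Number of rooted trees obtainable from one fully resolved unrooted tree.

    Such a tree with n tips has 2n - 3 branches, and each branch is a distinct
    place to put the root.
    """
    if n_tips < 2:
        raise ValueError("need at least two tips")
    return 2 * n_tips - 3


# Hypothetical unrooted tree ((A,B),C,(D,E)) as an edge list; x, y, z are internal nodes
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

degree = Counter(node for edge in EDGES for node in edge)
tips = [node for node, d in degree.items() if d == 1]
print(len(tips), len(EDGES), rooted_versions(len(tips)))  # 5 7 7
```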

How to root a tree
[Figure: the same tree shown unrooted, midpoint rooted and outgroup rooted]
• Midpoint rooting
  • Assumes a constant evolutionary rate
  • Often not the case!
• Outgroup rooting (recommended)
  • The outgroup is one or more taxa that are known to have diverged before the group being studied
  • The node where the outgroup lineage joins the other taxa is the root
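
Midpoint rooting can be sketched in a few lines. Find the two tips separated by the longest path, then place the root halfway along that path. The tree below is hypothetical: its branch lengths reuse the values 0.8, 1.2, 0.5, 0.6 and 0.5 from the branch-length slide, but its topology is assumed. The function reports the branch that carries the root and how far along that branch the root falls.

```python
from itertools import combinations

# Hypothetical unrooted tree: undirected branches with lengths (substitutions/site)
BRANCHES = {("A", "x"): 0.8, ("B", "x"): 1.2, ("x", "y"): 0.5,
            ("C", "y"): 0.6, ("D", "y"): 0.5}


def adjacency(branches):
    adj = {}
    for (u, v), length in branches.items():
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path_between(adj, start, end):
    """Nodes on the unique path from start to end (trees have no cycles)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nbr in adj[node]:
            if nbr not in path:
                stack.append((nbr, path + [nbr]))
    raise ValueError("nodes are not connected")


def path_length(adj, path):
    return sum(adj[u][v] for u, v in zip(path, path[1:]))


def midpoint_root(adj):
    """Return (u, v, offset): the root lies on branch u-v, offset away from u."""
    tips = [n for n in adj if len(adj[n]) == 1]
    far_a, far_b = max(combinations(tips, 2),
                       key=lambda pair: path_length(adj, path_between(adj, *pair)))
    path = path_between(adj, far_a, far_b)
    half = path_length(adj, path) / 2
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return u, v, half - walked
        walked += adj[u][v]


u, v, offset = midpoint_root(adjacency(BRANCHES))
print(u, v, round(offset, 3))  # B x 1.15
```

The result is only as good as the constant-rate assumption. If B's long branch reflects a faster rate rather than more time, the midpoint root will be misplaced, which is why outgroup rooting is preferred.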

Root Question
This tree shows a cladogram, i.e. the branch lengths do not indicate genetic change. Indicate any root positions where bird and crocodile are not sister taxa (each other's closest relatives).

Alternative Representations Question

4. Confidence
How good is a tree? A tree is a collection of hypotheses, so we assess our confidence in each of its parts (branches) independently.
[Figure: tree with branch support values 0.99/100, 0.81/63 and 0.93/85]
There are three main approaches:
• Bootstraps
• Bayesian methods (probabilistic)
• Approximate likelihood ratio test (aLRT) methods (probabilistic)
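
The bootstrap idea can be sketched in Python under strong simplifying assumptions. The four-sequence alignment is hypothetical. As a crude stand-in for full tree inference, each replicate only asks whether A and B are each other's closest relatives by p-distance. A real bootstrap analysis rebuilds the whole tree with the chosen method for every resampled alignment and counts how often each branch reappears.

```python
import random

# Hypothetical toy alignment (10 sites)
ALIGNMENT = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACCTTG",
    "D": "TCGAACCATG",
}


def p_distance(s1, s2, columns):
    """Proportion of the sampled columns at which two sequences differ."""
    return sum(s1[i] != s2[i] for i in columns) / len(columns)


def closest_pair(alignment, columns):
    """Stand-in for tree building: the pair with the smallest p-distance.

    Ties go to the first pair in alphabetical order.
    """
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]], columns))


def bootstrap_support(alignment, pair, replicates=1000, seed=1):
    """Percentage of resampled alignments in which `pair` comes out as the closest pair."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement, keeping the original length
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        if closest_pair(alignment, columns) == pair:
            hits += 1
    return 100 * hits / replicates


print(closest_pair(ALIGNMENT, range(10)))  # ('A', 'B') on the original alignment
print(bootstrap_support(ALIGNMENT, ("A", "B")))
```

The exact percentage depends on the random seed and the number of replicates. With only ten sites, some resampled alignments drop the single site linking A and B, so support falls noticeably short of 100.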

What is a monophyletic group?
A monophyletic group (also described as a clade) is a group of taxa that share a more recent common ancestor with each other than with any other taxa.
[Figure: tree with a monophyletic group highlighted]
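
Assume a rooted tree written as nested Python tuples. A set of taxa is then monophyletic exactly when it equals the full set of tips below some node, and the sketch below checks that. The six-taxon topology is hypothetical, since the figure's tree is not reproduced here. The taxon names A to F match those used in the confidence question.

```python
def clades(tree):
    """Tip sets below every node of a rooted tree given as nested tuples."""
    found = []

    def tips_below(node):
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(tips_below(child) for child in node))
        found.append(tips)
        return tips

    tips_below(tree)
    return found


def is_monophyletic(tree, taxa):
    """A group is monophyletic if it is exactly the set of tips below one node."""
    return frozenset(taxa) in clades(tree)


TREE = (((("A", "B"), "C"), "D"), ("E", "F"))  # hypothetical topology
print(is_monophyletic(TREE, {"A", "B", "C", "D"}))  # True
print(is_monophyletic(TREE, {"B", "C"}))            # False
```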

Confidence Question
Which of the bootstrap values indicates our confidence in the grouping of A, B, C and D together as a monophyletic group? Do you think we can be confident in this grouping?
[Figure: tree of taxa A to F with bootstrap values 100, 91, 63 and 84]
Note: high bootstrap values do not always mean that we can be confident in a branch. Some phylogenetic methods can generate false confidence.

Part two: Phylogenetic interpretation for immunologists, 2013
Dr Laura Emery
Laura.Emery@ebi.ac.uk
www.ebi.ac.uk/training

Phylogenetic interpretation skill set
1. Tree-thinking skills
  • relatedness, confidence, homology
2. Knowledge of phylogenetic methods and their limitations
3. Knowledge of biological processes affecting sequence evolution
  • gene duplication, recombination, horizontal gene transfer, population genetic processes, and many more!
4. Knowledge of the data you wish to interpret

Simple phylogenetic interpretation question
Which is true?
• A) Mouse is more closely related to fish than frog is to fish
• B) Lizard is more closely related to fish than mouse is to fish
• C) Human and frog are equally related to fish
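
One way to reason about this question is to find the most recent common ancestor (MRCA) of each pair. Relatedness to fish is measured by how recent the shared ancestor is, not by how many nodes separate two tips on the page. The sketch below assumes the standard vertebrate branching order (fish, then frog, lizard, and finally mouse and human) written as nested tuples. The slide's own figure is not reproduced. Under that assumption every tetrapod meets fish at the same node, the root, which supports answer C.

```python
def path_from_root(tree, taxon, path=()):
    """Chain of nodes from the root down to the tip named taxon, or None."""
    path = path + (tree,)
    if tree == taxon:
        return path
    if isinstance(tree, tuple):
        for child in tree:
            found = path_from_root(child, taxon, path)
            if found:
                return found
    return None


def mrca(tree, a, b):
    """Deepest node shared by the root-to-tip paths of a and b."""
    shared = None
    for node_a, node_b in zip(path_from_root(tree, a), path_from_root(tree, b)):
        if node_a is not node_b:
            break
        shared = node_a
    return shared


# Assumed standard branching order; the slide's own figure is not reproduced here
VERTEBRATES = ("fish", ("frog", ("lizard", ("mouse", "human"))))

for animal in ["human", "mouse", "lizard", "frog"]:
    # Every tetrapod meets fish at the same node: the root
    print(animal, mrca(VERTEBRATES, animal, "fish") is VERTEBRATES)
```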

Homology is similarity due to shared ancestry
Example: limbs and wings
• Limbs are homologous: they share a common ancestor
• Wings are not homologous: they are analogous, as they have evolved similarity independently

Gene duplication and subsequent divergence can result in novel gene functions (it can also result in pseudogenes)
• Genes that are homologous due to gene duplication are paralogous
• Genes that are homologous due to speciation are orthologous
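
Suppose each internal node of a gene tree has already been labelled as a duplication or a speciation event. In practice that labelling is itself an inference, usually made by comparing the gene tree with the species tree. Given those labels, two genes are orthologous if their most recent common ancestor is a speciation node and paralogous if it is a duplication node. The gene names and the tree below are hypothetical.

```python
def node(event, *children):
    """Internal node of a gene tree, labelled 'duplication' or 'speciation'."""
    return {"event": event, "children": list(children)}


# Hypothetical gene tree: one duplication, then a human/mouse speciation in each copy
GENE_TREE = node("duplication",
                 node("speciation", "human_alpha", "mouse_alpha"),
                 node("speciation", "human_beta", "mouse_beta"))


def tips(tree):
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree["children"]))


def mrca_event(tree, a, b):
    """Event label at the most recent common ancestor of genes a and b."""
    current = tree
    while True:
        deeper = [c for c in current["children"]
                  if not isinstance(c, str) and {a, b} <= tips(c)]
        if not deeper:
            return current["event"]
        current = deeper[0]


def relationship(tree, a, b):
    return "orthologous" if mrca_event(tree, a, b) == "speciation" else "paralogous"


print(relationship(GENE_TREE, "human_alpha", "mouse_alpha"))  # orthologous
print(relationship(GENE_TREE, "human_alpha", "human_beta"))   # paralogous
print(relationship(GENE_TREE, "human_alpha", "mouse_beta"))   # paralogous
```

The last case shows why sequence similarity alone is not enough. Human alpha and mouse beta come from different species, yet they are paralogs because they trace back to the duplication.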

Teleost MHC class II phylogeny
• Can you spot any MHC class II gene duplication events?
Harstad et al. 2008, BMC Genomics

Immunology-related genes have atypical patterns of molecular evolution
• Many immunology genes have a high dN/dS ratio, indicative of positive selection
• Rapid evolutionary rate
• Difficult to align
• Violate the assumptions of many phylogenetic models
Park et al. 2012, Scientific Reports
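
The ratio ω = dN/dS compares the rate of non-synonymous substitutions per non-synonymous site with the rate of synonymous substitutions per synonymous site. A ratio below 1 suggests purifying selection, a ratio near 1 suggests neutrality, and a ratio above 1 suggests positive selection. The sketch below only shows how the ratio is read. The input values and the `tolerance` band around 1 are hypothetical. In practice dN and dS are estimated under a codon model and assessed with statistical tests rather than a fixed cutoff.

```python
def classify_selection(dn, ds, tolerance=0.05):
    """Label the selective regime suggested by omega = dN/dS.

    dn: non-synonymous substitutions per non-synonymous site
    ds: synonymous substitutions per synonymous site
    tolerance: hypothetical band around 1 treated as neutral (illustrative only)
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return omega, "positive selection"
    if omega < 1 - tolerance:
        return omega, "purifying selection"
    return omega, "approximately neutral"


print(classify_selection(0.3, 0.1)[1])   # positive selection (hypothetical values)
print(classify_selection(0.02, 0.2)[1])  # purifying selection (hypothetical values)
```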

Positive selection can lead to ladder-like phylogenies

Example: influenza haemagglutinin phylogeny and immunological mapping
Smith et al. 2004, Science

Phylogenetics can inform us about host-pathogen interactions and co-evolution
• "Mirror" phylogenies are indicative of host-parasite vertical inheritance
Jiggins web page: http://www.gen.cam.ac.uk/research/jiggins/research.html

What does this phylogeny tell us about Human Cytomegalovirus (HCMV)?
[Figure: cytomegalovirus phylogeny with taxa Baboon, Simian, Rhesus, Chimp, Human, Rat and Murine]
Nicholson et al. 2009, Virol J

T-cell receptors and immunoglobulin chains are homologous
Richards et al. 2000

An extremely brief introduction to methods, analyses, & pitfalls

There is only one true tree
• The true tree refers to what actually happened in the evolutionary past
• All methods attempt to reconstruct the true phylogeny
• Even the best method may not give you the true tree

Phylogenetic Methods: The general approach
• We want to find the tree that best explains our aligned sequences
• We need to be able to define “best explains”:
  • we need a model of sequence evolution
  • we need a criterion (or set of criteria) to choose between alternative trees
• Then evaluate all possible trees (NB: for N = 20 taxa there are about $2 \times 10^{20}$ possible unrooted trees!)
• or take a short cut
Paul Sharp
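
The figure of about 2 × 10^20 comes from the standard count of fully resolved unrooted trees for n labelled taxa, (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5). Here is a short Python check, assuming strictly bifurcating trees:

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct fully resolved unrooted trees for n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 10, 20):
    print(n, unrooted_tree_count(n))
# 4 3
# 10 2027025
# 20 221643095476699771875  (about 2.2 x 10^20)
```

The count grows faster than exponentially, which is why practical methods search tree space heuristically (the "short cut") rather than scoring every tree.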

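The problem of multiple substitutions
[Figure: successive substitutions at a single site (e.g. A to G and back to A; A to T), illustrating hidden mutations]
• Hidden (multiple) substitutions are more likely to have occurred between distantly related species
• We need an explicit model of evolution to account for these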
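
The Jukes-Cantor (JC69) model is the simplest example of such an explicit model. It appears here only as an illustration, not as the model any particular analysis should use. JC69 converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). This estimate is always at least p, because it adds back the hidden changes. A minimal Python sketch with hypothetical sequences:

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of aligned sites that differ."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor_distance(p):
    """JC69 estimate of substitutions/site from an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) under the JC69 model")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTTC")  # hypothetical sequences: 1 difference in 10 sites
print(p, round(jukes_cantor_distance(p), 4))  # 0.1 0.1073
```

The correction is small for closely related sequences but grows quickly as p approaches 0.75. That matches the point above that hidden substitutions matter most between distantly related species.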

Methodological approaches
1. Distance matrix methods (pre-computed distances)
  • UPGMA: assumes a perfect molecular clock (Sokal & Michener 1958)
  • Minimum evolution (e.g. Neighbor-joining, NJ) (Saitou & Nei 1987)
2. Maximum parsimony (Fitch 1971)
  • Minimises the number of mutational steps
3. Maximum likelihood (ML)
  • Evaluates the statistical likelihood of alternative trees, based on an explicit model of substitution
4. Bayesian methods
  • Like ML, but can incorporate prior knowledge
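
Maximum parsimony scores a tree by the minimum number of changes it requires. For a fixed tree, that count can be computed site by site with Fitch's (1971) algorithm. The Python sketch below handles rooted binary trees written as nested tuples, and the four-taxon alignment is hypothetical. A parsimony search would repeat this scoring across many candidate topologies and keep the lowest-scoring ones.

```python
def fitch(tree, states):
    """Fitch (1971) small-parsimony pass for one site.

    tree: rooted binary tree as nested 2-tuples of tip names
    states: dict mapping each tip name to its character at this site
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra change is needed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires, summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_sites))


ALIGNMENT = {"A": "AAG", "B": "AAG", "C": "GAT", "D": "GAC"}  # hypothetical
print(parsimony_score((("A", "B"), ("C", "D")), ALIGNMENT))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), ALIGNMENT))  # 4
```

Here parsimony prefers the tree grouping A with B. Unlike ML, parsimony uses no explicit substitution model, so it does not correct for the hidden multiple substitutions discussed above.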

Phylogenetic analyses are not straightforward
[Flowchart: Formulate hypotheses → Decide upon and implement method → Phylogenetic result(s) → Can you validate this? (data assessment: known biology; additional data, e.g. geography) → Answered your question? → Final phylogeny and analysis. A "No" at either check leads to: Investigate unexpected and unresolved aspects further; consider including more data.]

Further Reading
• Molecular Evolution: A Phylogenetic Approach (1998). Roderic D. M. Page & Edward C. Holmes. Blackwell Science, Oxford.
• The Phylogenetic Handbook (2003). Marco Salemi & Anne-Mieke Vandamme (Eds). Cambridge University Press, Cambridge.
• Inferring Phylogenies (2003). Joseph Felsenstein. Sinauer.
• Molecular Evolution (1997). Wen-Hsiung Li. Sinauer.

Phylogenetics at the EBI
• Clustal phylogeny currently available
• RAxML coming soon…
• www.EBI.ac.uk/tools/phylogeny

Acknowledgements
People
• Andrew Rambaut and team (University of Edinburgh)
• Paul Sharp (University of Edinburgh)
• Nick Goldman (EMBL-EBI)
• Benjamin Redelings (Duke University)
• Brian Moore (University of California, Davis)
• Olivier Gascuel (University of Montpellier)
• Aiden Budd (EMBL-Heidelberg)
Funding
EMBL member states and… …and the EBI training

Thank you!
www.ebi.ac.uk
Twitter: @emblebi
Facebook: EMBLEBI
