An Introduction to Phylogenetics
Anton E. Weisstein
> Sequence 1 GAGGTAGTAATTAGATCCGAAA…
> Sequence 2 GAGGTAGTAATTAGATCTGAAA…
> Sequence 3 GAGGTAGTAATTAGATCTGTCA…
Indiana State University, March 11-14, 2004

Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

What is phylogenetics?
Phylogenetics is the study of evolutionary relationships.
Relationships among species: birds, rodents, snakes, primates, crocodiles, marsupials, lizards

What is phylogenetics?
Relationships among species:
[Figure: tree with tips crocodiles, birds, lizards, snakes, rodents, primates, marsupials]
This is an example of a phylogenetic tree.

What is phylogenetics?
Relationships within species: HIV subtypes
[Figure: tree of HIV subtypes A, B, C, D, F, G, with tips labeled by country: Italy, Rwanda, Ivory Coast, Uganda, U.S., India, U.K., Ethiopia, S. Africa, Netherlands, Tanzania, Romania, Cameroon, Brazil, Russia, Taiwan]

So what is phylogenetics good for?
Phylogenetics has direct applications to:
• Conservation: test wood, ivory, meat products for poaching
• Agriculture: analyze specific differences between cultivars
• Forensics: DNA fingerprinting
• Medicine: determine specific biochemical function of cancer-causing genes

HIV Example 1: Florida dentist case
1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?

Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

Phylogenetic concepts: Interpreting a Phylogeny
[Figure: rooted tree with tips Sequence A, Sequence B, Sequence C, Sequence D, Sequence E, drawn along a time axis]
Which sequence is most closely related to B? A, because B diverged from A more recently than from any other sequence.
Physical position in the tree is not meaningful! Only tree structure matters.
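The point that relatedness depends on how recently two tips share an ancestor, not on where they are drawn, can be sketched in a few lines. This is an illustration only and not part of the original slides: the nested-tuple encoding, the topology of the example tree, and the helper names `tips` and `closest_relatives` are all assumptions made for the example.

```python
def tips(tree):
    """Return the set of tip labels under a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return tips(left) | tips(right)


def closest_relatives(tree, target):
    """Return the tips in the sister group of `target`, i.e. the tips that
    share its most recent ancestor. Returns None if `target` is absent."""
    if isinstance(tree, str):
        return None
    left, right = tree
    for child, sibling in ((left, right), (right, left)):
        if child == target:
            return tips(sibling)
        if target in tips(child):
            return closest_relatives(child, target)
    return None


# Hypothetical topology: A and B share the most recent ancestor.
tree = ((("A", "B"), "C"), ("D", "E"))
# Same structure with every branch pair drawn in the opposite order.
flipped = (("E", "D"), ("C", ("B", "A")))

print(closest_relatives(tree, "B"))     # {'A'}
print(closest_relatives(flipped, "B"))  # {'A'}
```

Swapping the two children of any node changes the drawing but not the answer, which is the sense in which only tree structure matters.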

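Phylogenetic concepts: Rooted and Unrooted Trees
[Figure: rooted trees of A, B, C, D drawn along a time axis with the root marked (X = Root), next to an unrooted tree of the same taxa on which the candidate root positions are marked "?"]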

How Many Trees?

# sequences | # pairwise distances | Unrooted: # trees | Unrooted: # branches/tree | Rooted: # trees | Rooted: # branches/tree
3 | 3 | 1 | 3 | 3 | 4
4 | 6 | 3 | 5 | 15 | 6
5 | 10 | 15 | 7 | 105 | 8
6 | 15 | 105 | 9 | 945 | 10
10 | 45 | 2,027,025 | 17 | 34,459,425 | 18
30 | 435 | $8.69 \times 10^{36}$ | 57 | $4.95 \times 10^{38}$ | 58
N | $\frac{N(N-1)}{2}$ | $\frac{(2N-5)!}{2^{N-3}(N-3)!}$ | $2N-3$ | $\frac{(2N-3)!}{2^{N-2}(N-2)!}$ | $2N-2$
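The general formulas for N sequences can be checked directly against the tabulated values. The sketch below is a minimal illustration; the function names are chosen for this example and do not come from the slides.

```python
from math import factorial


def pairwise_distances(n):
    """Number of pairwise distances among n sequences: N(N-1)/2."""
    return n * (n - 1) // 2


def unrooted_trees(n):
    """Number of unrooted bifurcating trees: (2N-5)! / (2^(N-3) (N-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """Number of rooted bifurcating trees: (2N-3)! / (2^(N-2) (N-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (3, 4, 5, 6, 10, 30):
    print(n, pairwise_distances(n), unrooted_trees(n), 2 * n - 3,
          rooted_trees(n), 2 * n - 2)
```

The number of rooted trees for N sequences equals the number of unrooted trees for N + 1 sequences, because rooting a tree amounts to attaching one extra lineage to any of its branches.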

Tree Types
Evolutionary trees measure time. Phylograms measure change.
[Figure: two rooted trees with tips sharks, seahorses, frogs, owls, crocodiles, armadillos, bats. The evolutionary tree has a scale bar of 50 million years; the phylogram has a scale bar of 5% change.]

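Tree Properties
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree in which tip X is joined to the root by branch a and tip Y by branches b, c, d, e, so that $a = b + c + d + e$]
Additivity: The distance between any two tips equals the total branch length between them.
[Figure: tree in which the path from tip X to tip Y runs along branches a, b, c, d, e, so that $XY = a + b + c + d + e$]
In simple scenarios, evolutionary trees are ultrametric and phylograms are additive.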

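Tree Building Exercise
Using the distance matrix given, construct an ultrametric tree.
Ultrametricity: All tips are an equal distance from the root.
[Figure: rooted tree in which tip X is joined to the root by branch a and tip Y by branches b, c, d, e, so that $a = b + c + d + e$]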
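One common way to build an ultrametric tree from a distance matrix is UPGMA (average-linkage clustering). The slides do not say which procedure the exercise intends, and the exercise's matrix is not reproduced in this transcript, so the sketch below rests on two assumptions: UPGMA as the method, and a made-up four-taxon matrix. The function name `upgma` and the nested-tuple output are likewise illustrative choices.

```python
def upgma(names, dist):
    """Cluster taxa by repeatedly merging the closest pair of clusters.

    Returns (tree, root_height): the tree is a nested tuple of names, and
    root_height is the distance from the root to every tip."""
    # cluster id -> (subtree, number of tips)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        merged = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        height = dij / 2  # the new node sits halfway between the two clusters
        next_id += 1
    (tree, _), = clusters.values()
    return tree, height


# Hypothetical distance matrix for four taxa.
names = ["A", "B", "C", "D"]
matrix = [[0, 2, 6, 6],
          [2, 0, 6, 6],
          [6, 6, 0, 4],
          [6, 6, 4, 0]]
print(upgma(names, matrix))  # ((('A', 'B'), ('C', 'D')), 3.0)
```

Because each new node is placed at half the distance between the clusters it joins, every tip ends up the same distance from the root, which is exactly the ultrametric property the exercise asks for.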

Phylogenetic Methods
Many different procedures exist. Three of the most popular:
• Neighbor-joining: minimizes distance between nearest neighbors
• Maximum parsimony: minimizes total evolutionary change
• Maximum likelihood: maximizes likelihood of observed data

Comparison of Methods
Neighbor-joining
• Uses only pairwise distances
• Minimizes distance between nearest neighbors
• Very fast
• Easily trapped in local optima
• Good for generating tentative tree, or choosing among multiple trees
Maximum parsimony
• Uses only shared derived characters
• Minimizes total distance
• Slow
• Assumptions fail when evolution is rapid
• Best option when tractable (<30 taxa, homoplasy rare)
Maximum likelihood
• Uses all data
• Maximizes tree likelihood given specific parameter values
• Very slow
• Highly dependent on assumed evolution model
• Good for very small data sets and for testing trees built using other methods
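To make "minimizes total evolutionary change" concrete, the sketch below counts the minimum number of changes a single character needs on a given tree, using Fitch's algorithm. The slides do not name an algorithm, so Fitch's method is an assumption here, as are the function name `fitch`, the nested-tuple trees, and the character states.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, minimum number of changes)
    for one character on a rooted bifurcating tree."""
    if isinstance(tree, str):          # a tip: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:                         # children can agree: no extra change
        return shared, left_changes + right_changes
    # Children disagree: one change is needed on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical states of one alignment column in four taxa.
column = {"A": "T", "B": "T", "C": "C", "D": "C"}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch(tree_1, column)[1])  # 1 change
print(fitch(tree_2, column)[1])  # 2 changes
```

For this column, maximum parsimony prefers the first tree, since it explains the data with one change instead of two. A full analysis would sum these counts over all characters and search over many candidate trees, which is why the method is slow.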

Which procedure should we use?
Neighbor-joining? Maximum parsimony? Maximum likelihood? All that we can!
• Each method has its own strengths
• Use multiple methods for cross-validation
• In some cases, none of the three gives the correct phylogeny!

Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

Phylogenetic concepts: Homology and Homoplasy
Homology: identical character due to shared ancestry (evolutionary signal)
Homoplasy: identical character due to evolutionary convergence or reversal (evolutionary noise)
[Figure: example trees with character changes marked on branches. Homology: +hair (rodents, primates). Homoplasy (Convergence): +flight (birds, bats). Homoplasy (Reversal): +legs followed by –legs (worms, lizards, snakes).]

Watching the Molecular Clock
Mutation occurs as a random (Poisson) process. If mutations accumulate at a constant rate over time and across all branches, the phylogeny is said to obey a molecular clock.
BUT:
• Natural selection favors some mutations and eliminates others
• Selection varies over time and across lineages
[Figure: tree of sequences sampled in 2000, 2001, and 2002, plotted against % genetic difference]
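A small simulation shows what the clock assumption implies. Under a Poisson process with a constant rate, the number of mutations on a lineage has a mean (and variance) equal to rate × time. The sketch below is illustrative only: the rate, the time span, and the function name `mutation_count` are assumptions, and it ignores selection and repeated mutations at the same site.

```python
import random


def mutation_count(rate, years, rng):
    """Count events of a constant-rate Poisson process over `years` by
    summing exponentially distributed waiting times between mutations."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(rate)
        if elapsed > years:
            return count
        count += 1


rng = random.Random(1)   # fixed seed so the run is repeatable
rate = 0.5               # hypothetical mutations per lineage per year
years = 10

counts = [mutation_count(rate, years, rng) for _ in range(10_000)]
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)

print(f"expected mean {rate * years}, simulated mean {mean:.2f}")
print(f"simulated variance {variance:.2f}")
```

Even with a perfectly constant rate, individual lineages differ in their mutation counts by chance alone, so departures from a clock have to be judged against this expected scatter before they are attributed to selection.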

Trees are hypotheses about evolutionary history
So far, we’ve looked at understanding and formulating these hypotheses. Now, let’s turn our attention to testing them.

Tree Testing: Split Decomposition
Split decomposition is one method for testing a tree. Under this procedure, we choose exactly four taxa (A, B, C, D) and examine the topologies of all possible unrooted trees. How many such trees are there?
[Figure: the three possible unrooted trees for A, B, C, D]
Only one of these topologies is right. How can we quantitatively assess the support for each tree?

Tree Testing: Split Decomposition
The correct tree should be approximately additive; the others usually will not be. For each tree, we calculate split indices that estimate the length of the internal branch:
$\frac{(AC + BD) - (AB + CD)}{2}$ = length of the internal branch, if ((A,B),(C,D)) is the right phylogeny!
• Large split indices: long internal branch, topology strongly supported
• Small split indices: short internal branch, topology weakly supported
• Negative split indices: biologically impossible, topology probably wrong
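The calculation can be sketched for all three four-taxon topologies at once. For a topology that groups (p, q) against (r, s), the two cross sums pr + qs and ps + qr each exceed the within-pair sum pq + rs by twice the internal branch on an additive tree, so halving each difference gives two estimates of that branch. The formula as written here is a reconstruction, and the distances and the function name `split_indices` are assumptions made for the example.

```python
def split_indices(dist, topology):
    """Two estimates of the internal branch length for ((p, q), (r, s))."""
    (p, q), (r, s) = topology
    within = dist[frozenset((p, q))] + dist[frozenset((r, s))]
    first = (dist[frozenset((p, r))] + dist[frozenset((q, s))] - within) / 2
    second = (dist[frozenset((p, s))] + dist[frozenset((q, r))] - within) / 2
    return first, second


# Hypothetical pairwise distances. They fit the tree ((A,B),(C,D)) exactly,
# with tip branches 1, 3, 2, 4 and an internal branch of length 2.
dist = {
    frozenset("AB"): 4, frozenset("AC"): 5, frozenset("AD"): 7,
    frozenset("BC"): 7, frozenset("BD"): 9, frozenset("CD"): 6,
}

for topology in ((("A", "B"), ("C", "D")),
                 (("A", "C"), ("B", "D")),
                 (("A", "D"), ("B", "C"))):
    print(topology, split_indices(dist, topology))
```

Only the first topology yields two positive indices that agree with each other (both 2.0, the true internal branch); the other two produce a negative and a zero index, which marks them as poorly supported.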

Tree Testing: Bootstrapping
• Used to assess the support for individual branches
• Randomly resample characters, with replacement
• Repeat many times (1000 or more)
• How often does a specific branch appear?
[Figure: tree with tips rat, human, turtle, fruit fly, oak, duckweed, and branch support values 100, 73, 98]
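The resampling steps listed above can be sketched as follows. To keep the example short, a full tree-building step is replaced by a crude stand-in: a grouping counts as "appearing" in a replicate when its two members are the unique closest pair by number of differing sites. That shortcut, the three made-up sequences, and the function names are all assumptions for illustration.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def differences(s, t):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def pair_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of replicates in which `pair` is the unique closest pair."""
    rng = random.Random(seed)
    names = sorted(alignment)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        dists = {frozenset((a, b)): differences(replicate[a], replicate[b])
                 for a, b in combinations(names, 2)}
        best = min(dists.values())
        winners = [p for p, v in dists.items() if v == best]
        if winners == [frozenset(pair)]:
            hits += 1
    return hits / replicates


# Hypothetical aligned sequences.
alignment = {
    "rat":    "ACGTACGTACGTACGTACGT",
    "human":  "ACGTACGTTCGTACGTACGA",
    "turtle": "ACTTAGGTTCGAACCTACGA",
}
support = pair_support(alignment, ("rat", "human"))
print(f"rat + human supported in {100 * support:.0f}% of replicates")
```

In a real analysis each replicate would be passed to the same tree-building method used for the original data, and the percentage for each branch would be written on the tree, as with the support values in the figure.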

Tree Testing: Bootstrapping
MacClade Example: Vertebrate evolution

Outline I. Overview II. Building and Interpreting Phylogenies III. Evolutionary Inference IV. Specific Applications

HIV Example 1: Florida dentist case
• 1990 case: Did a patient’s HIV infection result from an invasive dental procedure performed by an HIV+ dentist?
• HIV evolves so fast that transmission patterns can be reconstructed from viral sequences (molecular forensics).
• Compared viral sequences from the dentist, three of his HIV+ patients, and two HIV+ local controls.

Florida dentist case

So what do the results mean?
• 2 of 3 patients are closer to the dentist than to local controls. Statistical significance? More powerful analyses?
• Do we have enough data to be confident in our conclusions? What additional data would help?
• If we determine that the dentist’s virus is linked to those of patients E and G, what are possible interpretations of this pattern? How could we test between them?