Maximum Parsimony (MP) Algorithm

MP Algorithm
- Character-based algorithm: does not use distances, but uses the character information in the sequences
- A criticism of distance-based methods is that they do not exploit the structure of the sequences (they collapse each pair of sequences to a single number, the distance)
- Main philosophy is "economy of substitutions": find the tree that requires the fewest mutations (maximum parsimony)

MP Algorithm
- The strategy
  - explore a number of possible trees
  - report the tree with the smallest score (most parsimonious)
- Need to be able to solve two problems
  - small parsimony problem: given a candidate tree, compute its parsimony score
  - large parsimony problem: efficiently generate viable candidate trees (cannot generate all of them: tree explosion)
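
To give a sense of the "tree explosion": the number of distinct unrooted binary trees on n labeled leaves is (2n-5)!! = 1 x 3 x 5 x ... x (2n-5) for n >= 3. This is a standard counting result that the slides do not state. A minimal sketch, with an illustrative function name:

```python
def num_unrooted_trees(n_leaves):
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

Ten sequences already give about two million topologies, which is why exhaustive enumeration is not practical beyond small inputs.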

Small Parsimony Problem
- Given a candidate tree, compute its parsimony score
- Consider a candidate tree for one-site sequences
  s1 = A
  s2 = T
  s3 = T
  s4 = G
  s5 = A
[Figure: candidate tree over the five leaves with labeled internal nodes]
Final Score = 3

Solving Small Parsimony Problem
- explore the tree bottom-up (from the leaves to the interior)
- for each internal node one level up
  - if the "labels" at the two child nodes have no symbols in common, assign as the label at this node the union (sum) of both labels, and penalize the tree one unit
    [Figure: example of combining two child labels with no symbol in common]
  - if the "labels" at the two child nodes do have symbols in common, label this node with the common portion; no penalty
    [Figure: example with child labels A G and G T]
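
The bottom-up labeling rule above can be sketched for a single site as follows. The tree encoding (nested 2-tuples with one-character strings as leaves), the function name, and the example tree are assumptions made for illustration; they do not come from the slides.

```python
def fitch_score(tree):
    """Return (label_set, penalty) for a one-site binary tree.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        return {tree}, 0  # a leaf is labeled with its own symbol
    left, right = tree
    left_label, left_penalty = fitch_score(left)
    right_label, right_penalty = fitch_score(right)
    common = left_label & right_label
    if common:
        # Children share symbols: keep the common portion, no penalty.
        return common, left_penalty + right_penalty
    # No symbols in common: take the union and penalize one unit.
    return left_label | right_label, left_penalty + right_penalty + 1


# Hypothetical four-leaf tree ((A, G), (G, T)).
label, score = fitch_score((("A", "G"), ("G", "T")))
print(label, score)
```

In this example each cherry costs one unit (labels {A, G} and {G, T}), and the root takes their common portion {G} at no extra cost, for a score of 2.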

Solving Small Parsimony Problem
- For n-site sequences, run the algorithm in parallel for each site and add up the parsimony scores for all sites
- Consider a candidate tree for the following sequences
  s1 = ATC
  s2 = ACC
  s3 = GTA
  s4 = GCA
[Figure: candidate tree with leaves ATC, ACC, GTA, GCA and per-site labels at the internal nodes]
Final Score = 4
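
A minimal sketch of the per-site summation. The slide text does not spell out the tree topology, so the code assumes ((s1, s2), (s3, s4)), which follows the leaf order in the figure and reproduces the stated score of 4. The function names and the nested-tuple tree encoding are illustrative.

```python
def fitch_site(tree, site):
    """Return (label_set, penalty) for one site of a binary tree.

    A leaf is a sequence string; an internal node is a (left, right) tuple.
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left_label, left_penalty = fitch_site(tree[0], site)
    right_label, right_penalty = fitch_site(tree[1], site)
    common = left_label & right_label
    if common:
        return common, left_penalty + right_penalty
    return left_label | right_label, left_penalty + right_penalty + 1


def parsimony_score(tree, n_sites):
    """Sum the single-site penalties over all sites."""
    return sum(fitch_site(tree, site)[1] for site in range(n_sites))


# Assumed topology: ((s1, s2), (s3, s4)).
tree = (("ATC", "ACC"), ("GTA", "GCA"))
print(parsimony_score(tree, 3))  # 4: sites contribute 1 + 2 + 1
```

Under the same scoring, the alternative grouping ((s1, s3), (s2, s4)) costs 5, so the assumed tree is the more parsimonious of the two.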

Solving Large Parsimony Problem
- Efficiently generate viable candidate trees (cannot try all)
- Branch-and-bound approach
  - create a possible tree by some method and calculate its score; this is the current best
  - start building trees from scratch, discarding any tree that costs more than the current best

Solving Large Parsimony Problem
- Branch-and-bound approach
  http://artedi.ebc.uu.se/course/X3-2004/Phylogeny-TreeSearch/Phylogeny-Search.html
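
One way the branch-and-bound idea could be sketched is shown below. The details are assumptions, not taken from the slides: the initial tree is a simple "caterpillar" built in input order, trees grow by attaching one leaf at a time to every possible edge, and a partial tree is pruned once its score reaches the current best (adding leaves can never lower a parsimony score). All names are illustrative.

```python
def fitch(tree, site):
    """(label_set, penalty) for one site; leaves are sequence strings."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    l_label, l_pen = fitch(tree[0], site)
    r_label, r_pen = fitch(tree[1], site)
    common = l_label & r_label
    if common:
        return common, l_pen + r_pen
    return l_label | r_label, l_pen + r_pen + 1


def score(tree, n_sites):
    return sum(fitch(tree, s)[1] for s in range(n_sites))


def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf on the edge above a node."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(seqs):
    """Return (best_score, best_tree) for three or more equal-length sequences."""
    n_sites = len(seqs[0])
    # Initial bound: a caterpillar tree built in input order.
    start = seqs[1]
    for s in seqs[2:]:
        start = (start, s)
    best = [score((seqs[0], start), n_sites), (seqs[0], start)]

    def extend(subtree, next_idx):
        current = score((seqs[0], subtree), n_sites)
        if current >= best[0]:
            return  # bound: no completion of this tree can beat the best
        if next_idx == len(seqs):
            best[0], best[1] = current, (seqs[0], subtree)
            return
        for candidate in insertions(subtree, seqs[next_idx]):
            extend(candidate, next_idx + 1)

    # The first sequence stays attached at the top, so each unrooted
    # topology is visited once.
    extend((seqs[1], seqs[2]), 3)
    return best[0], best[1]


best_score, best_tree = branch_and_bound(["ATC", "ACC", "GTA", "GCA"])
print(best_score, best_tree)
```

For the four example sequences the search starts from a caterpillar tree that costs 6 and ends with a tree of score 4 that groups GTA with GCA.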

MP Summary
- Character-based algorithm: uses the sequence data
- Produces unrooted trees
- Economy of substitution: the best tree is the one that requires the fewest substitutions
- Examines a number of possible trees in search of the best one