http://tandy.cs.illinois.edu/CS581-firstday.pptx
- Slides: 28
http://tandy.cs.illinois.edu/CS581-firstday.pptx
CS 581: Algorithmic Computational Genomics
Tandy Warnow, University of Illinois at Urbana-Champaign

Phylogeny (evolutionary tree)
Figure: tree on Orangutan, Gorilla, Chimpanzee, Human. From the Tree of Life Website, University of Arizona.

Phylogeny + genomics = genome-scale phylogeny estimation.
Estimating the Tree of Life
Basic Biology: How did life evolve?
Applications of phylogenies to:
• protein structure and function
• population genetics
• human migrations
• metagenomics
Figure from https://en.wikipedia.org/wiki/Common_descent

Estimating the Tree of Life
• Large datasets!
• Millions of species
• Thousands of genes
• NP-hard optimization problems
• Exact solutions infeasible
• Approximation algorithms
• Heuristics
• Multiple optima
High Performance Computing: necessary but not sufficient
Figure from https://en.wikipedia.org/wiki/Common_descent

Figure: Muir, 2016
Computer Science Solving Problems in Biology and Linguistics
• Algorithm design using
  – Divide-and-conquer
  – Iteration
  – Heuristic search
  – Graph theory
• Algorithm analysis using
  – Probability theory
  – Graph theory
• Simulations and modelling
• Collaborations with biologists and linguists, and data analysis
• Discoveries about how life evolved on earth (and how languages evolved, too)

Computational Phylogenetics (2005)
Current methods can use months to estimate trees on 1000 DNA sequences.
Our objective: more accurate trees and alignments on 500,000 sequences in under a week.
Courtesy of the Tree of Life web project, tolweb.org

Computational Phylogenetics (2018)
• 1997-2001: Distance-based phylogenetic tree estimation from polynomial length sequences
• 2012: Computing accurate trees (almost) without multiple sequence alignments
• 2009-2015: Co-estimation of multiple sequence alignments and gene trees, now on 1,000 sequences in under two weeks
• 2014-2015: Species tree estimation from whole genomes in the presence of massive gene tree heterogeneity
• 2016-2017: Scaling methods to very large heterogeneous datasets using novel machine learning and supertree methods
Courtesy of the Tree of Life web project, tolweb.org

The Tree of Life: Multiple Challenges
Scientific challenges:
• Ultra-large multiple-sequence alignment
• Gene tree estimation
• Metagenomic classification
• Alignment-free phylogeny estimation
• Supertree estimation
• Estimating species trees from many gene trees
• Genome rearrangement phylogeny
• Reticulate evolution
• Visualization of large trees and alignments
• Data mining techniques to explore multiple optima
• Theoretical guarantees under Markov models of evolution
Techniques: applied probability theory, graph theory, supercomputing, and heuristics
Testing: simulations and real data

DNA Sequence Evolution
Figure: tree of DNA sequences drawn against a time axis (-3 mil yrs, -2 mil yrs, -1 mil yrs, today). Sequences shown: AAGACTT, AAGGCCT, AGGGCAT, TAGCCCA, TGGACTT, TAGACTT, AGCACAA, AGCGCTT.

Gene Tree Estimation
Figure: leaves U, V, W, X, Y with sequences AGGGCAT, TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, and an estimated tree on X, U, Y, V, W.

Quantifying Error
FN: false negative (missing edge)
FP: false positive (incorrect edge)
Figure labels: FN, FP; 50% error rate

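The FN and FP definitions above can be made concrete with a minimal sketch. It assumes each unrooted tree is represented by the set of leaf bipartitions induced by its internal edges; the helper names `bipartition` and `error_rates` and the two five-leaf example trees are illustrative and are not taken from the slides.

```python
def bipartition(side, leaves):
    """Canonical form of the split side | rest: the half that omits the smallest leaf."""
    side = frozenset(side)
    rest = frozenset(leaves) - side
    anchor = min(leaves)
    return rest if anchor in side else side


def error_rates(true_splits, est_splits):
    """Return (FN rate, FP rate) for two sets of internal-edge bipartitions."""
    false_negatives = true_splits - est_splits   # true edges missing from the estimate
    false_positives = est_splits - true_splits   # estimated edges absent from the true tree
    return (len(false_negatives) / len(true_splits),
            len(false_positives) / len(est_splits))


# Hypothetical example: two binary trees on five leaves, two internal edges each.
leaves = {"U", "V", "W", "X", "Y"}
true_tree = {bipartition({"U", "V"}, leaves), bipartition({"X", "Y"}, leaves)}
est_tree = {bipartition({"U", "V"}, leaves), bipartition({"W", "X"}, leaves)}

fn_rate, fp_rate = error_rates(true_tree, est_tree)
print(fn_rate, fp_rate)  # 0.5 0.5
```

In this example one of the two true internal edges is missing and one of the two estimated edges is incorrect, so both rates are 50%.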
Distance-based estimation
Indels (insertions and deletions)
Figure: …ACGGTGCAGTTACCA… evolves into …ACCAGTCACCA… (events labeled: Deletion, Mutation)

Figure: …ACGGTGCAGTTACCA… evolves into …ACCAGTCACCTA… (events labeled: Deletion, Substitution, Insertion)
…ACGGTGCAGTTACC-A…
…AC----CAGTCACCTA…
The true multiple alignment
– Reflects historical substitution, insertion, and deletion events
– Defined using transitive closure of pairwise alignments computed on edges of the true tree

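As a small illustration of how an alignment records these events, the sketch below labels each column of the two aligned rows shown on the slide. It assumes the first row is the ancestral sequence and the second its descendant; the function name `classify_columns` is made up for this example.

```python
from collections import Counter


def classify_columns(ancestor_row, descendant_row, gap="-"):
    """Label each column of a pairwise alignment (rows must have equal length)."""
    if len(ancestor_row) != len(descendant_row):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for a, d in zip(ancestor_row, descendant_row):
        if a == gap and d == gap:
            labels.append("gap-only")
        elif d == gap:
            labels.append("deletion")      # residue lost in the descendant
        elif a == gap:
            labels.append("insertion")     # residue gained in the descendant
        elif a == d:
            labels.append("match")
        else:
            labels.append("substitution")
    return labels


# The two aligned rows from the slide (ellipses dropped).
top = "ACGGTGCAGTTACC-A"
bottom = "AC----CAGTCACCTA"

counts = Counter(classify_columns(top, bottom))
print(dict(counts))
```

Under that assumption the columns break down into 10 matches, 4 deleted sites, 1 substitution, and 1 inserted site.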
Gene Tree Estimation
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Input: unaligned sequences
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

Phase 1: Alignment
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-CT-------GACCGC--
S4 = -------TCAC--GACCGACA

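A quick sanity check on any alignment is that all rows have the same length and that removing the gaps gives back the input sequences. The sketch below runs that check on the four sequences. It assumes the aligned rows are padded with trailing gaps to 21 columns and that S4 is the de-gapped form of its aligned row; the names `degap` and `is_valid_alignment` are illustrative.

```python
def degap(row, gap="-"):
    """Remove gap characters from an aligned row."""
    return row.replace(gap, "")


def is_valid_alignment(unaligned, aligned, gap="-"):
    """True if all rows share one length and each row de-gaps to its input sequence."""
    same_length = len({len(row) for row in aligned.values()}) == 1
    same_residues = all(degap(aligned[name], gap) == seq
                        for name, seq in unaligned.items())
    return same_length and same_residues


unaligned = {
    "S1": "AGGCTATCACCTGACCTCCA",
    "S2": "TAGCTATCACGACCGC",
    "S3": "TAGCTGACCGC",
    "S4": "TCACGACCGACA",   # assumed: read off the aligned row for S4
}
aligned = {                 # assumed: rows padded to 21 columns
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}

ok = is_valid_alignment(unaligned, aligned)
print(ok)  # True
```

The check says nothing about whether the alignment is the true one. It only rules out rows that add, drop, or reorder residues.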
Phase 2: Construct tree
S1 = AGGCTATCACCTGACCTCCA
S2 = TAGCTATCACGACCGC
S3 = TAGCTGACCGC
S4 = TCACGACCGACA

S1 = -AGGCTATCACCTGACCTCCA
S2 = TAG-CTATCAC--GACCGC--
S3 = TAG-CT-------GACCGC--
S4 = -------TCAC--GACCGACA
Figure: tree on leaves S1, S4, S2, S3

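One simple way to get from an alignment to a tree on four sequences is the four-point method applied to pairwise p-distances. This is a sketch of that idea, not necessarily the method used to draw the tree on the slide. It assumes the 21-column alignment below and skips columns where either sequence has a gap; the function names are illustrative.

```python
def p_distance(row_a, row_b, gap="-"):
    """Fraction of mismatches over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no shared ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def best_quartet_split(aligned):
    """Four-point method: pick the pairing with the smallest sum of within-pair distances."""
    a, b, c, d = sorted(aligned)

    def dist(x, y):
        return p_distance(aligned[x], aligned[y])

    candidates = {
        ((a, b), (c, d)): dist(a, b) + dist(c, d),
        ((a, c), (b, d)): dist(a, c) + dist(b, d),
        ((a, d), (b, c)): dist(a, d) + dist(b, c),
    }
    return min(candidates, key=candidates.get)


aligned = {                 # assumed: rows padded to 21 columns
    "S1": "-AGGCTATCACCTGACCTCCA",
    "S2": "TAG-CTATCAC--GACCGC--",
    "S3": "TAG-CT-------GACCGC--",
    "S4": "-------TCAC--GACCGACA",
}

split = best_quartet_split(aligned)
print(split)  # (('S1', 'S4'), ('S2', 'S3'))
```

On this alignment the smallest sum comes from pairing S1 with S4 and S2 with S3, because S2 and S3 agree on every column they share.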
The Tree of Life: Multiple Challenges
Scientific challenges:
• Ultra-large multiple-sequence alignment
• Gene tree estimation
• Metagenomic classification
• Alignment-free phylogeny estimation
• Supertree estimation
• Estimating species trees from many gene trees
• Genome rearrangement phylogeny
• Reticulate evolution
• Visualization of large trees and alignments
• Data mining techniques to explore multiple optima
• Theoretical guarantees under Markov models of evolution
Techniques: applied probability theory, graph theory, supercomputing, and heuristics
Testing: simulations and real data

http://csiflabs.cs.ucdavis.edu/~gusfield/osb08.ppt
Combinatorial Optimization in Computational Biology: three topics that use Perfect Phylogeny
Dan Gusfield, OSB 2008, Lijiang, China, November 1, 2008

Recombination: A richer model than Perfect Phylogeny
M (sites 12345); sequences shown: 00000, 10100, 10000, 01011, 01010, 00010, 10101 (added)
Figure: tree with labels 1, 4, 3, 2, 5 and sequences 10100, 10000, 00010, 01011, 01010.
Pair 4, 5 fails the four-gamete test: sites 4 and 5 are incompatible.
Real sequence histories often involve recombination.

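The four-gamete test for a pair of binary sites can be sketched in a few lines: the pair fails when all four combinations 00, 01, 10, and 11 occur among the rows. The function name `fails_four_gamete_test` is illustrative; the sequences are the ones listed on the slide, with 10101 as the added row.

```python
def fails_four_gamete_test(rows, site_i, site_j):
    """True if sites i and j (1-based) show all four gametes 00, 01, 10, 11."""
    gametes = {(row[site_i - 1], row[site_j - 1]) for row in rows}
    return len(gametes) == 4   # valid for binary (0/1) sequences


base = ["00000", "10100", "10000", "01011", "01010", "00010"]
with_added = base + ["10101"]

before = fails_four_gamete_test(base, 4, 5)
after = fails_four_gamete_test(with_added, 4, 5)
print(before, after)  # False True
```

Before 10101 is added, sites 4 and 5 show only the gametes 00, 10, and 11. The added row contributes 01, which completes the set.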
Sequence Recombination
Figure: single crossover recombination of P = 10100 and S = 01011 at point 5, giving 10101.
A recombination of P and S at recombination point 5: the first 4 sites come from P (Prefix) and the sites from 5 onward come from S (Suffix).

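Single-crossover recombination as described above amounts to joining a prefix of P to a suffix of S. A minimal sketch, assuming sequences are strings and the recombination point is 1-based as on the slide (the function name `recombine` is illustrative):

```python
def recombine(prefix_parent, suffix_parent, point):
    """Single crossover: sites before `point` come from P, sites from `point` on come from S."""
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have the same length")
    if not 1 <= point <= len(prefix_parent):
        raise ValueError("recombination point out of range")
    return prefix_parent[:point - 1] + suffix_parent[point - 1:]


P = "10100"
S = "01011"
child = recombine(P, S, 5)
print(child)  # 10101
```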
Network with Recombination: ARG
M (sites 12345); sequences shown: 00000, 10100, 10000, 01011, 01010, 00010, 10101 (new)
The previous tree with one recombination event now derives all the sequences.
Figure: the previous tree (sequences 10100, 10000, 00010, 01011, 01010) extended with a recombination node (parents labeled P and S, recombination point 5) that produces 10101.

http://www.cri.haifa.ac.il/people/irith/lectures/Introduction%20to%20Graph-Theoryv2.pptx
Chemistry
Atom – vertex
Bond – edge
E.g. C3H7OH
• Enumerating all isomers of a chemical compound.
• Determining if two compounds with the same formula are identical.
Figure: structural diagram of C3H7OH.

Chemistry
Figure: two structural diagrams with formula C3H7N2O2.
Graph isomorphism problem

Chemistry
• Enumeration of isomers (graph enumeration)
• Deciding whether two compounds are identical or not (graph isomorphism problem)
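To make the graph isomorphism question concrete, here is a brute-force sketch for small molecular graphs: atoms are labeled vertices, bonds are edges, and two compounds are treated as identical if some label-preserving vertex mapping carries one edge set onto the other. The function name and the hydrogen-free skeletons of 1-propanol and 2-propanol (two compounds that share the formula C3H7OH) are choices made for this example. Trying all permutations is practical only for a handful of atoms.

```python
from itertools import permutations


def isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Brute-force test for a label-preserving isomorphism between two small graphs."""
    n = len(labels_a)
    if n != len(labels_b) or sorted(labels_a) != sorted(labels_b):
        return False
    source = {frozenset(e) for e in edges_a}
    target = {frozenset(e) for e in edges_b}
    if len(source) != len(target):
        return False
    for perm in permutations(range(n)):
        # perm[i] is the vertex of graph B that vertex i of graph A maps to.
        if any(labels_a[i] != labels_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[v] for v in edge) for edge in source}
        if mapped == target:
            return True
    return False


# Heavy-atom skeletons only (hydrogens omitted as a simplifying assumption).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])   # C-C-C-O
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])   # O on the middle carbon
relabeled = (["O", "C", "C", "C"], [(0, 1), (1, 2), (2, 3)])     # 1-propanol, renumbered

same_compound = isomorphic(*propan_1_ol, *relabeled)
different_isomers = isomorphic(*propan_1_ol, *propan_2_ol)
print(same_compound, different_isomers)  # True False
```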