Introduction to computational evolutionary genomics Yong E Zhang

Introduction to computational evolutionary genomics. Yong E. Zhang, Institute of Zoology, CAS. 2013/4/24. http://zhanglab.ioz.ac.cn

Outline 1. Concept and topic 2. Case studies 3. New trends

1. Concept and topic: 1.1 Speciation and tree; 1.2 Ortholog and paralog; 1.3 Mutation; 1.4 Polymorphism and divergence; 1.5 Selection

1.1 Speciation and tree. Darwin, C. (1837)

From tree of life to web of life Adapted from Eugene V. Koonin (2009) Nucleic Acids Res.

Is it time to redefine evolutionary biology?

1.2 Ortholog and paralog. [Figure: a gene duplication gives rise to α hemoglobin and β hemoglobin; after speciation, mouse α Hb and rat α Hb (likewise mouse β Hb and rat β Hb) are orthologs, whereas α Hb and β Hb are paralogs.] By David Pollock

Detection of orthologs. We can perform an all-against-all BLAST search and pull out one-to-one reciprocal best hits. A more convenient way is to download the pre-computed Ensembl annotation. http://www.ensembl.org
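
The reciprocal-best-hit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be a list of (query, subject, bit score) tuples already parsed from BLAST output, and the gene names and scores below are invented for the example.

```python
def best_hits(hits):
    """Map each query to its single best-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples (assumed input format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented example scores for mouse and rat hemoglobin genes.
mouse_vs_rat = [
    ("mouse_Hba", "rat_Hba", 250.0),
    ("mouse_Hba", "rat_Hbb", 120.0),
    ("mouse_Hbb", "rat_Hbb", 240.0),
    ("mouse_Hbb", "rat_Hba", 118.0),
]
rat_vs_mouse = [
    ("rat_Hba", "mouse_Hba", 250.0),
    ("rat_Hba", "mouse_Hbb", 118.0),
    ("rat_Hbb", "mouse_Hbb", 240.0),
    ("rat_Hbb", "mouse_Hba", 120.0),
]
candidate_orthologs = reciprocal_best_hits(mouse_vs_rat, rat_vs_mouse)
```

Ties and many-to-many relationships are ignored here, which is one reason a curated resource may be preferable in practice.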

Detection of orthologs (continued) http://genome.ucsc.edu

Orthologs may not be functionally more similar to each other than paralogs. “It is widely assumed that orthologs share similar functions, whereas paralogs are expected to diverge more from each other. But does this assumption hold up on further examination? We present evidence that orthologs and paralogs are not so different in either their evolutionary rates or their mechanisms of divergence.” Studer, R. et al. (2009) Trends Genet.

1.3 Mutation: single nucleotide polymorphism (SNP). From Wikipedia

Types of SNP. Transitions: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T). Transversions: purine ↔ pyrimidine. David Pollock (2011)
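
As a small illustration of the transition/transversion distinction, the following Python sketch classifies a single-base change. The function name is hypothetical and only the four DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    # A transition stays within one chemical class; a transversion switches class.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```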

SNP in coding regions
Original: UGU/AGA/AAG (Cys Arg Lys)
Silent: UGU/CGA/AAG (Cys Arg Lys)
Missense: UGU/GGA/AAG (Cys Gly Lys)
Nonsense: Cys STOP Lys
First position: 4% of all changes silent
Second position: no changes silent
Third position: 70% of all changes silent (wobble position)
David Pollock (2011)
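
The silent/missense/nonsense distinction can be checked mechanically against the standard genetic code. The sketch below is an illustration, not part of the original slides: the function names are hypothetical, and the nonsense example (AGA to UGA) is an assumed codon change, since the slide text only preserves the resulting amino acids.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one DNA or RNA codon to a one-letter amino acid ('*' = stop)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-nucleotide codon change as silent, missense or nonsense."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(r != a for r, a in zip(ref, alt)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = translate_codon(ref), translate_codon(alt)
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    # Note: a stop codon changing to a sense codon also ends up here.
    return "missense"
```

With the slide's middle codon, `classify_coding_snp("AGA", "CGA")` is a silent change (Arg to Arg) and `classify_coding_snp("AGA", "GGA")` is a missense change (Arg to Gly).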

Indels. Reference: …TGTACAAAG…; insertion: …TGTTACAAAG…; deletion: …TGTAAAAG…. Adapted from David Pollock (2011)

Indels may increase the local substitution rate Tian, D. et al (2008) Nature

Structural variation. Sharp, A., Cheng, Z. & Eichler, E. E. (2006) Annu. Rev. Genomics Hum. Genet.

1.4 Polymorphism and divergence. Graur, D. & Li, W.-H. (2002) Fundamentals of Molecular Evolution

Polymorphism and divergence (continued) Innan, H & Kondrashov, F (2010) Nature Rev. Genet.

Population genetics and molecular evolution are intrinsically interconnected. Conventionally, the key task for evolutionary biologists is to infer the evolutionary history of DNA sequences and the underlying evolutionary forces.

1.5 Positive or adaptive selection. From Wikipedia

Negative or purifying selection In natural selection, negative selection or purifying selection is the selective removal of alleles that are deleterious. From Wikipedia

Ka/Ks. The Ka/Ks ratio (or ω, dN/dS) is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks).
Ka/Ks > 1: positive selection
Ka/Ks = 1: neutral evolution
Ka/Ks < 1: negative selection
http://abacus.gene.ucl.ac.uk/software/paml.html
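
To make the definition concrete, here is a minimal Python sketch of the final step of a counting-style estimate. It assumes the numbers of nonsynonymous and synonymous differences and sites have already been counted for a pair of aligned coding sequences, and it applies a Jukes-Cantor correction for multiple hits; this is a simplification chosen for illustration, not the method of any particular software package. The counts in the example are invented.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka, ks, ka / ks


# Invented counts: 10 nonsynonymous differences over 700 sites,
# 30 synonymous differences over 300 sites.
ka, ks, omega = ka_ks(10, 700, 30, 300)
```

For these invented counts ω is well below 1, which under the scheme above would be read as negative selection.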

McDonald–Kreitman test. If the ratio of fixed differences to polymorphisms is much higher for nonsynonymous changes than for synonymous changes (i.e. Dn/Pn >> Ds/Ps), this indicates that genetic changes have been subject to positive selection. Adapted from Sella, G., et al. (2009) PLoS Genet.
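
The comparison of Dn/Pn with Ds/Ps amounts to a test on a 2×2 table. The sketch below is one possible way to carry it out in plain Python, using a two-sided Fisher's exact test and the neutrality index (Pn/Ps)/(Dn/Ds); the function names and the counts are illustrative assumptions, not taken from the slides.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def mk_test(dn, ds, pn, ps):
    """Return (neutrality_index, p_value) for fixed (D) and polymorphic (P) counts."""
    if min(dn, ds, ps) == 0:
        raise ValueError("Dn, Ds and Ps must be non-zero for the neutrality index")
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)


# Illustrative counts: Dn=7, Ds=17, Pn=2, Ps=42.
ni, p_value = mk_test(7, 17, 2, 42)
```

A neutrality index below 1 together with a small p-value corresponds to the Dn/Pn >> Ds/Ps situation described above.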

Outline 1. Concept and topic 2. Case studies 3. New trends

2. Case studies: 2.1 Neutralist vs. selectionist; 2.2 Evolution of X chromosome in terms of male-biased genes; 2.3 A journey of decades: pinpointing the genetic basis of human brain evolution

Case 1: Neutralist vs. selectionist

It is widely accepted that negative selection dominates evolution of proteins Nei, M. (2010) Annu. Rev. Genomics Hum. Genet.

What is the underlying force governing sequence changes?

Neutral evolution. “Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations involved must be neutral ones.” Kimura, M. (1968) Nature. We can use the neutral model as a null model to infer selection.

Stories of positive selection are popular
1. Bustamante, C. D. et al. (2005) Nature
2. Sabeti, P. C., et al. (2006) Science
3. Grossman, S. R., et al. (2010) Science
4. ...
5. Zhang, Y. E., et al. (2009) BMC Evol. Biol.
6. Fan, C. et al. (2008) Mol. Biol. Evol.
7. Llopart, A. et al. (2002) PNAS
8. …

Pervasive selection. “The findings further indicate that, in Drosophila, adaptations may be both common and strong enough that the fate of neutral mutations depends on their chance linkage to adaptive mutations as much as on the vagaries of genetic drift.” Sella, G., et al. (2009) PLoS Genet.

Non-adaptive force drives evolution of genomes Nothing in evolution makes sense except in the light of population genetics.

Fixation probability and effective population size (Ne). For a new neutral allele, P = 1/(2Ne). For a new positively selected allele with selective advantage s, P ≈ 2s. Thus, slightly deleterious alleles tend to be fixed in species with a small Ne, while slightly beneficial alleles are more often fixed in species with a large Ne. In other words, Ne largely determines the efficacy of selection.
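
Both limits can be recovered from Kimura's diffusion approximation for a new mutation. The Python sketch below is an illustration under stated assumptions: a single new copy in a diploid population whose census size equals Ne, giving P = (1 − e^(−2s)) / (1 − e^(−4·Ne·s)); the function name and parameter values are hypothetical.

```python
import math


def fixation_probability(s, ne):
    """Approximate fixation probability of a new mutation with selection coefficient s.

    Assumes a diploid population with census size equal to ne and an
    initial frequency of 1/(2*ne) (diffusion approximation).
    """
    if ne <= 0:
        raise ValueError("ne must be positive")
    if abs(s) < 1e-12:
        return 1 / (2 * ne)  # neutral limit
    exponent = -4 * ne * s
    if exponent > 700:
        return 0.0  # strongly deleterious: probability underflows to zero
    # expm1 keeps precision when s is small; the two minus signs cancel.
    return math.expm1(-2 * s) / math.expm1(exponent)


neutral = fixation_probability(0.0, 1000)         # equals 1/(2*1000)
beneficial = fixation_probability(0.01, 10_000)   # close to 2s
deleterious_small_ne = fixation_probability(-1e-4, 100)
deleterious_large_ne = fixation_probability(-1e-4, 10_000)
```

Relative to the neutral expectation 1/(2Ne), the slightly deleterious allele keeps most of its chance of fixation when Ne = 100 but loses most of it when Ne = 10,000, which is the sense in which Ne sets the efficacy of selection.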

Birth and death of duplicate genes Lynch, M. et al. (2000) Science

Ne·u matters (the product of effective population size Ne and mutation rate u). Lynch, M. et al. (2003) Science

Fixation of slightly deleterious introns Li, W. et al. (2009) Science

Case 2: evolution of X chromosome in terms of male-biased genes

Sexual dimorphism and sex-biased genes. [Figure panels: sexual dimorphism; male-biased expression] Vicoso, B. & Charlesworth, B. (2006) Nature Rev. Genet.; Ellegren, H. & Parsch, J. (2007) Nature Rev. Genet.; Ellegren, H. (2011) Nature Rev. Genet.

Proportion of male-biased genes. Which chromosome more often encodes male-biased genes? [Figure: male-biased genes on the X vs. autosomes (A); Parisi, M., et al. (2003) Science] [Figure: male-biased genes on the X vs. autosomes (A); Wang, P. J., et al. (2001) Nature Genet.; Mueller, J. L. et al. (2008) Nature Genet.]

Out-of-X retrogene traffic. Betran, E., et al. (2002) Genome Res.; Emerson, J. J. et al. (2004) Science

Out-of-X DNA-level traffic. Vibranovski, M. D., Zhang, Y. E., et al. (2009) Genome Res.

A few cases of young X-linked male-biased genes have been identified. Nurminsky, D. I., et al. (1998) Nature

Young male-biased genes appear to be often encoded by the X chromosome. Does this indicate an age-associated pattern?

Inference of gene ages based on syntenic genomic alignment. Zhang, Y. E., Vibranovski, M. D., Krinsky, B. H. & Long, M. (2010) Genome Res.

Genome-wide age dating in D. melanogaster

Young and old genes have different patterns. Old genes (Branch 0): conventional wisdom. Young genes (Branch 5/6): hidden fact.

miRNA is similar

Genome-wide age dating in human and mouse. [Figure panels: human; mouse] Zhang, Y. E., Vibranovski, M. D., Landback, P., Marais, G. A. B. & Long, M. (2010) PLoS Biol.

As in the fruit fly, male-biased genes follow an age-dependent distribution. [Figure panels: human; mouse]

What is the reason for such an age-dependent pattern?

Forces acting on X chromosome: meiotic sex chromosome inactivation (MSCI)

Forces acting on X chromosome (continued): faster-X effect. h: degree of dominance; s: effect of mutation; R: ratio of the evolutionary rate of the X to that of autosomes. Vicoso, B. & Charlesworth, B. (2006) Nature Rev. Genet.

MSCI acts differentially on old and young genes. [Figure: proportion of expressed genes; old genes vs. young genes]

Conclusion: X-linked male-biased genes follow a two-stage evolution. [Figure labels: faster-X effect, X, young; MSCI, autosome, old]

Broad impact: the picture has been updated. Ellegren, H. (2011) Nature Rev. Genet.; Zhang, Y. E., Vibranovski, M. D., Landback, P., Marais, G. A. B. & Long, M. (2010) PLoS Biol.

Case 3: a journey of decades: pinpointing the genetic basis of human brain evolution

Complexity of human brain Johnson, M. et al. (2009) Neuron

Expansion of human brain Rakic, P. et al. (2009) Nature Rev. Neurosci.

Regulatory changes drive brain expansion. “Their macromolecules are so alike that regulatory mutations may account for their biological differences.” King, M. et al. (1975) Science. “Promoter regions of many neural- and nutrition-related genes have experienced positive selection during human evolution.” Haygood, R., et al. (2007) Nature Genet.; Torgerson, D., et al. (2009) PLoS Genet.

Accelerated evolution of brain genes in human. Dorus, S., et al. (2004) Cell

Brain genes are generally constrained. “We suggest that such abundant and complex transcription may increase gene–gene interactions and constrain CDS evolution.” Wang, H., et al. (2007) PLoS Biol.

Is ASPM under positive selection with respect to the coding region? Mekel-Bobrov, N., et al. (2005) Science

ASPM does not differ from the genomic background. Yu, F., et al. (2007) Science. http://www.ub.edu/dnasp

Protein-level evolution appears generally irrelevant to brain evolution. Is it true? How about new gene origination?

Brain preferentially recruited new genes in the human lineage relative to other organs/tissues. Zhang, Y. E., Landback, P., Vibranovski, M. D. & Long, M. (2011) PLoS Biol.

This excess is mainly contributed by fetal brain

Developing neocortex contributes to most excess

New genes upregulated in the developing neocortex are highly enriched with primate-specific transcription factors

New gene origination parallels the morphological evolution of brain

Conclusion: the uniqueness of human brain is at least partially contributed by new gene origination

Broad impact

Broad impact (continued)

Understand macro-evolution with phylostratigraphy Wray, G. A. (2011) Science

Outline 1. Concept and topic 2. Case studies 3. New trends

“We have learned nothing from the genome” Venter, C. HGP is dead. Long live HGP!

Our journey is to the ocean of stars.

Revolution of sequencing technique Shendure, J. & Ji, H. (2008) Nature Biotech.

Rapidly decreased cost enabled by next generation sequencing (NGS) techniques

Explosion of GenBank data

Golden age. “It is an exciting time to be studying the evolutionary forces that shape genomic variation in natural populations. After decades as a theory-rich and data-poor discipline, rapidly advancing genomic technology is turning the intellectual dynamic in population genetics on end.” By Langley, C. “Within years, tens of thousands of complete genome sequences will be available from humans and from extinct hominids, as well as from thousands of other species. Given the human mutation rate, we will soon know of variation among individuals at almost all sites in the genome. For population genetics, this ushers in a previously unimaginable opportunity to reconstruct the entire genealogical and mutational history of humans and pushes us against the limits of what we will be able to infer about the evolutionary and genetic forces that affected every region of the genome.” Przeworski, M. (2011) Science

Things previously impossible become possible
1. Deep population survey
2. In search of loci under recent adaptation
3. Analysis of fundamental questions
Micro Evolution
4. Cancer evolution
5. Paleogenomics
6. Non-model organism (metagenome)
7. Expressional evolution
Macro Evolution

1. Deep population survey

Arabidopsis 1001 project

Drosophila African survey

2. In search of loci under positive selection. Yi, X., et al. (2010) Science

3. Standing variation and de novo mutation

Genome-wide quantification of antagonistic pleiotropy in yeast “Antagonistic pleiotropy (AP) refers to the phenomenon that, compared to the wild-type allele, a mutation is beneficial to some traits or in some environments but deleterious to other traits or in other environments. AP was proposed by George Williams more than 50 years ago and has important implications in many areas of biology such as aging, cooperation, and evolution. However, it is unknown how common AP is, because it has never been systematically examined, especially at the genomic scale. We here quantify the prevalence of AP in the budding yeast Saccharomyces cerevisiae by measuring the fitness effects of null mutations of ~5000 nonessential genes in multiple environments, using the high-throughput Illumina-sequencing-based bar-seq method, which is more accurate than the previously used microarray-based method. ” By Wenfeng Qian

Selection favors the out-of-X gene traffic. Schrider, D. R., et al. (2011) Genome Res.

4. Searching for cancer-causing mutations by phylogenetic reconstruction. Tao, Y. et al. (2011) PNAS

On the timing of cancer progression. “A quantitative analysis of the timing of the genetic evolution of pancreatic cancer was performed, indicating at least a decade between the occurrence of the initiating mutation and the birth of the parental, non-metastatic founder cell.” Yachida, S. et al. (2010) Nature

5. Paleogenomics. “We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.”

6. Target non-model organisms. Insects are "Little Creatures Who Run the World" Wilson, E. O. “Therefore, we, the undersigned, are pleased to announce the launch of the “i5k” initiative to sequence the genomes of 5000 species of insects and other arthropods during the next 5 years (8). This project is aimed at sequencing and analyzing the genomes of all species known to be important to worldwide agriculture and food safety, medicine, and energy production; all species used as models in biology; the most abundant insects in world ecosystems; and, to achieve a deep understanding of arthropod evolution, representatives of insect relatives in every major branch of arthropod phylogeny.” Robinson, G. E. et al. (2011) Science

6. Target non-model organisms (continued). Coghlan, M. L. et al. (2012) PLoS Genet.

7. Transcriptome evolution. Brawand, D., et al. (2011) Nature

Expressional variation within species. Pickrell, J., et al. (2011) Nature

It is time to play,

Many thanks. Questions are welcome! Email: ZhangLabIOZ@gmail.com