1 Population Genomics
Gil McVean, Department of Statistics, Oxford

2 Questions about genetic variation
• How different are our genomes?
• How is the variation distributed within and between genomes?
• What does variation tell us about human evolution?

3 How different are our genomes?

4 Serological techniques for detecting variation
[Figure labels: Rabbit, Human; A, A, B, AB, O]

5 Blood group systems in humans
• 28 known systems
– 39 genes, 643 alleles

System              Genes               Alleles
ABO                 ABO                 102
Chido-Rodgers       C4A, C4B            7+
Colton              AQP1                7
Cromer              DAF                 10
Diego               SLC4A1              78
Dombrock            DO                  9
Duffy               FY                  9
Gerbich             GYPC                9
GIL                 AQP3                2
H/h                 FUT1, FUT2          27/22
I                   GCNT2               7
Indian              CD44                2
Kell                KEL, XK             33/30
Kidd                SLC14A1             8
Knops               CR1                 24+
Landsteiner-Wiener  ICAM4               3
Lewis               FUT3, FUT6          14/20
Lutheran            LU                  16
MNS                 GYPA, GYPB, GYPE    43
OK                  BSG                 2
P-related           A4GALT, B3GALT3     14/5
RAPH-MER2           CD151               3
Rh                  RHCE, RHD, RHAG     129
Scianna             ERMAP               4
Xg                  XG, CD99            -
YT                  ACHE                4

http://www.bioc.aecom.yu.edu/bgmut/summary.htm

6 Protein electrophoresis
• Changes in mass/charge ratio resulting from amino acid substitutions in proteins can be detected
[Figure: starch or agar gel, showing the direction of travel]
• In humans, about 30% of all loci show polymorphism, with a 6% chance that a pair of randomly drawn alleles at a locus are different
Lewontin and Hubby (1966), Harris (1966)
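
The "chance of a pair of randomly drawn alleles being different" is an expected heterozygosity. The sketch below shows one way it could be computed for a single locus, assuming the standard definition H = 1 - sum(p_i^2). The function name and the example allele frequencies are illustrative only and are not taken from Lewontin and Hubby or Harris.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two alleles drawn at random (with replacement) differ."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical locus with two electrophoretic variants at 97% and 3%.
h = expected_heterozygosity([0.97, 0.03])
print(f"expected heterozygosity: {h:.3f}")
```

With these made-up frequencies the value comes out close to 6%, which shows that a figure of that size is compatible with one common and one rare variant at a locus.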

7 The rise of DNA sequence analysis
• RFLPs – Cann et al 1987
• Sequencing of small regions – Vigilant et al 1991
• Whole genome sequencing – Ingman et al 2000

8 The human genomes…
• The draft human genome sequence was published in 2001
– This is a mosaic from several individuals
• Since then, several more genomes have been sequenced, at least partially
– Shotgun sequencing variation discovery
• Other methods have been developed to look for gross chromosomal differences
[Figure: Nimblegen array CGH]

9 The International HapMap Project
• Launched in 2002 with the goal of characterising single nucleotide variation among 540 human genomes from individuals of European, Nigerian, Chinese and Japanese ancestry
• Not a sequencing project; rather, it types known polymorphisms
• Has currently assembled information on over 6 million SNPs (single nucleotide polymorphisms)

10 The 1000 Genomes Project

11 How do we differ? – Let me count the ways
• Single nucleotide polymorphisms – 1 every few hundred bp
• Short indels (= insertion/deletion) – 1 every few kb
• Microsatellite (STR) repeat number – 1 every few kb
• Minisatellites (≤ 100 bp) – 1 every few kb
• Repeated genes (1-5 kb) – rRNA, histones
• Large inversions, deletions – Y chromosome, Copy Number Variants (CNVs)
[Figure: example sequence pairs: TGCATTGCGTAGGC / TGCATTCCGTAGGC; TGCATT---TAGGC / TGCATTCCGTAGGC; TGCTCATCAGC / TGCTCATCA------GC]

12 Y chromosome variation
• Non-pathological rearrangements of the AZFc region on the Y chromosome
Tyler-Smith and McVean (2003)

13 Mutation is the ultimate source of variation
• New mutations occur in the germ-line
• Point mutations occur at about 2 x 10^-8 per nucleotide per generation
– You pass on about 60 new mutations to your children, of which perhaps 1 changes the protein sequence encoded by a gene
• Microsatellite mutations can occur much faster
– Up to 10^-4 per generation
– Some, e.g. in Huntington’s disease, have important consequences
• Minisatellites can mutate at rates of up to 10^-1 per generation
– The uniqueness of these patterns gives rise to DNA fingerprinting
• Most of the differences between genomes are the result of inheriting mutations from our ancestors
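
The figure of about 60 new mutations follows from multiplying the per-nucleotide rate by the size of the genome that is passed on. The sketch below spells out that arithmetic. The haploid genome size of 3 x 10^9 nucleotides is an assumed round number that does not appear on the slide, and the function name is illustrative.

```python
MUTATION_RATE = 2e-8        # per nucleotide per generation (from the slide)
HAPLOID_GENOME_SIZE = 3e9   # nucleotides; assumed round figure, not from the slide


def new_mutations_per_gamete(rate=MUTATION_RATE, genome_size=HAPLOID_GENOME_SIZE):
    """Expected number of new point mutations in one transmitted haploid genome."""
    return rate * genome_size


print(f"expected new mutations passed on: {new_mutations_per_gamete():.0f}")
```

Under these assumptions the product is roughly 60, matching the slide.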

14 Mutations in our ancestors
[Figure labels: Our genealogical tree; Inherited mutations; Our genomes]

15 Different, but not that different
• Humans are one of the least diverse organisms (excepting cheetahs)

Species               Diversity (percent)
Humans                0.08 - 0.1
Chimpanzees           0.12 - 0.17
Drosophila simulans   2
E. coli               5
HIV1                  30

Photos from UN photo gallery www.un.org/av/photo

16 An aside on the genetics of race
• It is sometimes claimed that there is a ‘genetic basis to race’
• What is true is that individuals from the same part of the world tend to have similar genomes because they share recent ancestry
Rosenberg et al (2002)
• But there are very few ‘fixed’ genetic differences between populations (I can think of one example – the FY gene)
• The differences between populations are in terms of the combinations of variants,

17 How is genetic variation distributed within and between genomes?

18 Diversity is not evenly distributed across the genome I
• Autosomes, sex chromosomes and mtDNA have systematically different levels of diversity

Genome         Average pairwise differences / kb   Relative copy number (a)
Autosomes      0.5 – 0.85                          1
X chromosome   0.47                                3/4
Y chromosome   0.15                                1/4
mtDNA          2.8                                 1/4

TISMWG (2001), Jobling et al (2004)
• This reflects differences in the number of chromosomes and the mutation rate
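
One way to separate the two effects named above is to divide each diversity value by its relative copy number and compare the result with the autosomes. The sketch below does this under a simple neutral model in which expected diversity is proportional to copy number times mutation rate. That model, the use of the midpoint of the autosomal range, and all names in the code are assumptions made for illustration.

```python
# Values from the table above; the autosomal entry is the midpoint of 0.5-0.85.
RELATIVE_COPIES = {"Autosomes": 1.0, "X chromosome": 0.75,
                   "Y chromosome": 0.25, "mtDNA": 0.25}
DIVERSITY_PER_KB = {"Autosomes": (0.5 + 0.85) / 2, "X chromosome": 0.47,
                    "Y chromosome": 0.15, "mtDNA": 2.8}


def implied_relative_mutation_rate(diversity, copies, reference="Autosomes"):
    """Diversity per unit copy number, scaled so the reference genome equals 1.

    Assumes diversity is proportional to (copy number x mutation rate).
    """
    per_copy = {name: diversity[name] / copies[name] for name in diversity}
    return {name: value / per_copy[reference] for name, value in per_copy.items()}


rates = implied_relative_mutation_rate(DIVERSITY_PER_KB, RELATIVE_COPIES)
for genome, rate in rates.items():
    print(f"{genome}: {rate:.2f}")
```

On this rough reading the X and Y chromosomes sit close to the autosomal value once copy number is accounted for, while mtDNA stands out by more than an order of magnitude, consistent with a much higher mutation rate.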

19 Diversity is not evenly distributed across the genome II
• There are fluctuations in the level of variation across the genome
[Figure: Chromosome 6, with HLA labelled]
TISMWG (2001)

20 Diversity is not evenly distributed across genes I
• Purifying selection eliminates deleterious mutations and reduces diversity in regions of strong functional constraint
Zhao et al (2003)

21 Diversity is not evenly distributed across genes II
• Adaptive evolution ‘wipes out’ diversity nearby due to the hitch-hiking effects of a selective sweep
– e.g. the Duffy-null locus in sub-Saharan Africa, which protects against P. vivax
[Figure: FY*O mutation; samples labelled African, Pop 1, Pop 2, European; key: ancestral allele, derived allele, missing data]
Hamblin and Di Rienzo (2000)

22 Diversity is not evenly distributed across genes III
• Some genes are under balancing or diversifying selection, where diversity is actively selected for
– MHC complex: heterozygote advantage and frequency-dependent selection driven by recognition of pathogens
Horton et al (1998)

23 Diversity is not evenly distributed across populations I
• African populations are more diverse than non-African populations
– More polymorphisms
– Polymorphisms at less skewed frequencies

Population        Segregating sites per kb (n = 30)   Diversity per kb   Tajima D statistic
Hausa (African)   4.8                                 0.11               -0.33
Italian           3.2                                 0.10               1.18
Chinese           3.0                                 0.07               1.19

Frisse et al (2001)
• Differences reflect bottlenecks associated with the colonisation from Africa c. 65 KYA
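
The Tajima D statistic in the table compares two estimates of diversity: one from the number of segregating sites and one from the average number of pairwise differences. The sketch below computes it from a small set of aligned sequences using the standard formula from Tajima (1989). It is not the pipeline used by Frisse et al; the function name and the toy sequences are illustrative, and gap characters are treated as ordinary symbols for simplicity.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences (needs at least 4)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance term is zero below that)")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (variable) sites.
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without variation

    # pi: average number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(sequences, 2)) / n_pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: rare variants only, versus variants at intermediate frequency.
rare = ["AAAA", "AAAT", "AATA", "ATAA"]
intermediate = ["AA", "AA", "TT", "TT"]
print(tajimas_d(rare), tajimas_d(intermediate))
```

An excess of rare variants gives a negative D and an excess of intermediate-frequency variants gives a positive D, which is the sense in which the table's non-African samples have more skewed frequencies.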

24 mtDNA phylogeography
[Figure label: Non-African]
Ingman et al (2000)

25 The colonisation process as inferred from mtDNA variation

26 What does genetic variation tell us about human evolution?
• Modern humans appear in the fossil record about 200K years ago
• The mitochondrial Eve dates back to about 150K years ago
• The Y-chromosome Adam dates back to about 70K years ago
• For most of our genome, however, the common ancestor is about 500K – 1M years ago
– This predates the origin of Homo sapiens considerably
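
The gap between the autosomal dates and the mtDNA and Y-chromosome dates is roughly what a simple neutral model predicts from copy number alone. The sketch below uses the standard coalescent expectation that a sample of n gene copies from a population of C copies has an expected time to the most recent common ancestor of 2C(1 - 1/n) generations. The population size of 10,000 individuals, the 25-year generation time and the equal sex ratio are assumed illustrative values and are not estimates from the slides.

```python
# Gene copies per individual, assuming equal numbers of males and females.
COPIES_PER_INDIVIDUAL = {"Autosomes": 2.0, "X chromosome": 1.5,
                         "Y chromosome": 0.5, "mtDNA": 0.5}


def expected_tmrca_years(n_individuals, copies_per_individual,
                         sample_size, generation_years):
    """Expected TMRCA under the standard neutral coalescent, in years."""
    gene_copies = n_individuals * copies_per_individual
    generations = 2 * gene_copies * (1 - 1 / sample_size)
    return generations * generation_years


# Hypothetical parameter values, chosen only to show the scaling.
for genome, copies in COPIES_PER_INDIVIDUAL.items():
    years = expected_tmrca_years(10_000, copies, sample_size=1000, generation_years=25)
    print(f"{genome}: about {years:,.0f} years")
```

Whatever values are plugged in, the autosomal expectation is four times the mtDNA and Y-chromosome expectation. The time at any single locus varies widely around its expectation, so the individual dates on the slide need not match these numbers closely.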

27 [Figure labels: Human – chimp split; Autosomal MRCA; Origin of H. sapiens]

28 Did early humans interbreed with Neanderthals?
Neanderthal mtDNA sequences say no…
Ovchinnikov et al (2000)

29 But…
• There is some evidence for interbreeding in the presence of unusual haplotypes found in Europe, composed of SNPs not found in non-European populations
Plagnol and Wall (2006)

30 Deeper trees in the human genome
• There is growing evidence that some regions of our genome have truly ancient common ancestors
• Dystrophin has an ancient haplotype found primarily outside Africa, suggesting a colonisation of >160 KYA
• There is an inversion found primarily in Europeans that is roughly 3 MY old
[Figure labels: Haplotype 1; Haplotype 2]
Stefansson et al (2005)

31 What are the genetic differences that make us human?

32 Chromosomal changes
• Human chromosome 2 is a fusion of two chromosomes in great apes
• There are several inversion differences between the chromosomes
Feuk et al (2005)

33 Gene loss
• Loss of enzymes that make sialic acid
– Sugar on cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins
• Myosin heavy chain
– Associated with gracilization
Wang et al (2006)

34 Gene evolution
• FOXP2 is a highly conserved gene (across the Mammalia), expressed in the brain. Mutations in the gene in humans are associated with specific language impairment
• Across the entire mammalian phylogeny, there have been only a very few amino acid changing substitutions
• However, two amino acid changes have become fixed in the lineage leading to modern humans since the split with the chimpanzee lineage
Enard et al. (2002)

35 What are the genetic differences that make people and peoples different?

36 Detecting recent adaptive evolution
• Let’s look closely at the dynamics of the fixation process for adaptive mutations
• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation
• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)
• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events

37 Long haplotypes
• A selective sweep at the Lactase gene in Europeans

38 Strong population differentiation
• SLC24A5
Lamason et al (Science 2005)
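
Population differentiation at a single variant is commonly summarised with Wright's F_ST, which is not named on the slide and is used here only as an illustration. The sketch below computes it for one biallelic locus in two equally weighted populations as (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S the average within populations. The function name and the allele frequencies are hypothetical and are not the values reported by Lamason et al.

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, so there is no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical frequencies: nearly fixed in one population, rare in the other.
print(f"strongly differentiated: {fst_two_populations(0.99, 0.02):.2f}")
print(f"weakly differentiated:   {fst_two_populations(0.30, 0.35):.3f}")
```

Values near 1 indicate that the populations are close to fixed for different alleles, while values near 0 indicate similar frequencies in both.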

39

40 Classes of selected genes
Voight et al. (2005)

41 Reading
• Human genetic variation
– Rosenberg et al. Genetic structure of human populations. Science 2002, 298: 2381-2385.
– Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genet. 2006, 1251-1260.
– McVean et al. Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1: e54.
• The origin of modern humans
– Reed & Tishkoff. African human diversity, origins and migrations. Curr. Opin. Genet. Dev. 2006, 16: 597-605.
– Jobling et al. Human evolutionary genetics: origins, peoples, and disease. Garland Science, 2004.
– Harding & McVean. A structured ancestral population for the evolution of modern humans. Curr. Opin. Genet. Dev. 2004, 14: 667-674.
• Natural selection
– Lamason et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 2005, 310: 1782-1786.
– Sabeti et al. Positive natural selection in the human lineage. Science 2006, 312: 1614-1620.
– Tishkoff et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 2007, 39: 31-40.