Comp 790-087: Genetics, Evolution, and the Coalescent

Comp 790-087: Genetics, Evolution, and the Coalescent Theory
• Administrative Details
• Course Overview
• Coalescent Theory?
2/23/2021 Comp 790 – Introduction & Coalescence

Course Overview
• Synopsis
  – Graduate-level project course
  – Guided reading, discussions, and project with write-up
• Website
  – To appear at: http://www.unc.edu/courses/2009spring/comp/790/087/
• Course Grading
  – Class Participation: 10%
  – 2 In-class Presentations: 40%
  – Final Project, Presentation, & Write-up: 50%

Syllabus
• ⅓ Guided reading/discussion of text
• ⅓ Student presentations of recent papers
• ⅓ Project Proposals

Coalescent Theory
• Ancestral properties can be inferred from extant populations
• Alternatives to correctness
  – Most likely
  – Most parsimonious
  – Other optimality criteria
• Background
  – Biology (genetics)
  – Statistics
  – Computational Modeling

Non-Classical Genetics
• Coalescence differs from classical genetics
  – Analysis rather than synthesis
  – Depends on models, which attempt to explain observations
  – Less emphasis on Darwin’s natural selection
• Considers population dynamics
  – Isolation
  – Bottlenecks

Historical Human Migrations

Population Dynamics
• It is helpful to view evolutionary trees in the context of geography and population structure
• These factors affect the prevalence and distribution of genes
• Genetic diversity largely depends on population isolation and population bottlenecks, as well as
  – Constant population size (resource limited)
  – Sudden increases in population (explosions)
  – Patterns of growth (exponential, uniform, etc.)

It’s About Genes
• Genetics is most clearly understood by considering its subject to be genes rather than organisms
• Organisms are merely vessels for assuring the survival of genes
• Successful genes live on long after their host organism
• An objective of a gene is to replicate itself
“[Genes] that survived were the ones that built survival machines for themselves to live in. But making a living got steadily harder as new rivals arose with better and more effective survival machines. Survival machines got bigger and more elaborate, and the process was cumulative and progressive…” -- Dawkins, The Selfish Gene

Why Computer Science
• Classically, genetics, both generative (classical) and coalescent (population), has focused on mathematical/statistical models
• As model complexity increases, it becomes harder to find closed-form solutions
• Genetics relies more and more on computational modeling to ascertain structure
• Also, complicated models often lead to common models… today’s subject

Wright-Fisher Model
• One of the first and simplest models of population genealogies was introduced by Wright (1931) and Fisher (1930)
• The model emphasizes transmission of genes from one generation to the next
• For simplicity we’ll first focus on a population of fixed size in which each individual starts with a distinct gene variant

Simple Haploid Model
• Rules
  – Each gene’s antecedent is chosen randomly, with replacement, from the parental generation
  – No selection
  – Fixed population size
G0: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
G1: ['J', 'A', 'H', 'B', 'I', 'E', 'D', 'G', 'A', 'B']
G2: ['A', 'J', 'E', 'G', 'D', 'E', 'B', 'I', 'A']
G3: ['A', 'E', 'J', 'I', 'A', 'J', 'B']
G4: ['E', 'A', 'B', 'A', 'E', 'A', 'A']
G5: ['A', 'B', 'A', 'E', 'A', 'B']
G6: ['A', 'A', 'B', 'A']
G7: ['B', 'A', 'A', 'A']
What will this population eventually look like?
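The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code that produced the slide's output; the function names `next_generation` and `simulate_haploid` and the `seed` parameter are assumptions made for the example.

```python
import random


def next_generation(parents, rng):
    """Each child copies the gene of a parent drawn uniformly, with replacement.

    No selection (all parents equally likely) and fixed population size
    (one child per slot in the parental generation).
    """
    return [rng.choice(parents) for _ in parents]


def simulate_haploid(founders, generations, seed=None):
    """Return the list of generations G0..Gn as lists of gene labels."""
    rng = random.Random(seed)  # assumption: a seed is used only for repeatability
    history = [list(founders)]
    for _ in range(generations):
        history.append(next_generation(history[-1], rng))
    return history
```

Calling `simulate_haploid("ABCDEFGHIJ", 7)` and printing each entry gives a G0 to G7 listing in the same spirit as the one on the slide; every run differs unless a seed is fixed.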

Assumptions of Wright/Fisher
• Discrete and non-overlapping generations
• Haploid individuals
• Population size is constant
• All individuals are equally fit
• No population or social structure
• Genes segregate independently

Some Graphical Abstractions
• Replace letters with colors
• Draw lineages
• Sort topologically

Repeats
• Every repeated run ends with the population carrying just one gene variant

Onset of Uniformity
• 10000 trials
• Mode = 11 (616)
• Mean = 17.5
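An experiment of this kind could be run as sketched below: repeat the haploid resampling until only one variant remains, record the number of generations, and summarize over many trials. The names `generations_to_uniformity` and `uniformity_stats` are hypothetical, and the exact mode and mean will vary with the random seed, so they will only approximate the figures on the slide.

```python
import random
from collections import Counter
from statistics import mean


def generations_to_uniformity(size, rng):
    """Count generations until every individual carries the same variant."""
    population = list(range(size))  # each founder starts with a distinct variant
    generations = 0
    while len(set(population)) > 1:
        # Resample the whole generation with replacement from its parents.
        population = [rng.choice(population) for _ in range(size)]
        generations += 1
    return generations


def uniformity_stats(size=10, trials=10000, seed=0):
    """Return (mode, count at the mode, mean) of the time to uniformity."""
    rng = random.Random(seed)
    times = [generations_to_uniformity(size, rng) for _ in range(trials)]
    mode, count = Counter(times).most_common(1)[0]
    return mode, count, mean(times)
```

The distribution has a long right tail, which is why the mean sits well above the mode.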

Diploid Model
• Our model is obviously too simple; let’s add more realism
• Organisms are diploid (have 2, perhaps different, copies of each gene)
• Sexual reproduction
• Half female, half male
Females Males
['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ']
['CF', 'AG', 'BI', 'CH', 'EG', 'EH', 'EI', 'DG']
['CE', 'IE', 'CG', 'HE', 'FG', 'IG', 'BE', 'BG', 'HE']
['CB', 'EI', 'HE', 'HI', 'HE', 'CG', 'EF', 'EI', 'HF', 'GI']
['EE', 'HF', 'BC', 'IG', 'BH', 'HI', 'BI', 'EF', 'HG', 'HC']
['CH', 'BH', 'EI', 'BH', 'BE', 'BI', 'HC', 'EH', 'IH']
['BE', 'IB', 'BE', 'BH', 'HH', 'CB', 'EH']
['BH', 'EC', 'BB', 'IC', 'BH', 'EH', 'BE', 'EB']
['CE', 'BE', 'CH', 'BE', 'CB', 'HE', 'IB', 'CB']
['EH', 'BB', 'BH', 'EC', 'EE', 'EC', 'CE', 'CI', 'CE']
['CI', 'BE', 'EC', 'BE', 'BC', 'BE', 'BC']
['EB', 'BB', 'CB', 'EB', 'CC', 'IB', 'BC', 'IE', 'CB']
['BB', 'EC', 'BE', 'BC', 'CC', 'EI', 'BC', 'BC']
['EB', 'CC', 'BB', 'CC', 'BI', 'CC', 'BC', 'CC']
['BC', 'CC', 'BC', 'EC', 'BB', 'BC', 'BB']
['BB', 'BC', 'CB', 'CC', 'CB', 'CE', 'CB']
['CC', 'CB', 'BB', 'CC', 'BC']
['CC', 'BB', 'CC', 'BC', 'CC', 'CB']
['CC', 'BC', 'CC', 'CB', 'CC']
['CC', 'BC', 'CC', 'CC', 'BC', 'CC']
['CC', 'CC', 'BC', 'CB']
['CC', 'BC', 'CC', 'CC', 'CC']
['CC', 'CC', 'BC', 'CC', 'CC']
['CC', 'CC', 'CC', 'CC']
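One way to read the diploid rules is sketched below. Several details are assumptions made for the example and are not stated on the slide: founders are homozygous ('AA', 'BB', ...), the first half of the founders are female and the rest male, every child has one random mother and one random father, and the sex ratio is held at exactly half and half. The function names are hypothetical.

```python
import random


def diploid_generation(females, males, rng):
    """Build the next generation of females and males.

    Each child inherits one randomly chosen allele from a random mother
    and one randomly chosen allele from a random father.
    """
    def child():
        mother, father = rng.choice(females), rng.choice(males)
        return rng.choice(mother) + rng.choice(father)

    # Assumption: population size and sex ratio stay fixed.
    return [child() for _ in females], [child() for _ in males]


def generations_to_fixation(alleles="ABCDEFGHIJ", seed=None):
    """Count generations until a single allele remains in the population."""
    rng = random.Random(seed)
    founders = [a + a for a in alleles]  # 'AA', 'BB', ... as on the slide
    half = len(founders) // 2
    females, males = founders[:half], founders[half:]
    generations = 0
    while len(set("".join(females + males))) > 1:
        females, males = diploid_generation(females, males, rng)
        generations += 1
    return generations
```

Under these assumptions the population carries twice as many gene copies as the haploid model with the same number of individuals, so fixation should take noticeably longer on average.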

Same Result
Females Males
['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ']
['AG', 'BF', 'AI', 'EJ', 'DH', 'AH', 'CH', 'AI', 'DI', 'AH']
['BI', 'AI', 'FA', 'AC', 'GH', 'AH', 'BI', 'DH', 'FI', 'AH']
['HH', 'AI', 'BH', 'II', 'AI', 'GI', 'AD', 'IA', 'AA']
['HG', 'ID', 'IA', 'ID', 'HG', 'HI', 'AA', 'AI', 'IA']
['IA', 'DG', 'DH', 'HA', 'GA', 'DI', 'GH', 'GG', 'II', 'IH']
['HI', 'GH', 'DH', 'GG', 'HH', 'II', 'HI', 'ID', 'GH']
['HH', 'GI', 'HI', 'HH', 'IH', 'HI', 'GG', 'HH']
['GH', 'HH', 'II', 'HH', 'HI', 'HG', 'HH']
['HI', 'HG', 'HH', 'IH', 'HH', 'GH', 'HI', 'HH']
['HH', 'HI', 'HG', 'HH', 'IH']
['HH', 'HH', 'IH', 'GH', 'HH', 'GH']
['HI', 'HH', 'HG', 'HH', 'HI', 'HG', 'HH']
['GG', 'GH', 'HI', 'HH', 'IG', 'GH', 'IG', 'HH', 'HH']
['IH', 'HG', 'HI', 'GH', 'HH', 'GI', 'IG', 'HH', 'GH']
['IH', 'HG', 'II', 'HH', 'HG', 'IH']
['II', 'HI', 'HH', 'IH', 'HH', 'GI', 'HH']
['IH', 'II', 'HH', 'IH', 'HH']
['HH', 'II', 'IH', 'HH', 'IH']
['IH', 'HH', 'II', 'IH']
['HI', 'IH', 'HI', 'IH', 'HI', 'HH']
['IH', 'HI', 'HH', 'II', 'IH', 'HH']
['II', 'HH', 'HH', 'HI', 'HH', 'II']
['IH', 'HI', 'HH', 'HH', 'IH', 'HI']
['IH', 'HI', 'IH', 'HH', 'HI', 'HH']
['HI', 'HH', 'HH', 'HI', 'HH']
['HH', 'IH', 'HI', 'HH', 'HH']
['HH', 'IH', 'HH', 'HH']
['HH', 'HH', 'HH', 'HH']
Females Males
['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ']
['EF', 'EI', 'DI', 'BF', 'EJ', 'BI', 'AI', 'CI', 'EJ', 'EI']
['FB', 'EE', 'EI', 'JB', 'FE', 'BJ', 'JE', 'EI', 'EE', 'DI']
['FJ', 'IE', 'FE', 'JE', 'EJ', 'FI', 'FE', 'BJ', 'EB', 'FJ']
['FF', 'II', 'EB', 'FI', 'EJ', 'EB', 'IB', 'EI', 'EF']
['FE', 'IE', 'BB', 'BE', 'EB', 'IE', 'BI', 'FJ']
['BB', 'FE', 'EE', 'BJ', 'II', 'EE', 'BE', 'FE']
['BB', 'EB', 'EE', 'EI', 'EE', 'FI', 'EE', 'EE']
['EF', 'BE', 'EE', 'BE', 'IE', 'BI', 'EE', 'EE']
['EE', 'BE', 'FE', 'EB', 'EE', 'BE', 'FE', 'EE']
['EF', 'EE', 'EE', 'FE', 'EB', 'BE', 'EE']
['EE', 'FB', 'EE', 'EE', 'EE']
['EE', 'EE', 'BE', 'EE']
['EE', 'EE', 'EE', 'EE']
Females Males
['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH', 'II', 'JJ']
['AF', 'AJ', 'AF', 'CG', 'BH', 'AH', 'EG', 'DJ']
['AA', 'AH', 'CA', 'AJ', 'BJ', 'GB', 'AB', 'GH', 'BB', 'AA']
['JH', 'AB', 'AA', 'AG', 'AB', 'JB', 'AH', 'AA', 'AG']
['GA', 'AG', 'BH', 'AB', 'BA', 'GA', 'AA', 'AA']
['AA', 'AG', 'AA', 'AB', 'AA']
['AA', 'AB', 'AA', 'AA', 'AA']
['BA', 'AA', 'BA', 'AA', 'AA']
['AB', 'AA', 'AA', 'AA']
['AA', 'BA', 'AA', 'AA', 'AA']
['AA', 'BA', 'AA', 'AA']
['AA', 'AA', 'BA', 'AA']
['AA', 'AA', 'AB', 'AA']
['AA', 'AA', 'AA', 'AA']

Similar Distributions
• 10000 trials
• Mode = 24 (291)
• Mean = 35.5
• These statistics are almost exactly 2× the haploid values (Mode = 11, Mean = 17.5)

Punnett Squares
AA x BB
        A    A
   B    AB   AB
   B    AB   AB
AB x AB
        A    B
   A    AA   AB
   B    AB   BB
AB x BB
        A    B
   B    AB   BB
   B    AB   BB
• What if we introduce inbreeding, by choosing mates from common litters, in successive generations?
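A Punnett square is mechanical enough to generate in code. The sketch below is illustrative only; `punnett_square` and `offspring_ratios` are hypothetical names, and it treats 'AB' and 'BA' as the same genotype by sorting the two alleles, which is an assumption of this example.

```python
from collections import Counter


def punnett_square(parent1, parent2):
    """Return the grid of offspring genotypes.

    Columns follow the alleles of parent1 and rows follow the alleles of
    parent2. Alleles within a genotype are sorted so 'BA' is written 'AB'.
    """
    return [["".join(sorted(a + b)) for a in parent1] for b in parent2]


def offspring_ratios(parent1, parent2):
    """Count how often each genotype appears among the four equally likely cells."""
    cells = [g for row in punnett_square(parent1, parent2) for g in row]
    return Counter(cells)
```

For example, `offspring_ratios("AB", "AB")` counts one 'AA', two 'AB', and one 'BB', the familiar 1:2:1 ratio for a cross between two heterozygotes.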

Distributions

Next time
• Commonly occurring distributions
  – Geometric
  – Exponential