Genome Evolution. Amos Tanay 2012 Genome evolution Lecture 9: Mutations and variational inference

Sources of mutations
• Mistakes
  – Replication errors (point mutations, tandem duplications/deletions)
  – Recombination errors (mainly indels)
• Endogenous DNA damage
  – Spontaneous base damage: deaminations, depurinations
  – Byproducts of metabolism: oxygen radicals that damage DNA
• Exogenous DNA damage
  – UV
  – Chemicals
All of these mechanisms cross-talk with the surrounding sequence.

DNA polymerases
• Replicating DNA
• A good polymerase domain has a misincorporation rate of 10^-5 (1/100,000)
• Any misincorporations are clipped off with 99% efficiency by the “proofreading” activity of the polymerase
• Further mismatch repair, which works in 99.9% of the cases, brings the error rate of the main polymerases down to 10^-10 (10^-5 × 10^-2 × 10^-3)
• Some dedicated polymerases are not as accurate!
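The fidelity figures above compose multiplicatively: an error survives only if it escapes both proofreading and mismatch repair. A minimal Python sketch of that arithmetic, assuming the stages act independently; the function name `residual_error_rate` is hypothetical.

```python
def residual_error_rate(misincorporation: float,
                        proofreading_efficiency: float,
                        mismatch_repair_efficiency: float) -> float:
    """Per-base error rate left after two independent correction stages."""
    # An error persists only if it escapes proofreading AND mismatch repair.
    escapes_proofreading = 1.0 - proofreading_efficiency
    escapes_mismatch_repair = 1.0 - mismatch_repair_efficiency
    return misincorporation * escapes_proofreading * escapes_mismatch_repair


rate = residual_error_rate(1e-5, 0.99, 0.999)
print(f"{rate:.1e}")  # 1.0e-10
```

Setting either efficiency to zero shows how much a polymerase lacking that stage loses in accuracy under the same independence assumption.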

Replication slippage
• While processing a strand, the polymerase disconnects and reconnects at the wrong place, typically within a repeat
[Figure: example sequences CACACACACA and CGACAGTTACAAA]

Recombination errors
• A consequence of partial homology between different chromosomal loci
• Can introduce translocations if the matching sequences are on different chromosomes
• Can introduce an inversion or a deletion if the matching sequences are on the same chromosome
• Can generate duplications or deletions if the matching sequences are in tandem
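Both slippage and recombination between tandem copies change the copy number of a repeated unit. The sketch below is a string-level illustration only, assuming sequences are plain Python strings; `slippage` and `unequal_crossover` are hypothetical helper names, and the breakpoints in the example are chosen by hand.

```python
from __future__ import annotations


def slippage(seq: str, unit: str, delta: int) -> str:
    """Add (delta > 0) or remove (delta < 0) copies of `unit` in its first tandem run in `seq`."""
    start = seq.find(unit)
    if start < 0:
        raise ValueError("repeat unit not found in sequence")
    end = start
    while seq.startswith(unit, end):  # walk to the end of the contiguous run
        end += len(unit)
    copies = (end - start) // len(unit)
    return seq[:start] + unit * max(0, copies + delta) + seq[end:]


def unequal_crossover(a: str, b: str, i: int, j: int) -> tuple[str, str]:
    """Reciprocal exchange between homologs `a` and `b` broken at misaligned points i and j."""
    return a[:i] + b[j:], b[:j] + a[i:]


# Slippage in the CA repeat from the slide: one extra repeat unit.
print(slippage("GGCACACACACATT", "CA", +1))  # GGCACACACACACATT

# Two chromatids carrying a tandem duplication (X-ACGT-ACGT-Y), paired out of
# register: the second copy on one chromatid pairs with the first copy on the other.
chromatid = "X" + "ACGT" * 2 + "Y"
dup, dele = unequal_crossover(chromatid, chromatid, 7, 3)
print(dup, dele)  # XACGTACGTACGTY XACGTY
```

The reciprocal products of the misaligned exchange are one duplication and one deletion, matching the "matching sequences in tandem" case above.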

Endogenous DNA damage: Deamination of cytosines
[Figure: deamination (loss of the NH2 amino group) converts cytosine into uracil; *thymine has a CH3 group at the marked position]

Deamination of cytosine creates a G-U mismatch
• Easy to tell that U is wrong
Deamination of 5-methylcytosine creates a G-T mismatch
• Not easy to tell which base is the mutation. About 50% of the time the G is “corrected” to A, resulting in a C→T mutation
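A small Monte Carlo sketch of the two cases, assuming (as the slide states) that a G-T mismatch is resolved toward either strand with probability 1/2 while a G-U mismatch is always fixed correctly. The letter "M" for 5-methylcytosine and the function names are hypothetical conventions for this example.

```python
from __future__ import annotations
import random

# Hypothetical one-letter code: "M" stands for 5-methylcytosine.
DEAMINATION_PRODUCT = {"C": "U", "M": "T"}


def repair_after_deamination(base: str, rng: random.Random) -> tuple[str, str]:
    """Deaminate `base` (paired with G), repair the mismatch, return the (top, bottom) pair."""
    damaged = DEAMINATION_PRODUCT[base]
    if damaged == "U":
        # U does not belong in DNA, so repair knows which base to replace.
        return ("C", "G")
    # G-T mismatch: both are normal DNA bases; assume each is "corrected" half the time.
    if rng.random() < 0.5:
        return ("C", "G")  # T replaced by C: original pair restored
    return ("T", "A")      # G "corrected" to A: a C->T transition


def mutation_fraction(base: str, trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated deamination events that end as a T:A mutation."""
    rng = random.Random(seed)
    mutated = sum(repair_after_deamination(base, rng) == ("T", "A") for _ in range(trials))
    return mutated / trials


print(mutation_fraction("C"))  # 0.0
print(mutation_fraction("M"))  # close to 0.5
```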

Exogenous DNA damage
Chemicals
• Food
• Benzopyrene (smoke)
UV radiation (sunlight)
Ionizing radiation
• Radon
• Cosmic rays
• X-rays
UV irradiation generates primarily thymine dimers (adjacent thymines on the same strand become covalently linked).
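Because thymine dimers form between adjacent thymines on one strand, the UV-sensitive sites of a duplex can be read off the sequence. A minimal sketch, assuming the input is the top strand of a double-stranded sequence; `thymine_dimer_sites` is a hypothetical name, and other pyrimidine dimers (TC, CT, CC) are deliberately ignored.

```python
from __future__ import annotations


def thymine_dimer_sites(seq: str) -> dict[str, list[int]]:
    """0-based positions (top-strand coordinates) of TT steps on each strand of a duplex."""
    seq = seq.upper()
    top = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TT"]
    # A TT step on the bottom strand shows up as AA on the top strand.
    bottom = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AA"]
    return {"top": top, "bottom": bottom}


print(thymine_dimer_sites("GATTTACAAG"))  # {'top': [2, 3], 'bottom': [7]}
```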

Repairing DNA damage: Direct repair

Thymine dimers can be corrected by a direct repair mechanism
[Figure: repair driven by an absorbed photon]

BER (base excision repair)
Deaminated bases are repaired by a base excision mechanism.

BER
Spontaneously occurring abasic sites are repaired by the same mechanism.

NER (nucleotide excision repair)
Dimeric bases and bulky lesions, e.g., large chemical adducts, are repaired by nucleotide excision repair.
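Base and nucleotide excision repair share one logic: cut out the damaged stretch and resynthesize it from the intact complementary strand. A string-level sketch under simplifying assumptions: the template is aligned position by position (ignoring antiparallel orientation), "U" and "X" mark damaged bases, and the patch length is an illustrative parameter, not a measured value. `excision_repair` is a hypothetical name.

```python
from __future__ import annotations

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def excision_repair(strand: str, template: str, lesion: int, patch: int = 1) -> str:
    """Excise `patch` bases around `lesion` and refill them from the aligned `template`.

    patch=1 mimics base excision repair (a single damaged base is removed);
    a larger patch mimics nucleotide excision repair of a bulky lesion.
    """
    half = patch // 2
    start = max(0, lesion - half)
    end = min(len(strand), start + patch)
    # Gap filling: copy the complement of the intact template strand.
    filled = "".join(PAIR[b] for b in template[start:end])
    return strand[:start] + filled + strand[end:]


# BER: a uracil (deaminated C) at position 3, opposite a template G.
print(excision_repair("ACGUA", "TGCGT", 3))                 # ACGCA
# NER: a thymine dimer ("XX") removed as part of a 6-base patch.
print(excision_repair("GGCAXXACGG", "CCGTAATGCC", 4, 6))    # GGCATTACGG
```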

Evolutionary consequences of the rich mutational process
• We cannot ignore dependencies among adjacent sites
• Mechanisms vary over evolution
  – Lifestyle -> environmental exposure
  – Germline and male/female ratio
• Mechanisms vary along the genome
  – Late vs. early replication

Dynamic Bayesian Networks
[Figure: variables 1-4 linked by conditional probabilities, unrolled over time slices T=1, T=2, T=3, T=4, T=5]
Synchronous discrete time process
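In a synchronous DBN all variables update together, each through its own conditional probability table, so the joint transition is a product of CPDs. This sketch propagates the exact joint distribution of four binary variables from T=1 to T=5. The parent structure (`PARENTS`) and the CPD (`p_on`) are hypothetical choices for illustration, not the ones in the slide figure.

```python
from __future__ import annotations
import itertools

# Hypothetical structure: parents (in slice T-1) of each binary variable (0-based).
PARENTS = {0: (0,), 1: (0, 1), 2: (0, 2), 3: (1, 2)}
STATES = list(itertools.product((0, 1), repeat=4))


def p_on(i: int, prev: tuple[int, ...]) -> float:
    """Hypothetical CPD: P(x_i = 1 at T | parents at T-1) grows with the fraction of parents on."""
    parents = [prev[j] for j in PARENTS[i]]
    return 0.1 + 0.8 * sum(parents) / len(parents)


def transition(prev: tuple[int, ...], nxt: tuple[int, ...]) -> float:
    """Synchronous update: the joint transition is the product of the per-variable CPDs."""
    prob = 1.0
    for i in range(4):
        p = p_on(i, prev)
        prob *= p if nxt[i] == 1 else 1.0 - p
    return prob


def propagate(dist: dict, steps: int) -> dict:
    """Exact forward propagation over all 2**4 joint states."""
    for _ in range(steps):
        dist = {y: sum(dist[x] * transition(x, y) for x in STATES) for y in STATES}
    return dist


start = {s: 1.0 if s == (0, 0, 0, 0) else 0.0 for s in STATES}
dist_t5 = propagate(start, 4)  # four transitions: T=1 -> T=5
print(sum(p for s, p in dist_t5.items() if s[0] == 1))  # 0.2952 (up to float rounding)
```

Exact propagation enumerates all 2^n joint states, which is exactly what stops scaling as the number of variables grows.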

Context dependent Markov processes
[Figure: sites 1-4]
• The context determines a Markov process rate matrix
• Any dependency structure makes sense, including loops
[Figure: example sequences A A A C A and A A C G A A]
• When the context is changing, computing probabilities is difficult. Think of the hidden variables as the trajectories
Continuous time Bayesian Networks (Koller-Nodelman 2002)
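For a very short sequence the context-dependent process can be handled exactly by building one rate matrix over all joint states. This sketch does it for two adjacent sites (16 dinucleotide states), assuming a hypothetical rate model in which C→T inside a CpG (and the matching G→A on the other strand) is 10 times faster than every other substitution. The matrix exponential is computed with a small scaling-and-squaring Taylor routine to keep the example NumPy-only.

```python
from __future__ import annotations
import itertools
import numpy as np

BASES = "ACGT"
STATES = ["".join(p) for p in itertools.product(BASES, repeat=2)]  # 16 dinucleotides
INDEX = {s: k for k, s in enumerate(STATES)}


def site_rate(old: str, new: str, left: str, right: str,
              base_rate: float = 1.0, cpg_boost: float = 10.0) -> float:
    """Hypothetical context-dependent rate for one site changing old -> new."""
    if old == "C" and new == "T" and right == "G":
        return base_rate * cpg_boost  # C in a CpG
    if old == "G" and new == "A" and left == "C":
        return base_rate * cpg_boost  # the same event seen on the other strand
    return base_rate


def generator() -> np.ndarray:
    """Joint rate matrix: one site changes at a time, at a rate set by its neighbour."""
    Q = np.zeros((16, 16))
    for s in STATES:
        for pos in range(2):
            left = s[pos - 1] if pos > 0 else ""
            right = s[pos + 1] if pos < 1 else ""
            for b in BASES:
                if b != s[pos]:
                    t = s[:pos] + b + s[pos + 1:]
                    Q[INDEX[s], INDEX[t]] += site_rate(s[pos], b, left, right)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a generator sum to zero
    return Q


def expm(A: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2 ** squarings)
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ B / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


P = expm(generator() * 0.1)  # transition probabilities after time t = 0.1
print(P[INDEX["CG"], INDEX["TG"]], P[INDEX["CA"], INDEX["TA"]])
```

The joint state space has 4^L states for L sites, so this exact construction only works for toy lengths; that blow-up is what motivates approximate inference.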

Modeling simple context in the tree: Phylo-HMM
[Figure: hidden variables h_{pa_i}^{j-1}, h_{pa_i}^{j}, h_{pa_i}^{j+1} (parent node pa_i), h_i^{j-1}, h_i^{j}, h_i^{j+1} (node i) and h_k^{j}, h_k^{j+1} (node k) at adjacent positions]
• Heuristically approximating the Markov process?
• Where exactly does it fail?
Siepel-Haussler 2003

Log-likelihood to Free Energy
• We have so far worked on computing the likelihood: $p(s) = \sum_h p(h, s)$
• Computing the likelihood is hard. We can reformulate the problem by adding parameters and transforming it into an optimization problem.
• Given a trial function q, define the free energy of the model as: $F(q) = -\sum_h q(h) \log \frac{p(h, s)}{q(h)}$
• The free energy is exactly the (negative log-)likelihood when q is the posterior: $F\big(p(h \mid s)\big) = -\log p(s)$
• Better: when q is a distribution, the free energy bounds the likelihood: $F(q) = -\log p(s) + D\big(q \,\|\, p(h \mid s)\big) \ge -\log p(s)$
[Figure: the gap between the free energy and the log-likelihood is D(q || p(h|s))]
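The bound follows in one line from $p(h, s) = p(h \mid s)\,p(s)$, $\sum_h q(h) = 1$, and the non-negativity of the KL divergence (with equality exactly when $q(h) = p(h \mid s)$):

```latex
\begin{aligned}
F(q) &= -\sum_h q(h)\log\frac{p(h \mid s)\,p(s)}{q(h)}
      = -\log p(s) + \sum_h q(h)\log\frac{q(h)}{p(h \mid s)} \\
     &= -\log p(s) + D\big(q \,\|\, p(h \mid s)\big) \;\ge\; -\log p(s)
\end{aligned}
```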

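Energy?? What energy?
• In statistical mechanics, a system at temperature T with states x and an energy function E(x) is characterized by Boltzmann's law: $p(x) = \frac{1}{Z} e^{-E(x)/T}$
• Z is the partition function: $Z = \sum_x e^{-E(x)/T}$
• Given a model p(h, s | θ) (a BN), we can define the energy using Boltzmann's law (taking T = 1): $E(h) = -\log p(h, s \mid \theta)$
• If we think of P(h | s, θ) as the Boltzmann distribution: $p(h \mid s, \theta) = \frac{p(h, s \mid \theta)}{p(s \mid \theta)} = \frac{1}{Z} e^{-E(h)}$, so that $Z = p(s \mid \theta)$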

Free Energy and Variational Free Energy
• The Helmholtz free energy is defined in physics as: $F_H = -T \log Z$
• This free energy is important in statistical mechanics, but it is difficult to compute, as is our probabilistic Z (= p(s)); with T = 1, $F_H = -\log p(s)$
• The variational transformation introduces trial functions q(h) and sets the variational free energy (or Gibbs free energy) to: $F(q) = U(q) - T\,H(q)$
• The average energy is: $U(q) = \sum_h q(h)\,E(h)$
• The variational entropy is: $H(q) = -\sum_h q(h) \log q(h)$
• And as before (T = 1): $F(q) = U(q) - H(q) = F_H + D\big(q \,\|\, p(h \mid s)\big) \ge F_H$
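A numeric check of these definitions on a toy model with three hidden states and one fixed observation, taking T = 1. The joint probabilities in `JOINT` are made-up numbers; the code verifies that Z equals p(s), that F(q) = F_H + D(q || p(h|s)), and that the bound is tight at the posterior.

```python
from __future__ import annotations
import math

# Hypothetical joint probabilities p(h, s) for three hidden states and one fixed s.
JOINT = [0.10, 0.05, 0.15]


def energy(h: int) -> float:
    return -math.log(JOINT[h])  # Boltzmann energy with T = 1


def free_energy(q: list[float]) -> float:
    U = sum(qh * energy(h) for h, qh in enumerate(q))   # average energy
    H = -sum(qh * math.log(qh) for qh in q if qh > 0)   # variational entropy
    return U - H


def kl(q: list[float], p: list[float]) -> float:
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


Z = sum(math.exp(-energy(h)) for h in range(len(JOINT)))  # equals p(s) = 0.3
F_H = -math.log(Z)
posterior = [math.exp(-energy(h)) / Z for h in range(len(JOINT))]

q = [0.6, 0.3, 0.1]
print(free_energy(q), F_H + kl(q, posterior))  # equal up to rounding
print(free_energy(posterior) - F_H)            # ~0: tight at the posterior
```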

Solving the variational optimization problem
• Minimizing U? Focus on max configurations
• Maximizing H? Spread out the distribution
• So instead of computing p(s), we can search for the q that minimizes the free energy: $\min_q F(q) = F_H = -\log p(s)$, attained at $q(h) = p(h \mid s)$
• This is still as hard as before, but we can simplify the problem by restricting q (this is where the additional degrees of freedom become important)

Simplest variational approximation: Mean Field
• Minimizing U? Focus on max configurations
• Maximizing H? Spread out the distribution
• Let's assume complete independence among the random variables' posteriors: $q(h) = \prod_i q_i(h_i)$
• Under this assumption we can try optimizing the q_i (looking for minimal energy!): $F_{MF}(q_1, \ldots, q_n) = F\big(\prod_i q_i\big)$
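Holding all factors except $q_i$ fixed, the free energy splits into a term linear in $q_i$ plus its negative entropy, so each coordinate step has a closed form (a standard mean-field result, derived here from the definitions $F = U - H$ and $E(h) = -\log p(h, s)$):

```latex
\begin{aligned}
F_{MF}(q_1,\ldots,q_n)
  &= -\sum_{h_i} q_i(h_i)\,\mathbb{E}_{\prod_{j\ne i} q_j}\!\left[\log p(h,s)\right]
     + \sum_{h_i} q_i(h_i)\log q_i(h_i) + \text{const} \\
\Rightarrow\quad
q_i^{*}(h_i) &= \frac{1}{Z_i}\exp\!\Big(\mathbb{E}_{\prod_{j\ne i} q_j}\!\left[\log p(h,s)\right]\Big)
\end{aligned}
```

Here the constant collects the entropies of the other factors, and $Z_i$ normalizes $q_i^{*}$.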

Mean Field Inference
• We optimize iteratively:
  – Select i (sequentially, or using any other method)
  – Optimize q_i to minimize F_MF(q_1, ..., q_i, ..., q_n) while fixing all other qs
  – Terminate when F_MF cannot be improved further
• Remember: F_MF always bounds the likelihood
• q_i optimization can usually be done efficiently
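The loop above can be sketched on a toy model small enough to enumerate: three binary hidden variables on a chain, with a hypothetical joint p(h, s) scaled so that p(s) = 0.5. Each step sets q_i(h_i) proportional to exp(E[log p(h, s)]) under the other factors; the expectations are computed by brute force here, whereas a real model would exploit its structure.

```python
from __future__ import annotations
import itertools
import math

N = 3
CONFIGS = list(itertools.product((0, 1), repeat=N))


def weight(h: tuple[int, ...]) -> float:
    # Hypothetical chain: neighbours prefer to agree; the first variable prefers 1.
    agree = sum(h[i] == h[i + 1] for i in range(N - 1))
    return math.exp(1.0 * agree + 0.5 * h[0])


TOTAL = sum(weight(h) for h in CONFIGS)
LOG_EVIDENCE = math.log(0.5)  # p(s) = 0.5 by construction


def log_joint(h: tuple[int, ...]) -> float:
    return math.log(weight(h) / TOTAL * 0.5)


def others_prob(q: list[float], h: tuple[int, ...], skip: int = -1) -> float:
    """Probability of h under the factorized q, leaving out variable `skip`."""
    return math.prod(q[j] if h[j] else 1 - q[j] for j in range(N) if j != skip)


def free_energy_mf(q: list[float]) -> float:
    F = 0.0
    for h in CONFIGS:
        qh = others_prob(q, h)
        if qh > 0:
            F += qh * (math.log(qh) - log_joint(h))
    return F


def update(q: list[float], i: int) -> list[float]:
    """Exact coordinate step: q_i(h_i) proportional to exp(E_{q_-i}[log p(h, s)])."""
    expected = [0.0, 0.0]
    for h in CONFIGS:
        expected[h[i]] += others_prob(q, h, skip=i) * log_joint(h)
    m = max(expected)
    a0, a1 = math.exp(expected[0] - m), math.exp(expected[1] - m)
    q = list(q)
    q[i] = a1 / (a0 + a1)  # q[i] stores q_i(h_i = 1)
    return q


def mean_field(q: list[float], tol: float = 1e-10, max_sweeps: int = 100):
    history = [free_energy_mf(q)]
    for _ in range(max_sweeps):
        for i in range(N):  # select i sequentially
            q = update(q, i)
        history.append(free_energy_mf(q))
        if history[-2] - history[-1] < tol:  # F_MF cannot be improved further
            break
    return q, history


q_final, history = mean_field([0.5] * N)
print(q_final, history[-1], -LOG_EVIDENCE)  # F_MF stays above -log p(s)
```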

Adaptive mutations: Cairns et al. 88
Luria-Delbrück's observation
Experimental system: lacZ frameshift
The experiment suggests adaptive mutations
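Luria and Delbrück's observation rests on the fluctuation test: if mutations arise at random during growth, early "jackpot" mutations make mutant counts across parallel cultures far more variable than the Poisson-like counts expected if mutations were induced only upon selection. A simulation sketch of that contrast; the number of cultures, generations, and the mutation rate `mu` are illustrative assumptions.

```python
import numpy as np


def fluctuation_test(cultures=1000, generations=20, mu=1e-6, seed=0):
    """Mutant counts per culture under random (pre-selection) vs. induced mutation."""
    rng = np.random.default_rng(seed)
    wild = np.ones(cultures, dtype=np.int64)
    mutants = np.zeros(cultures, dtype=np.int64)
    for _ in range(generations):
        daughters = 2 * wild
        new = rng.binomial(daughters, mu)  # mutations arise at random during growth
        wild = daughters - new
        mutants = 2 * mutants + new        # existing mutant clones keep doubling
    final_cells = wild + mutants
    # Alternative: each cell mutates independently only upon exposure, with a
    # probability chosen to give the same mean number of mutants.
    p_induced = mutants.mean() / final_cells.mean()
    induced = rng.binomial(final_cells, p_induced)
    return mutants, induced


def variance_to_mean(x):
    return x.var(ddof=1) / x.mean()


mutants, induced = fluctuation_test()
print(variance_to_mean(mutants), variance_to_mean(induced))  # >> 1 vs. about 1
```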

The “Mutator” paradigm
• The ability to switch to the mutator phenotype depends on particular DNA repair mechanisms (double-strand break repair in E. coli)
• The mutator phenotype is suggested to be important in pathogenesis, antibiotic resistance, and cancer
• Species occasionally change (adaptively or even by drift) their repair policy/efficiency
• The resulting substitution landscape must be very complex