Reconstructing Ancestral DNA (at least the gaps)

Reconstructing Ancestral DNA (at least the gaps) Using unrooted phylogeny, multiple alignment, and affine gap cost function. Work in progress.

Overview
• Introduction
• Examples
• Gap graph construction
• Theory
• Algorithm
• Results
• Next steps

Example
[Figure: node N3 with unknown state: ? ? ?]

Example
[Figure: N3 = nnn]
(a) Two long indels.

Example
[Figure: N3 = n-n]
(a) Two long indels.
(b) Three short indels.

Example
[Figure: N3 = nnn or n-n]
(a) Two long indels.
(b) Three short indels.
Which is more parsimonious depends on the gap cost function: the cost of an indel of length k is g(k) = a + b*k.
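To make the trade-off concrete, here is a minimal sketch of the affine cost g(k) = a + b*k applied to the two competing explanations. The indel lengths (6 and 2) are hypothetical, since the slide does not give them, and the second parameter pair (a=20, b=1) is chosen only to show the ranking flip; a=5, b=3 is the setting quoted on a later slide.

```python
def affine_gap_cost(k, a, b):
    """Cost of a single indel of length k under g(k) = a + b*k."""
    if k <= 0:
        raise ValueError("indel length must be positive")
    return a + b * k


def scenario_cost(indel_lengths, a, b):
    """Total cost of an explanation given as a list of indel lengths."""
    return sum(affine_gap_cost(k, a, b) for k in indel_lengths)


# Hypothetical lengths: the slide only says "two long" vs. "three short".
two_long = [6, 6]
three_short = [2, 2, 2]

for a, b in [(5, 3), (20, 1)]:
    cost_a = scenario_cost(two_long, a, b)
    cost_b = scenario_cost(three_short, a, b)
    winner = "(a) two long" if cost_a < cost_b else "(b) three short"
    print(f"a={a}, b={b}: (a)={cost_a}, (b)={cost_b} -> cheaper: {winner}")
```

With these made-up lengths, (a) is cheaper exactly when a > 6b, so a high opening cost favours fewer, longer indels.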

Harder Example
[Figure: nodes N8, N9, N10, N11, N12, N13 with unknown states: ? ? ?]
Problem: find the optimal explanation for the gaps in terms of indels.

Gap Representation
1. Find gap intervals.
2. Create a minimal tree covering in each gap interval: the minimal number of subtrees with gaps in all leaves.
Vertex: (a) a subtree with gaps in all leaves, and (b) a section of the alignment.
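The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the program from the talk: a gap interval is taken to be a maximal run of columns with the same set of gapped sequences, and the tree is a rooted nested tuple (the talk uses an unrooted phylogeny, where a subtree can lie on either side of an edge). The names `gap_intervals`, `minimal_covering`, and the toy data are hypothetical.

```python
def gap_intervals(alignment):
    """Split columns into maximal runs sharing the same set of gapped sequences.

    Returns a list of (start, end_exclusive, frozenset_of_gapped_names).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    intervals = []
    for col in range(length):
        gapped = frozenset(n for n in names if alignment[n][col] == "-")
        if intervals and intervals[-1][2] == gapped:
            start, _, same = intervals[-1]
            intervals[-1] = (start, col + 1, same)  # extend the current run
        else:
            intervals.append((col, col + 1, gapped))
    return intervals


def leaves(tree):
    """Leaf names below a node; a leaf is a string, an inner node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def minimal_covering(tree, gapped):
    """Maximal rooted subtrees whose leaves are all gapped, as leaf sets."""
    below = leaves(tree)
    if below <= gapped:
        return [below]
    if isinstance(tree, str):
        return []
    cover = []
    for child in tree:
        cover.extend(minimal_covering(child, gapped))
    return cover


# Toy data (hypothetical): four sequences on the rooted tree ((A,B),(C,D)).
tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AC--GT", "B": "AC--GT", "C": "ACG-GT", "D": "ACGTGT"}

for start, end, gapped in gap_intervals(alignment):
    cover = [sorted(v) for v in minimal_covering(tree, gapped)]
    print(f"columns [{start}, {end}): gapped={sorted(gapped)} cover={cover}")
```

Each leaf set in a covering, paired with its column range, plays the role of one vertex in the sense of the slide.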

Gap Graph Construction
3. Create connections between neighbors v and w if one is contained in the other.
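A minimal sketch of step 3, assuming that "neighbors" means vertices in adjacent gap intervals and that each vertex is represented by its set of leaves. The function name and the toy coverings are hypothetical.

```python
def containment_edges(coverings):
    """Connect vertices of adjacent intervals when one leaf set contains the other.

    coverings[i] is the list of leaf sets (vertices) of gap interval i.
    Returns edges as ((i, v), (i + 1, w)).
    """
    edges = []
    for i in range(len(coverings) - 1):
        for v in coverings[i]:
            for w in coverings[i + 1]:
                if v <= w or w <= v:
                    edges.append(((i, v), (i + 1, w)))
    return edges


# Toy coverings (hypothetical) for three consecutive gap intervals.
coverings = [
    [frozenset({"A", "B"})],
    [frozenset({"A", "B"}), frozenset({"C"})],
    [frozenset({"C", "D"})],
]

for (i, v), (j, w) in containment_edges(coverings):
    print(f"interval {i} {sorted(v)} -- interval {j} {sorted(w)}")
```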

What is a vertex?
Either one indel created all gaps in the subtree, or the vertex (subtree) is decomposed into several indels.
Algorithm goal: confirm or decompose vertices using the gap cost function.

Flashback: ~ Jotun’s Algorithm
This example can be solved optimally: using a=5, b=3, all vertices are confirmed, i.e., all gaps are created ‘as high as possible’ in the tree.

Horrific Counter Example
At first sight: confirm all vertices, giving 6 indels.
BUT: a solution with 5 indels can be found! Depending on the gap cost function, this may be cheaper, so the first solution may not be optimal.
Problem: the indel (2) is invisible.
[Figure labels: (0, 1), (1, 2, 3, 4), (0, 1, 2, 3)]

New Type of Connection Needed!
3. Create connections between neighbors v and w if one is contained in the other, or if they share leaves (cousins).
The indel (2) lies in the intersection of the cousins.
[Figure labels: (0, 1), (1, 2, 3, 4), (0, 1, 2, 3)]
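The two kinds of connection can be told apart with plain set operations. The sketch below assumes vertices are compared by their leaf sets and that "cousins" are vertices whose leaf sets overlap without either containing the other; the leaf names and the function `connection_type` are hypothetical. Partial overlap of this kind can arise for subtrees of an unrooted tree, where a subtree may be either side of an edge.

```python
def connection_type(v, w):
    """Classify the connection between two neighboring vertices by leaf sets."""
    if v <= w or w <= v:
        return "containment"
    if v & w:
        return "cousin"
    return None  # disjoint leaf sets: no connection


# Hypothetical five-leaf example: two sides of different edges of an unrooted tree.
v = frozenset({"A", "B", "C"})
w = frozenset({"C", "D", "E"})

print(connection_type(v, w), "sharing", sorted(v & w))
print(connection_type(frozenset({"A", "B"}), v))
print(connection_type(frozenset({"A", "B"}), frozenset({"D", "E"})))
```

An indel confined to the shared leaves corresponds to neither vertex on its own, which is the situation the slide describes as invisible without the new connection type.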

Now The(st)ory Begins
By construction of the gap graph, we can prove two theorems.
Theorem 1: Each optimal indel either corresponds directly to a vertex, or it crosses a cousin connection.
Only possible optimal indels: (0, 1), (3), (0, 1, 2, 3), (1, 2, 3, 4), (1), (4), (2).
[Figure labels: (1, 2, 3, 4), (0, 1, 2, 3)]

Now Theory Begins
By construction of the gap graph, we can prove two theorems.
Theorem 2: If a vertex v is decomposed in the optimal solution, all decomposing indels extend beyond v’s section of the alignment, and they do not all extend in the same direction.
Thus we have to decompose none or both of (0, 1, 2, 3) and (1, 2, 3, 4): otherwise (2) doesn’t extend beyond the region of (0, 1, 2, 3).
[Figure labels: (1, 2, 3, 4), (0, 1, 2, 3)]
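Theorem 2 gives a cheap necessary condition that a candidate decomposition can be tested against. The sketch below is one possible reading, not the talk's implementation: a vertex's section and each indel are modelled as half-open column ranges (start, end), "extends beyond" means reaching past the left or right end of the section, and "not all in the same direction" is read as at least one indel reaching left and at least one reaching right. The function name is hypothetical.

```python
def satisfies_theorem2(section, indels):
    """Check the Theorem 2 condition for a candidate decomposition.

    section: (start, end) column range of the vertex, end exclusive.
    indels:  list of (start, end) column ranges of the decomposing indels.
    """
    s, e = section
    extends_left = [start < s for start, _ in indels]
    extends_right = [end > e for _, end in indels]
    # Every decomposing indel must reach beyond the section on some side ...
    all_extend = all(l or r for l, r in zip(extends_left, extends_right))
    # ... and the indels must not all reach out on the same side only.
    return all_extend and any(extends_left) and any(extends_right)


section = (10, 20)
print(satisfies_theorem2(section, [(5, 20), (10, 25)]))   # one left, one right
print(satisfies_theorem2(section, [(5, 20), (3, 20)]))    # both extend left only
print(satisfies_theorem2(section, [(5, 20), (12, 18)]))   # second stays inside
```

A decomposition that fails this test can be discarded before any costs are computed; passing it does not by itself make the decomposition optimal.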

Now Theory Begins
From the theorems we can prove some lemmas:
1: Leaf vertices can be confirmed.
2: Orphans / end vertices can be confirmed.
3: Patriarchs can be confirmed and trimmed.

Solving Earlier Example
1: Leaf vertices can be confirmed.
2: Orphans / end vertices can be confirmed.
3: Patriarchs can be confirmed and trimmed.
4: Mono-chain vertices can be decided locally.

End of Pre-Processing
• In longer examples there will be undecided vertices (purple) after pre-processing.
• Find the possible decompositions for each vertex and check all combinations in each chain.
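Checking all combinations in a chain amounts to a Cartesian product over the options of its undecided vertices. The sketch below assumes chains can be searched independently, so the total work is the sum over chains of the per-chain products; the option labels and the additive toy cost are hypothetical stand-ins for the real decompositions and the affine gap cost.

```python
from itertools import product
from math import prod


def estimated_combinations(chains):
    """Return (total, max per chain) of combinations to check.

    chains: list of chains; a chain lists, per undecided vertex,
    its candidate options (confirm or one of its decompositions).
    """
    per_chain = [prod(len(options) for options in chain) for chain in chains]
    return sum(per_chain), max(per_chain, default=0)


def best_combination(chain, cost):
    """Exhaustively pick the cheapest combination of options in one chain."""
    return min(product(*chain), key=cost)


# Hypothetical chains and a toy cost that simply adds per-option costs.
chain1 = [["confirm", "split-a"], ["confirm", "split-a", "split-b"]]
chain2 = [["confirm", "split-a"]]
toy_cost = {"confirm": 8, "split-a": 7, "split-b": 9}


def cost(combo):
    return sum(toy_cost[option] for option in combo)


print(estimated_combinations([chain1, chain2]))
print(best_combination(chain1, cost))
```

In the real problem the cost of a combination is not additive per vertex, because indels can span several vertices of a chain; that interaction is why whole combinations have to be scored together.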

9 sequences, 60% gaps, preproc. time < 4 s
----------
• Alignment length 3936, divided in 3922 gap intervals.
----------
• 1497 vertices undecided before trimming.
• 1112 vertices undecided after trimming.
----------
• Created 8912 vertices, 871 connections. Confirmed
• 5469 leaf vertices,
• 2285 patriarchs,
• 210 end vertices,
• 217 locally confirmed non-cousin chain vertices,
• 37 locally confirmed cousin chain vertices, and
• 487 mono-chain decomposed vertices.
----------
• 207 vertices undecided after all preprocessing.
• #chains with undecided: 89, max #undecided in same chain (C 31): 7
• estimated number of combinations: 2788, max in same chain: 1152
----------
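The counts on this slide are internally consistent: the decided vertices plus the undecided remainder add up to the number of vertices created. A quick check using only the numbers reported above (the variable names are ours):

```python
created = 8912

decided = {
    "leaf vertices": 5469,
    "patriarchs": 2285,
    "end vertices": 210,
    "locally confirmed non-cousin chain vertices": 217,
    "locally confirmed cousin chain vertices": 37,
    "mono-chain decomposed vertices": 487,
}

total_decided = sum(decided.values())   # 8705
undecided = created - total_decided     # 207, as reported on the slide

print(f"decided: {total_decided}, undecided: {undecided}")
print(f"share still undecided: {undecided / created:.1%}")
```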

Is Pre-Processing Important?
9 sequences, 60% gaps; no pre-processing:
• Created 10082 vertices, 7121 connections.
• 1497 vertices undecided with no preprocessing.
• #chains with undecided: 950, max #undecided in same chain (C 40): 10
• estimated number of combinations: 71950, max in same chain: 34560
9 sequences, 60% gaps; with pre-processing:
• Created 8912 vertices, 871 connections.
• 207 vertices undecided after all preprocessing.
• #chains with undecided: 89, max #undecided in same chain (C 31): 7
• estimated number of combinations: 2788, max in same chain: 1152
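To read the comparison at a glance, the reduction factors can be computed directly from the two sets of figures on the slide. The dictionary keys are our own labels; the numbers are copied from the slide.

```python
without_pre = {
    "vertices": 10082,
    "connections": 7121,
    "undecided vertices": 1497,
    "chains with undecided": 950,
    "combinations": 71950,
    "max combinations in one chain": 34560,
}
with_pre = {
    "vertices": 8912,
    "connections": 871,
    "undecided vertices": 207,
    "chains with undecided": 89,
    "combinations": 2788,
    "max combinations in one chain": 1152,
}

# Factor by which pre-processing shrinks each quantity.
reduction = {key: without_pre[key] / with_pre[key] for key in without_pre}

for key, factor in reduction.items():
    print(f"{key}: {without_pre[key]} -> {with_pre[key]} ({factor:.1f}x fewer)")
```

On these figures the estimated number of combinations drops by roughly a factor of 26, and the largest single chain by a factor of 30.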

Next Steps
• Make poster for Recomb (suggestions?)
• Finish program
• Run it on real data
• Ideas for applications? (Score ranks alignment; use to find alignment...)
• Demo

Screenshots (in case demo doesn’t work)