Distance in Trees dij(T) - the length of a
- Slides: 26
Distance in Trees • dij(T): the length of the path between leaves i and j • Example: d1,4 = 12 + 13 + 14 + 17 + 13 = 69
Phylogenetic Tree Reconstruction • Input: distance matrix D • Output: binary tree T such that dij(T) = Dij
Reconstructing a 3 Leaved Tree • Tree reconstruction for any 3 x 3 matrix is straightforward • We have 3 leaves i, j, k and a center vertex c • Observe:
  dic + djc = Dij
  dic + dkc = Dik
  djc + dkc = Djk
Reconstructing a 3 Leaved Tree (cont'd)
  Add the first two equations:
    (dic + djc) + (dic + dkc) = Dij + Dik
    2dic + djc + dkc = Dij + Dik
  Substitute djc + dkc = Djk:
    2dic + Djk = Dij + Dik
    dic = (Dij + Dik – Djk)/2
  Similarly,
    djc = (Dij + Djk – Dik)/2
    dkc = (Dki + Dkj – Dij)/2
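The three closed-form expressions above translate directly into a few lines of code. The following is a minimal sketch; the function name `three_leaf_edges` and the sample distances are illustrative and not taken from the slides.

```python
def three_leaf_edges(d_ij, d_ik, d_jk):
    """Edge lengths from leaves i, j, k to the center vertex c of a 3-leaf tree."""
    d_ic = (d_ij + d_ik - d_jk) / 2
    d_jc = (d_ij + d_jk - d_ik) / 2
    d_kc = (d_ik + d_jk - d_ij) / 2
    return d_ic, d_jc, d_kc


# Illustrative distances: Dij = 6, Dik = 10, Djk = 10
d_ic, d_jc, d_kc = three_leaf_edges(6, 10, 10)

# The recovered edges must add back up to the input distances.
assert d_ic + d_jc == 6
assert d_ic + d_kc == 10
assert d_jc + d_kc == 10
```

A negative result for any edge would mean the three distances violate the triangle inequality, so no such tree exists.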
Trees with > 3 Leaves • An unrooted binary tree with n leaves has 2n-3 edges • This means fitting a given tree to a distance matrix D requires solving a system of “n choose 2” equations with 2n-3 variables • For n > 3 there are more equations than variables, so the system is not always solvable
The Four Point Condition • Compute: 1. Dij + Dkl, 2. Dik + Djl, 3. Dil + Djk • Sums 2 and 3 represent the same number: the length of all edges + the middle edge (it is counted twice) • Sum 1 represents a smaller number: the length of all edges – the middle edge
The Four Point Condition • Four point condition: for i, j, k, l, two of the sums Dij + Dkl, Dik + Djl, Dil + Djk are equal and the third sum is smaller than or equal to them • Definition: An n x n matrix D is additive provided there exists a tree T with D(T) = D. (Note: T is unique.) • Theorem: D is additive if and only if the four point condition holds for every quartet 1 ≤ i, j, k, l ≤ n
Additive Distance Matrices • Matrix D is ADDITIVE if there exists a tree T with dij(T) = Dij, and NON-ADDITIVE otherwise
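The theorem suggests a brute-force additivity test: check the condition on every quartet. The sketch below assumes D is a symmetric list of lists with a zero diagonal; the function names and the `tol` parameter are illustrative. Quartets with repeated indices are included on purpose, because those cases reduce to the triangle inequality.

```python
from itertools import combinations_with_replacement


def four_point_ok(D, i, j, k, l, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal."""
    sums = sorted([D[i][j] + D[k][l], D[i][k] + D[j][l], D[i][l] + D[j][k]])
    return abs(sums[2] - sums[1]) <= tol


def is_additive(D):
    """Check the four point condition on all quartets 0 <= i, j, k, l < n."""
    n = len(D)
    return all(four_point_ok(D, *q)
               for q in combinations_with_replacement(range(n), 4))


# The 5 x 5 matrix over v, w, x, y, z used in the slides.
D_slides = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]

# A hypothetical 4 x 4 matrix whose three sums are 5, 7 and 9.
D_bad = [
    [0, 3, 4, 3],
    [3, 0, 4, 5],
    [4, 4, 0, 2],
    [3, 5, 2, 0],
]

print(is_additive(D_slides), is_additive(D_bad))
```

The check is O(n^4), which is fine for small matrices but is only meant to illustrate the condition.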
Reconstructing Additive Distances Given T
D:
      v   w   x   y   z
  v   0  10  17  16  16
  w       0  15  14  14
  x           0   9  15
  y               0  14
  z                   0
• If we know T and D, but do not know the length of each edge, we can reconstruct those lengths
[Figure: tree T with leaves v, w, x, y, z]
Reconstructing Additive Distances Given T
• Leaves v and w hang from the same internal vertex a, so for example dvx + dwx = 2dax + dvw
  dax = ½ (dvx + dwx – dvw)
  day = ½ (dvy + dwy – dvw)
  daz = ½ (dvz + dwz – dvw)
D1 (v and w replaced by a):
      a   x   y   z
  a   0  11  10  10
  x       0   9  15
  y           0  14
  z               0
Reconstructing Additive Distances Given T
D1:
      a   x   y   z
  a   0  11  10  10
  x       0   9  15
  y           0  14
  z               0
D2 (x and y replaced by b):
      a   b   z
  a   0   6  10
  b       0  10
  z           0
D3 (b and z replaced by c):
      a   c
  a   0   3
  c       0
Edge lengths:
  d(a, c) = 3
  d(b, c) = d(a, b) – d(a, c) = 3
  d(c, z) = d(a, z) – d(a, c) = 7
  d(b, x) = d(a, x) – d(a, b) = 5
  d(b, y) = d(a, y) – d(a, b) = 4
  d(a, w) = d(z, w) – d(a, z) = 4
  d(a, v) = d(z, v) – d(a, z) = 6
Correct!!!
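One way to confirm the reconstructed edge lengths is to recompute every leaf-to-leaf path length in the tree and compare it with D. The sketch below does this with a plain depth-first traversal. The edge list is read off the edge lengths above (internal vertices a, b, c); the function name `leaf_distances` is illustrative.

```python
from collections import defaultdict


def leaf_distances(edges, leaves):
    """Return table[a][b] = path length between leaves a and b in a weighted tree."""
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    def distances_from(src):
        # Depth-first walk; in a tree each vertex is reached by exactly one path.
        dist = {src: 0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    table = {}
    for a in leaves:
        dist = distances_from(a)
        table[a] = {b: dist[b] for b in leaves}
    return table


edges = [
    ("v", "a", 6), ("w", "a", 4), ("a", "c", 3), ("c", "z", 7),
    ("c", "b", 3), ("b", "x", 5), ("b", "y", 4),
]
d = leaf_distances(edges, ["v", "w", "x", "y", "z"])
print(d["v"]["x"], d["x"]["y"], d["w"]["z"])
```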
Distance Based Phylogeny Problem • Goal: Reconstruct an evolutionary tree from a distance matrix • Input: n x n distance matrix Dij • Output: weighted tree T with n leaves fitting D • If D is additive, this problem has a solution and there is a simple algorithm to solve it
Using Neighboring Leaves to Construct the Tree • Find neighboring leaves i and j with parent k • Remove the rows and columns of i and j • Add a new row and column corresponding to k, where the distance from k to any other leaf m can be computed as: Dkm = (Dim + Djm – Dij)/2 • Compress i and j into k, iterate algorithm for rest of tree
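The compression step can be sketched as a function that takes a distance matrix and returns the smaller one. This is a minimal illustration that assumes D is stored as a dict of dicts keyed by leaf label and that i and j are already known to be neighbors; the name `merge_neighbors` is hypothetical.

```python
def merge_neighbors(D, i, j, k):
    """Replace neighboring leaves i and j by their parent k.

    Uses Dkm = (Dim + Djm - Dij) / 2 for every remaining leaf m.
    """
    others = [m for m in D if m not in (i, j)]
    new = {m: {p: D[m][p] for p in others} for m in others}
    new[k] = {k: 0}
    for m in others:
        d_km = (D[i][m] + D[j][m] - D[i][j]) / 2
        new[k][m] = d_km
        new[m][k] = d_km
    return new


labels = ["v", "w", "x", "y", "z"]
rows = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D = {a: dict(zip(labels, row)) for a, row in zip(labels, rows)}

# v and w are neighbors in the slide example; their parent is called a.
D1 = merge_neighbors(D, "v", "w", "a")
print(D1["a"])
```

The function returns a new matrix instead of editing D in place, which keeps the original available for checking the final tree.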
Finding Neighboring Leaves • Tempting idea: to find neighboring leaves, simply select a pair of closest leaves • WRONG
Finding Neighboring Leaves • Closest leaves aren’t necessarily neighbors • In the example, i and j are neighbors, but (dij = 13) > (djk = 12) • Finding a pair of neighboring leaves is a nontrivial problem!
Degenerate Triples • A degenerate triple is a set of three distinct elements 1≤i, j, k≤n where Dij + Djk = Dik • Element j in a degenerate triple i, j, k lies on the evolutionary path from i to k (or is attached to this path by an edge of length 0).
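A direct search for a degenerate triple can be sketched as a loop over ordered triples. The code assumes D is a list of lists with exactly representable entries (integers or halves), so the equality test is exact. The second matrix is the slide matrix after every off-diagonal entry has been reduced by 8, included here only as test data.

```python
from itertools import permutations


def find_degenerate_triple(D):
    """Return (i, j, k) with Dij + Djk == Dik, or None if no such triple exists."""
    n = len(D)
    for i, j, k in permutations(range(n), 3):
        if D[i][j] + D[j][k] == D[i][k]:
            return i, j, k
    return None


D_original = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
D_reduced = [
    [0, 2, 9, 8, 8],
    [2, 0, 7, 6, 6],
    [9, 7, 0, 1, 7],
    [8, 6, 1, 0, 6],
    [8, 6, 7, 6, 0],
]

print(find_degenerate_triple(D_original))
print(find_degenerate_triple(D_reduced))
```

In the original matrix every leaf hangs on an edge of positive length, so no triple is degenerate. In the reduced matrix the second leaf lies on the path between the first and the third.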
Looking for Degenerate Triples • If distance matrix D has a degenerate triple i, j, k then j can be “removed” from D thus reducing the size of the problem. • If distance matrix D does not have a degenerate triple i, j, k, one can “create” a degenerate triple in D by shortening all hanging edges (in the tree).
Shortening Hanging Edges to Produce Degenerate Triples • Shorten all “hanging” edges (edges that connect leaves) until a degenerate triple is found
Finding Degenerate Triples • If there is no degenerate triple, all hanging edges are reduced by the same amount δ, so that all pair-wise distances in the matrix are reduced by 2δ. • Eventually this process collapses one of the leaves (when δ = length of shortest hanging edge), forming a degenerate triple i, j, k and reducing the size of the distance matrix D. • The attachment point for j can be recovered in the reverse transformations by saving Dij for each collapsed leaf.
Reconstructing Trees for Additive Distance Matrices
Trim(D, δ)
  for all 1 ≤ i ≠ j ≤ n
    Dij = Dij - 2δ
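The slides do not spell out how the trimming parameter is computed from D alone. One option, stated here as an assumption, is to use the fact that in an additive matrix the hanging edge of leaf j has length min over i, k of (Dij + Djk - Dik)/2, and to take δ as the smallest such value over all leaves. The sketch below implements that choice together with Trim. It needs at least 3 leaves, and the function names are illustrative.

```python
from itertools import permutations


def trimming_parameter(D):
    """Length of the shortest hanging edge, computed from the matrix alone.

    Assumption: D is additive, so (Dij + Djk - Dik) / 2 minimised over i, k
    is the hanging-edge length of leaf j.
    """
    n = len(D)
    return min((D[i][j] + D[j][k] - D[i][k]) / 2
               for i, j, k in permutations(range(n), 3))


def trim(D, delta):
    """Return a copy of D with every off-diagonal entry reduced by 2 * delta."""
    n = len(D)
    return [[D[i][j] - 2 * delta if i != j else 0 for j in range(n)]
            for i in range(n)]


D = [
    [0, 10, 17, 16, 16],
    [10, 0, 15, 14, 14],
    [17, 15, 0, 9, 15],
    [16, 14, 9, 0, 14],
    [16, 14, 15, 14, 0],
]
delta = trimming_parameter(D)
trimmed = trim(D, delta)
print(delta, trimmed[0])
```

For the slide matrix the shortest hanging edges have length 4, so every off-diagonal entry drops by 8.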
AdditivePhylogeny Algorithm
AdditivePhylogeny(D)
  if D is a 2 x 2 matrix
    T = tree of a single edge of length D1,2
    return T
  if D is non-degenerate
    Compute trimming parameter δ
    Trim(D, δ)
  Find a triple i, j, k in D such that Dij + Djk = Dik
  x = Dij
  Remove jth row and jth column from D
  T = AdditivePhylogeny(D)
  Traceback
AdditivePhylogeny (cont’d)
Traceback
  Add a new vertex v to T on the path from i to k, at distance x from i
  Add j back to T by creating an edge (v, j) of length 0
  for every leaf l in T
    if distance from l to v in the tree ≠ Dl,j
      output “matrix is not additive”
      return
  Extend all “hanging” edges by length δ (δ = 0 if D was not trimmed)
  return T
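The recursion and traceback can be put together in a compact runnable sketch. Several choices here are assumptions, not part of the slides: D is a dict of dicts keyed by string leaf labels, internal vertices are numbered with integers, the trimming parameter is taken as the minimum of (Dij + Djk - Dik)/2 over all triples (which is 0 when D is already degenerate), distances are exactly representable, and the per-leaf additivity check in the traceback is omitted for brevity.

```python
from itertools import count, permutations


def find_path(T, src, dst):
    """Vertices on the unique path from src to dst in tree T (dict of dicts)."""
    stack, seen = [(src, [src])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in T[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))


def tree_distance(T, a, b):
    path = find_path(T, a, b)
    return sum(T[p][q] for p, q in zip(path, path[1:]))


def attach(T, i, k, x, ids):
    """Split an edge on the i..k path to create a vertex at distance x from i."""
    path, walked = find_path(T, i, k), 0
    for p, q in zip(path, path[1:]):
        length = T[p][q]
        if walked + length >= x:
            v = next(ids)
            del T[p][q], T[q][p]
            T[v] = {p: x - walked, q: walked + length - x}
            T[p][v] = x - walked
            T[q][v] = walked + length - x
            return v
        walked += length
    raise ValueError("matrix is not additive")


def additive_phylogeny(D, ids=None):
    ids = count() if ids is None else ids
    leaves = list(D)
    if len(leaves) == 2:
        a, b = leaves
        return {a: {b: D[a][b]}, b: {a: D[a][b]}}
    triples = list(permutations(leaves, 3))
    # Trimming parameter (assumed formula); 0 if D is already degenerate.
    delta = min((D[i][j] + D[j][k] - D[i][k]) / 2 for i, j, k in triples)
    trimmed = {a: {b: D[a][b] - 2 * delta if a != b else 0 for b in leaves}
               for a in leaves}
    i, j, k = next(t for t in triples
                   if trimmed[t[0]][t[1]] + trimmed[t[1]][t[2]] == trimmed[t[0]][t[2]])
    x = trimmed[i][j]
    reduced = {a: {b: trimmed[a][b] for b in leaves if b != j}
               for a in leaves if a != j}
    T = additive_phylogeny(reduced, ids)
    # Traceback: re-attach j with a zero-length edge, then undo the trimming.
    v = attach(T, i, k, x, ids)
    T[v][j] = 0
    T[j] = {v: 0}
    for leaf in leaves:
        (nbr, w), = T[leaf].items()
        T[leaf][nbr] = T[nbr][leaf] = w + delta
    return T


labels = ["v", "w", "x", "y", "z"]
rows = [[0, 10, 17, 16, 16], [10, 0, 15, 14, 14], [17, 15, 0, 9, 15],
        [16, 14, 9, 0, 14], [16, 14, 15, 14, 0]]
D = {a: dict(zip(labels, row)) for a, row in zip(labels, rows)}
T = additive_phylogeny(D)
print(T["v"], T["w"])
```

`attach` always splits an edge, even when the attachment point coincides with an existing vertex, so every leaf keeps exactly one neighbor and the hanging-edge extension stays simple. The price is an occasional zero-length internal edge.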
Neighbor Joining Algorithm • In 1987 Naruya Saitou and Masatoshi Nei developed a neighbor joining algorithm for phylogenetic tree reconstruction • Finds a pair of leaves that are close to each other but far from other leaves: implicitly finds a pair of neighboring leaves • Advantages: works well for additive and also for non-additive matrices, and does not rely on the flawed molecular clock assumption
Neighbor-Joining • Guaranteed to produce the correct tree if distance is additive • May produce a good tree even when distance is not additive
Step 1: Finding neighboring clusters
  Let C = current clusters
  Define: u(C) = 1/(|C| – 2) · Σ over clusters C' of D(C, C'), where |C| in the factor is the number of current clusters
  u(C) measures separation of C from other clusters
  Want to minimize D(C1, C2) and maximize u(C1) + u(C2)
  Magic trick: Choose C1 and C2 that minimize D(C1, C2) – (u(C1) + u(C2))
  Claim: the pair i, j that minimizes this corrected distance are neighbors
  Proof: Very technical, please read Durbin et al.!
[Figure: example tree on leaves 1, 2, 3, 4 with edge lengths 0.1, 0.1, 0.1, 0.4, 0.4]
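A small numeric example shows how the correction changes which pair is selected. The matrix below comes from a hypothetical four-leaf tree, not from the slides: i and j are neighbors with hanging edges 11 and 2, k and l are neighbors with hanging edges 6 and 7, and the middle edge has length 4. This gives dij = 13 and djk = 12, so the closest pair is not a pair of neighbors. The function names are illustrative.

```python
from itertools import combinations

D = {
    "i": {"i": 0, "j": 13, "k": 21, "l": 22},
    "j": {"i": 13, "j": 0, "k": 12, "l": 13},
    "k": {"i": 21, "j": 12, "k": 0, "l": 13},
    "l": {"i": 22, "j": 13, "k": 13, "l": 0},
}


def closest_pair(D):
    """Pair with the smallest raw distance."""
    return min(combinations(D, 2), key=lambda p: D[p[0]][p[1]])


def nj_pair(D):
    """Pair minimising D(C1, C2) - (u(C1) + u(C2))."""
    n = len(D)
    u = {a: sum(D[a].values()) / (n - 2) for a in D}
    return min(combinations(D, 2),
               key=lambda p: D[p[0]][p[1]] - u[p[0]] - u[p[1]])


print(closest_pair(D))
print(nj_pair(D))
```

Here the raw minimum picks (j, k), while the corrected criterion scores (i, j) and (k, l) equally and lowest. Both of those are true neighbor pairs, and `min` returns the first one it meets.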
Algorithm: Neighbor-joining
Initialization:
  Form n clusters, one for each leaf node
  Define T to be the set of leaf nodes, one per sequence
Iteration:
  Pick C1, C2 s.t. D(C1, C2) – (u(C1) + u(C2)) is minimal
  Merge C1 and C2 into new cluster C with |C1| + |C2| elements
  Add a new vertex C to T and connect to vertices C1 and C2
  Assign length ½ (D(C1, C2) + u(C1) – u(C2)) to edge (C1, C)
  Assign length ½ (D(C1, C2) + u(C2) – u(C1)) to edge (C2, C)
  Remove rows and columns from D corresponding to C1 and C2
  Add row and column to D for new cluster C
Termination:
  When only one cluster remains
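The iteration above can be sketched in runnable form as follows. Two details are assumptions, because the slide does not state them: the row for the new cluster C is filled with D(C, C') = ½ (D(C1, C') + D(C2, C') – D(C1, C2)), and the loop stops when two clusters remain and joins them with a single edge, since u(C) is undefined for two clusters. Leaves are assumed to have string labels; internal vertices get integer ids.

```python
from itertools import combinations, count


def neighbor_joining(D):
    """Return the tree as a dict of dicts: T[a][b] = length of edge (a, b)."""
    D = {a: dict(row) for a, row in D.items()}  # work on a copy
    T = {a: {} for a in D}
    ids = count()
    while len(D) > 2:
        n = len(D)
        u = {a: sum(D[a].values()) / (n - 2) for a in D}
        c1, c2 = min(combinations(D, 2),
                     key=lambda p: D[p[0]][p[1]] - u[p[0]] - u[p[1]])
        c, d12 = next(ids), D[c1][c2]
        # Connect the new vertex to both merged clusters.
        T[c] = {}
        T[c][c1] = T[c1][c] = (d12 + u[c1] - u[c2]) / 2
        T[c][c2] = T[c2][c] = (d12 + u[c2] - u[c1]) / 2
        # Assumed update rule for distances to the new cluster.
        others = [m for m in D if m not in (c1, c2)]
        new_row = {m: (D[c1][m] + D[c2][m] - d12) / 2 for m in others}
        del D[c1], D[c2]
        for m in others:
            del D[m][c1], D[m][c2]
            D[m][c] = new_row[m]
        new_row[c] = 0
        D[c] = new_row
    a, b = D  # two clusters left: join them directly
    T[a][b] = T[b][a] = D[a][b]
    return T


labels = ["v", "w", "x", "y", "z"]
rows = [[0, 10, 17, 16, 16], [10, 0, 15, 14, 14], [17, 15, 0, 9, 15],
        [16, 14, 9, 0, 14], [16, 14, 15, 14, 0]]
D = {a: dict(zip(labels, row)) for a, row in zip(labels, rows)}
T = neighbor_joining(D)
print({leaf: list(T[leaf].values())[0] for leaf in labels})
```

On the additive slide matrix the hanging edges come out as 6, 4, 5, 4 and 7 for v, w, x, y and z, up to floating-point rounding. Ties in the selection criterion are broken by iteration order, which can change the internal vertex numbering but not the resulting tree.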