Linear-Time Reconstruction of Zero-Recombinant Mendelian Inheritance on Pedigrees without Mating Loops
Authors: Lan Liu, Tao Jiang (Univ. California, Riverside, USA)
Outline
➢ Introduction and problem definition
• The linear system for ZRHC
• A linear-time algorithm for Loop-free ZRHC
• Conclusion
Pedigree
• An example: the British Royal Family [figure]
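Biological Background
• Basic concepts [figure: paternal and maternal haplotypes]
• Mendelian law: one haplotype comes from the father (paternal) and the other comes from the mother (maternal).
• Genotypes 11 and 22 are homozygous; genotype 12 is heterozygous.
• Ordered genotype 1|2 (paternal|maternal) has ps-value 0; 2|1 has ps-value 1.
• Example: Mendel's experiment [figure]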
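The allele encoding can be sketched as follows. The function names `ps_value` and `heterozygosity` are hypothetical. The slide defines ps-values only for heterozygous loci, so extending the rule "ps-value = paternal allele - 1" to homozygous loci is an assumption. The 0/1 heterozygosity flag matches the entries $w_k[i]$ used in the locus-graph definition.

```python
# Sketch: alleles are coded 1 and 2; an ordered genotype is written paternal|maternal.

def ps_value(paternal: int, maternal: int) -> int:
    """ps-value of the ordered genotype paternal|maternal at one locus.

    Matches the slide for heterozygous loci (1|2 -> 0, 2|1 -> 1).
    ASSUMPTION: homozygous loci follow the same rule (paternal allele - 1).
    """
    if paternal not in (1, 2) or maternal not in (1, 2):
        raise ValueError("alleles must be coded 1 or 2")
    return paternal - 1


def heterozygosity(genotype: str) -> int:
    """1 for an unordered heterozygous genotype such as '12', 0 for '11' or '22'."""
    a, b = genotype
    return int(a != b)


print(ps_value(1, 2), ps_value(2, 1))                   # prints: 0 1
print([heterozygosity(g) for g in ("11", "22", "12")])  # prints: [0, 0, 1]
```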
Notations and Recombinants
[Figure: genotypes (e.g. 11, 12, 22) and haplotype configurations of a father, a mother, and a child. In the first configuration the child inherits with 0 recombinants; in the second, with 1 recombinant.]
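Recombinants in the haplotype a parent transmits can be counted mechanically, as in the sketch below. It assumes the usual convention: only loci where the parent is heterozygous reveal which parental haplotype an allele came from, and each switch of origin between consecutive such loci counts as one recombinant. The function name and the example haplotypes are hypothetical.

```python
from typing import List, Sequence


def count_recombinants(parent_hap_a: Sequence[int], parent_hap_b: Sequence[int],
                       transmitted: Sequence[int]) -> int:
    """Count switches of parental origin in the haplotype a parent passed on."""
    if not (len(parent_hap_a) == len(parent_hap_b) == len(transmitted)):
        raise ValueError("haplotypes must cover the same loci")
    origins: List[int] = []
    for a, b, t in zip(parent_hap_a, parent_hap_b, transmitted):
        if t not in (a, b):
            raise ValueError("allele not carried by the parent (Mendelian error)")
        if a != b:  # only heterozygous loci are informative
            origins.append(0 if t == a else 1)
    # Each change of origin between consecutive informative loci is one recombinant.
    return sum(1 for x, y in zip(origins, origins[1:]) if x != y)


# Hypothetical parent with haplotypes 1 2 1 and 2 2 2 (heterozygous at loci 1 and 3).
print(count_recombinants([1, 2, 1], [2, 2, 2], [1, 2, 1]))  # prints: 0
print(count_recombinants([1, 2, 1], [2, 2, 2], [1, 2, 2]))  # prints: 1
```

A zero-recombinant configuration is one where this count is 0 for every parent-child pair.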
Haplotype Configuration Reconstruction
• Haplotypes: useful, but expensive to obtain.
• Genotypes: less informative, but cheaper to obtain.
• In biological applications, genotypes rather than haplotypes are collected. How can haplotypes be reconstructed from genotypes? We adopt the recombination-free assumption.
The Loop-free ZRHC Problem
• Problem definition: given a loop-free pedigree and the genotype of each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
Solutions to the ZRHC Problem
• A particular solution: any single numerical assignment that satisfies the system.
• A general solution: the span of a basis of the solution space of the associated homogeneous system, offset from the origin by a vector, namely by any particular solution.
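Over F[2], a general solution $x_0 + \mathrm{span}(b_1, \dots, b_d)$ describes exactly $2^d$ particular solutions. The short sketch below expands such a pair. The function name and the example vectors are hypothetical. It also shows why a general solution is reported as a (particular solution, basis) pair rather than as a list.

```python
from itertools import product
from typing import List

Vector = List[int]  # entries are 0/1, i.e. elements of F[2]


def enumerate_solutions(particular: Vector, basis: List[Vector]) -> List[Vector]:
    """Expand x0 + span(b_1, ..., b_d) over F[2] into all 2**d solutions."""
    solutions = []
    for coeffs in product((0, 1), repeat=len(basis)):
        x = list(particular)
        for c, b in zip(coeffs, basis):
            if c:  # add basis vector b (XOR is addition in F[2])
                x = [xi ^ bi for xi, bi in zip(x, b)]
        solutions.append(x)
    return solutions


# Hypothetical: a 3-variable system with one free direction.
print(enumerate_solutions([1, 0, 1], [[1, 1, 0]]))  # prints: [[1, 0, 1], [0, 1, 1]]
```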
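An Example
[Figure: (a) input genotypes of a pedigree; (b) a general solution whose entries are F[2]-linear expressions in free variables x, y, z, w, such as x+z+w, x, y, y+z+w, x+z and y+z (0 encodes 1|2 and 1 encodes 2|1); (c) the solution obtained by setting x=0, y=1, z=0, w=1.]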
Previous Work and Our Progress (m: #loci, n: #members in the pedigree)
ZRHC:
• Li and Jiang introduced a system of linear equations over F[2] and presented an $O(m^3 n^3)$-time algorithm for ZRHC [LJ03].
• Xiao et al. presented a much faster algorithm for ZRHC, running in $O(mn^2 + n^3 \log^2 n \log\log n)$ time to produce a general solution and $O(mn + n^3 \log^2 n \log\log n)$ time to produce a particular solution [XLX+07].
Loop-free ZRHC:
• Xiao et al.'s algorithm runs in $O(mn^2 + n^3)$ time to produce a general solution and $O(mn + n^3)$ time to produce a particular solution [XLX+07].
• Chan et al. proposed a linear-time (i.e., $O(mn)$-time) algorithm to find a particular solution [CCC+06].
• We present a novel algorithm that runs in $O(mn^2)$ time to produce a general solution and $O(mn)$ time to produce a particular solution.
Related Work
• Methods based on fast matrix multiplication can achieve an asymptotic running time of $O(k^{2.376})$ on k equations with k unknowns.
• The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
• The Wiedemann algorithm has expected quadratic running time [W86].
Outline
• Introduction and problem definition
➢ The linear system for ZRHC
• A linear-time algorithm for Loop-free ZRHC
• Conclusion
The New Linear System
• Parameters: m: #loci; n: #members in the pedigree.
• Unknowns:
  - $p_j$: the paternal haplotype vector of member j;
  - $h_{j_1,j}$: a scalar encoding the inheritance information between a parent $j_1$ and a child j.
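The New Linear System
• Each member j has paternal haplotype vector $p_j = (p_{j,1}, \dots, p_{j,m})$ and maternal haplotype vector $p_j + w_j$, where $w_j[i] = 1$ if and only if j is heterozygous at locus i.
• For a child j with father $j_1$ and mother $j_2$, zero-recombinant inheritance gives (over F[2])
$$p_j = p_{j_1} + h_{j_1,j}\, w_{j_1}, \qquad p_j + w_j = p_{j_2} + h_{j_2,j}\, w_{j_2}.$$
• Example (figure): $w_{j_1} = (1,0,0,1)$, $w_{j_2} = (0,1,1,1)$, $w_j = (1,1,0,0)$. Since $j_1$ is homozygous at loci 2 and 3, $p_{j_1,2} = 1$ and $p_{j_1,3} = 0$ are fixed.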
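The two zero-recombinant equations for a trio, $p_j = p_{j_1} + h_{j_1,j} w_{j_1}$ and $p_j + w_j = p_{j_2} + h_{j_2,j} w_{j_2}$, can be checked directly. The sketch below assumes that $j_1$ is the father. It reuses the w-vectors from the slide's example. The p-vectors are hypothetical, apart from the fixed values $p_{j_1,2} = 1$ and $p_{j_1,3} = 0$. The helper names are also illustrative.

```python
from typing import List

Vec = List[int]  # one 0/1 entry per locus


def xor(u: Vec, v: Vec) -> Vec:
    return [a ^ b for a, b in zip(u, v)]


def transmitted(p: Vec, w: Vec, h: int) -> Vec:
    """Haplotype passed on with zero recombinants: p + h*w over F[2].

    h = 0 passes the parent's paternal haplotype p, h = 1 the maternal one p + w.
    """
    return [a ^ (h & b) for a, b in zip(p, w)]


def trio_consistent(p_child: Vec, w_child: Vec,
                    p_father: Vec, w_father: Vec, h_father: int,
                    p_mother: Vec, w_mother: Vec, h_mother: int) -> bool:
    """Check p_j = p_j1 + h w_j1 and p_j + w_j = p_j2 + h' w_j2."""
    return (p_child == transmitted(p_father, w_father, h_father)
            and xor(p_child, w_child) == transmitted(p_mother, w_mother, h_mother))


# w-vectors from the slide; j1 is assumed to be the father; p-vectors are hypothetical.
w_j1, w_j2, w_j = [1, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]
p_j1 = [0, 1, 0, 0]               # loci 2 and 3 carry the fixed values 1 and 0
p_j = transmitted(p_j1, w_j1, 1)  # [1, 1, 0, 1]
p_j2 = [0, 0, 0, 1]
print(trio_consistent(p_j, w_j, p_j1, w_j1, 1, p_j2, w_j2, 0))  # prints: True
print(trio_consistent(p_j, w_j, p_j1, w_j1, 1, p_j2, w_j2, 1))  # prints: False
```

At a locus where the parent is homozygous ($w = 0$), the h-term vanishes. This is why such loci fix the p-values.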
The Linear System
• O(mn) equations on O(mn) unknowns, written $Ax = b$.
• Given a homozygous locus i on a member j (with a child $j_1$), $p_j[i]$ and $p_{j_1}[i]$ are pre-determined.
Pedigree Graph
[Figure: (a) a pedigree with genotypes; (b) the corresponding pedigree graph G.]
• #edges ≤ 2n, since each member has at most two parents.
Locus Graph
• The locus graph for locus i is $G_i = (V, E_i)$, where $E_i = \{(k, j) \mid k \text{ is a parent of } j,\ w_k[i] = 1\}$.
[Figure: example locus graph for the 3rd locus. (a) Genotype information; (b) the locus graph, with edge variables such as $h_{1,4}$, $h_{6,8}$, $h_{4,9}$, $h_{8,9}$ and pre-determined vertices labeled with their p-values.]
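The definition of $E_i$ translates almost literally into code. An edge is kept only when the parent is heterozygous at locus i, because otherwise the parent's h-variable does not affect that locus. In the sketch below, the dictionary layout (child mapped to its (father, mother) pair), the 0-indexed loci and the example pedigree are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Parents = Dict[int, Tuple[Optional[int], Optional[int]]]  # child -> (father, mother)


def locus_graph_edges(parents: Parents, w: Dict[int, List[int]],
                      i: int) -> List[Tuple[int, int]]:
    """E_i = {(k, j) | k is a parent of j and w_k[i] = 1}; loci are 0-indexed here."""
    edges = []
    for child, (father, mother) in parents.items():
        for k in (father, mother):
            if k is not None and w[k][i] == 1:
                edges.append((k, child))
    return edges


# Hypothetical pedigree: 3 is the child of 1 and 2; 5 is the child of 3 and 4.
parents: Parents = {3: (1, 2), 5: (3, 4)}
w = {1: [1, 0], 2: [0, 1], 3: [1, 1], 4: [1, 0], 5: [0, 0]}
print(locus_graph_edges(parents, w, 0))  # prints: [(1, 3), (3, 5), (4, 5)]
print(locus_graph_edges(parents, w, 1))  # prints: [(2, 3), (3, 5)]
```

Every $G_i$ is a subgraph of the pedigree graph. Building all m locus graphs therefore costs O(mn) time.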
An Observation
• For any path in a locus graph connecting two pre-determined vertices, the sum of the h-variables along the path is a constant. We can therefore use paths to express constraints!
• Proof sketch: consider a path $j_0, j_1, \dots, j_k$ in the locus graph $G_i$ whose endpoints $j_0$ and $j_k$ are pre-determined. Each edge on the path gives an equation
$$p_{j_t}[i] + h_{j_t,j_{t+1}} = p_{j_{t+1}}[i] + d_{j_t,j_{t+1}}, \qquad t = 0, \dots, k-1,$$
where each $d_{j_t,j_{t+1}}$ is a constant determined by the genotype data. Summing these k equations over F[2], every intermediate $p_{j_t}[i]$ appears twice and cancels, so
$$h_{j_0,j_1} + h_{j_1,j_2} + \dots + h_{j_{k-1},j_k} = p_{j_0}[i] + p_{j_k}[i] + d_{j_0,j_1} + d_{j_1,j_2} + \dots + d_{j_{k-1},j_k},$$
which is a constant.
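The observation yields a direct recipe for generating one constraint. Find the path between two pre-determined vertices, then set $b = p_{j_0}[i] + p_{j_k}[i] + \sum d$. The sketch assumes each locus-graph edge (k, j) satisfies $p_k[i] + h_{k,j} = p_j[i] + d_{k,j}$ with a known constant d. Deriving this from the trio equations gives $d_{k,j} = 0$ when k is the father and $d_{k,j} = w_j[i]$ when k is the mother. That derivation is our assumption, not taken from the slide, so the code simply takes d as input. The function name and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (parent, child) edge of a locus graph


def path_constraint(edges: List[Edge], d: Dict[Edge, int], u: int, v: int,
                    p_u: int, p_v: int) -> Tuple[List[Edge], int]:
    """Return (path edges, b) such that the h-variables on the u..v path sum to b.

    ASSUMPTION: sum(h) = p_u + p_v + sum(d) over F[2]. In a loop-free locus
    graph the u..v path is unique, so a BFS finds it.
    """
    adj: Dict[int, List[Tuple[int, Edge]]] = {}
    for e in edges:
        k, j = e
        adj.setdefault(k, []).append((j, e))
        adj.setdefault(j, []).append((k, e))
    prev: Dict[int, Tuple[int, Edge]] = {}
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, e in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                prev[y] = (x, e)
                queue.append(y)
    if v not in seen:
        raise ValueError("u and v are not connected in this locus graph")
    path: List[Edge] = []
    x = v
    while x != u:  # walk back from v to u
        x, e = prev[x]
        path.append(e)
    path.reverse()
    b = (p_u + p_v + sum(d[e] for e in path)) % 2
    return path, b


# Hypothetical locus graph: 1 -> 3, 3 -> 5, 3 -> 6 (parent -> child).
edges = [(1, 3), (3, 5), (3, 6)]
d = {(1, 3): 0, (3, 5): 0, (3, 6): 1}
print(path_constraint(edges, d, 1, 5, 0, 1))  # prints: ([(1, 3), (3, 5)], 1)
```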
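Examples of Linear Constraints
[Figure: (a) the 1st locus graph, on vertices 1, 4, 6, 7, 8, 9, with pre-determined values and edge variables $h_{6,8}$, $h_{8,9}$.]
• Constraint read from (a): $h_{6,8} + h_{8,9} = 1$.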
Linear Constraints
• The linear constraints are clearly necessary. We can also show that they are sufficient.
• Moreover, the number of constraints in each locus graph can be bounded by O(n), while a trivial analysis gives an upper bound of $O(n^2)$.
• Total number of constraints = O(mn).
• Transformation: the system $Ax = b$ (O(mn) equations on O(mn) unknowns) becomes O(mn) linear constraints that contain only the O(n) h-variables.
Outline
• Introduction and problem definition
• The linear system for ZRHC
➢ A linear-time algorithm for Loop-free ZRHC
• Conclusion
The Loop-free ZRHC-PHASE Algorithm
Algorithm Loop-free ZRHC-PHASE
input: a pedigree G = (V, E) and genotypes {g_j}
output: a general solution for {p_j}
begin
  Step 1. Preprocessing.
  Step 2. Generate linear constraints on the h-variables.
  Step 3. Solve the h-variables by redundant equation elimination and a novel mapping method.
  Step 4. Solve the p-variables by propagation from the pre-determined p-variables to the others.
end
• Traditional method: solve the h-variables and p-variables together; O(mn) equations on O(mn) unknowns (O(mn) p-variables and O(n) h-variables).
• Our method: solve the h-variables and p-variables separately; O(mn) linear equations on only O(n) h-variables.
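Step 4 can be sketched for a single locus, as below. Once the h-variables are known, each locus-graph edge (k, j) determines $p_j[i]$ from $p_k[i]$. A BFS from the pre-determined members therefore fills in every component. The per-edge equation $p_k[i] + h_{k,j} = p_j[i] + d_{k,j}$ and its constants d are assumptions carried over from the proof sketch of the path observation. The function name is hypothetical. Components without a pre-determined member get the arbitrary value 0 here; a general solution would keep them as free variables.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (parent, child) edge of the locus graph G_i


def propagate_locus(members: List[int], edges: List[Edge], h: Dict[Edge, int],
                    d: Dict[Edge, int], fixed: Dict[int, int]) -> Dict[int, int]:
    """Fill in p_j[i] for every member at one locus, given solved h-variables."""
    adj: Dict[int, List[Tuple[int, Edge]]] = {m: [] for m in members}
    for e in edges:
        k, j = e
        adj[k].append((j, e))
        adj[j].append((k, e))
    p: Dict[int, int] = {}
    # Seed from pre-determined members first, then from any untouched member.
    for s in list(fixed) + [m for m in members if m not in fixed]:
        if s in p:
            continue
        p[s] = fixed.get(s, 0)  # 0 is an arbitrary choice for a free component
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y, e in adj[x]:
                value = p[x] ^ h[e] ^ d[e]  # from p_k + h = p_j + d over F[2]
                expected: Optional[int] = p.get(y, fixed.get(y))
                if expected is not None and expected != value:
                    raise ValueError(f"h-variables violate a constraint at member {y}")
                if y not in p:
                    p[y] = value
                    queue.append(y)
    return p


edges = [(1, 3), (3, 5)]
h = {(1, 3): 1, (3, 5): 0}
d = {(1, 3): 0, (3, 5): 0}
print(propagate_locus([1, 3, 5, 7], edges, h, d, {1: 0, 5: 1}))
# prints: {1: 0, 3: 1, 5: 1, 7: 0}
```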
Redundant Equation Elimination
• An observation: given a path P = j_0, …, j_k, assume that there is a constraint between each pair of vertices on it (e.g. j_0 ~ j_2, j_2 ~ j_{k-1}, j_0 ~ j_{k-1}).
  - Originally there are $O(k^2)$ constraints, but they are not independent.
  - We can replace them by an equivalent set of O(k) constraints.
  - The redundant equations are removed without solving them!
• Key lemma: given a set S of constraints on a tree pedigree T, we can reduce S to an equivalent constraint set of size at most n in O(mn) time.
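One way to see why at most n constraints can survive is to swap in a different standard technique. This is not the authors' key-lemma procedure. Write $\phi(x)$ for the sum of the h-variables from the root of the tree down to x. The path sum between u and v is then $\phi(u) + \phi(v)$, because the shared root-to-ancestor part cancels over F[2]. Each constraint (u, v, b) therefore becomes $\phi(u) \oplus \phi(v) = b$. A union-find structure with parities keeps a constraint only if it joins two components, so at most n - 1 are kept. This sketch runs in near-linear rather than strictly linear time. It assumes the pedigree graph is a forest. The class and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[int, int, int]  # (u, v, b): h-variables on the tree path u..v sum to b


class ParityUnionFind:
    """Union-find that stores phi(x) XOR phi(parent[x]) for every vertex x."""

    def __init__(self) -> None:
        self.parent: Dict[int, int] = {}
        self.parity: Dict[int, int] = {}

    def find(self, x: int) -> Tuple[int, int]:
        """Return (root, phi(x) XOR phi(root)), compressing the path on the way."""
        if x not in self.parent:
            self.parent[x], self.parity[x] = x, 0
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        self.parity[x] ^= p
        self.parent[x] = root
        return root, self.parity[x]

    def union(self, u: int, v: int, b: int) -> str:
        """Record phi(u) XOR phi(v) = b and report whether it was new information."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            return "redundant" if (pu ^ pv) == b else "inconsistent"
        self.parent[ru] = rv
        self.parity[ru] = pu ^ pv ^ b
        return "kept"


def reduce_constraints(constraints: List[Constraint]) -> List[Constraint]:
    """Return an equivalent subset of at most n - 1 constraints."""
    uf = ParityUnionFind()
    kept: List[Constraint] = []
    for u, v, b in constraints:
        status = uf.union(u, v, b)
        if status == "inconsistent":
            raise ValueError(f"constraint {(u, v, b)} contradicts earlier ones")
        if status == "kept":
            kept.append((u, v, b))
    return kept


# Hypothetical tree path 0-1-2-3: the third constraint follows from the first two.
print(reduce_constraints([(0, 2, 1), (2, 3, 0), (0, 3, 1)]))  # prints: [(0, 2, 1), (2, 3, 0)]
```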
[Diagram: $Ax = b$ → transformation → O(mn) constraints on O(n) h-variables → redundancy elimination → O(n) constraints on O(n) h-variables.]
Solving h-variables
• To obtain a linear-time algorithm, we want to avoid Gaussian elimination.
• An observation: consider a constraint along a path $j_0, j_1, \dots, j_{k-1}, j_k$:
$$h_{j_0,j_1} + h_{j_1,j_2} + \dots + h_{j_{k-1},j_k} = b.$$
We can solve it as follows:
  - Assign the h-variables on edges $(j_0,j_1), (j_1,j_2), \dots, (j_{k-2},j_{k-1})$ arbitrarily.
  - Assign the h-variable on the last edge $(j_{k-1}, j_k)$ the fixed value that satisfies the constraint:
$$h_{j_{k-1},j_k} = h_{j_0,j_1} + \dots + h_{j_{k-2},j_{k-1}} + b.$$
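The "assign all but the last edge arbitrarily, then force the last one" rule is short enough to write out in full. In the sketch below, the function name and the `chosen` dictionary of arbitrary values are hypothetical, and unlisted edges default to 0.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def solve_path_constraint(path: List[Edge], b: int,
                          chosen: Dict[Edge, int]) -> Dict[Edge, int]:
    """Satisfy h_{j0,j1} + ... + h_{jk-1,jk} = b over F[2] without elimination."""
    if not path:
        raise ValueError("a constraint needs at least one edge")
    # All edges except the last: arbitrary values (taken from `chosen`, default 0).
    h = {e: chosen.get(e, 0) for e in path[:-1]}
    partial = 0
    for e in path[:-1]:
        partial ^= h[e]
    # Last edge: the unique value that makes the sum equal b.
    h[path[-1]] = partial ^ b
    return h


path = [(0, 1), (1, 2), (2, 3)]
h = solve_path_constraint(path, 1, {(0, 1): 1})
print(h)  # prints: {(0, 1): 1, (1, 2): 0, (2, 3): 0}
```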
Solving h-variables Based on the Mapping f
• We have constructed an injective mapping $f: S \to E$, where S is the constraint set and E is the edge set.
• We solve the h-variables as follows:
  - For each h-variable whose edge e is not in f(S), assign an arbitrary value.
  - For each h-variable whose edge e is in f(S), assign the fixed value determined by the constraint $f^{-1}(e)$, so that the constraint is satisfied.
• All h-variables can be solved by a single BFS traversal.
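The mapping rule can be illustrated with the sketch below. The slide does not say how f is constructed, so f is supplied by hand here as a hypothetical input. A generic worklist also replaces the authors' single BFS. It solves a constraint as soon as all its edges other than f(constraint) are known. This is correct whenever such an order exists, but it can take quadratic time in the worst case. Names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Constraint = Tuple[List[Edge], int]  # (edges on the path, right-hand side b)


def solve_with_mapping(all_edges: List[Edge], constraints: List[Constraint],
                       f: Dict[int, Edge]) -> Dict[Edge, int]:
    """Solve h-variables given an injective map f from constraints to their own edges."""
    if set(f) != set(range(len(constraints))):
        raise ValueError("f must be defined on every constraint")
    image = set(f.values())
    if len(image) != len(f):
        raise ValueError("f is not injective")
    if any(f[idx] not in constraints[idx][0] for idx in f):
        raise ValueError("f must map each constraint to one of its own edges")
    # Edges outside f(S) are free: assign an arbitrary value (0 here).
    h: Dict[Edge, int] = {e: 0 for e in all_edges if e not in image}
    pending = set(f)
    while pending:
        progress = False
        for idx in sorted(pending):
            edges, b = constraints[idx]
            others = [e for e in edges if e != f[idx]]
            if all(e in h for e in others):
                value = b
                for e in others:
                    value ^= h[e]
                h[f[idx]] = value  # fixed value that satisfies constraint f^{-1}(e)
                pending.discard(idx)
                progress = True
        if not progress:
            raise ValueError("no valid order: constraints depend on each other cyclically")
    return h


e01, e12, e23 = (0, 1), (1, 2), (2, 3)
constraints = [([e01, e12], 1), ([e12, e23], 0)]
print(solve_with_mapping([e01, e12, e23], constraints, {0: e01, 1: e23}))
# prints: {(1, 2): 0, (0, 1): 1, (2, 3): 0}
```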
Conclusion
• We present an efficient algorithm for Loop-free ZRHC that runs in O(mn) time to produce a particular solution and $O(mn^2)$ time to produce a general solution.
Thanks for your time and attention!