Compressed Sensing Approaches for High Throughput Carrier Screen
































- Slides: 32
Compressed Sensing Approaches for High Throughput Carrier Screen
Yaniv Erlich, Watson School of Biological Sciences, Cold Spring Harbor Laboratory
9/30/09
Joint work with Noam Shental, Amnon Amir and Or Zuk
erlich@cshl.edu
Intro - carrier screens CS vision Unique features BP solver Simulations
Outline
• What is a carrier screen?
• Our vision - compressed sensing carrier screen
• Unique features of our setting
• Bayesian reconstruction algorithm
• Simulations
9/30/09 Compressed sensing carrier screen erlich@cshl.edu
Rare recessive genetic diseases
Example: Cystic Fibrosis
• Normal genotype: healthy phenotype, ~29/30
• Carrier genotype: healthy phenotype (!), ~1/30
• Affected genotype: disease phenotype, 0.003%
Carrier breeding may lead to devastating results
Offspring of a carrier couple:
• Non-carrier 1:4
• Carrier 1:2
• Affected 1:4
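The 1:4, 1:2, 1:4 split can be checked by enumerating the four equally likely allele combinations. This is a minimal sketch for a single recessive locus; the function name and the 'N'/'m' allele labels are illustrative choices, not taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a, parent_b):
    """Probability of each offspring class for one recessive locus.

    Each parent is a 2-character string of alleles:
    'N' (normal) or 'm' (mutant). Labels are illustrative.
    """
    counts = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Count how many mutant copies the child inherits (0, 1 or 2).
        mutant_copies = (allele_a == "m") + (allele_b == "m")
        label = ("non-carrier", "carrier", "affected")[mutant_copies]
        counts[label] += 1
    total = sum(counts.values())
    return {label: Fraction(n, total) for label, n in counts.items()}


carrier_couple = offspring_distribution("Nm", "Nm")
for label, probability in carrier_couple.items():
    print(label, probability)
```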
What can we do?
• Several countries employ nationwide programs
  - screen the bulk population
  - very limited set of genes
Carrier screen - the current mechanism
Input: thousands of specimens.
Output: carriers of rare genetic diseases.
Serial processing:
  - sequence 1 region of 1 person per reaction
  - expensive and does not scale
A needle-in-a-haystack problem
Carrier screens - our vision
Ultra-high throughput carrier screen: many specimens + many regions
• Adding more genes to the test panel while keeping the task at a tractable scale
• Increasing participation by reducing the cost
Next generation sequencers – parallel processing
Sequence 100 million DNA molecules in a single batch (~1 week)
BUT
• On pooled samples - only a histogram of the DNA sequence types.
Example: pooling 4 normal specimens and 1 carrier [figure: fraction of reads, WT allele vs. mutant]
How to multiplex many specimens with next generation sequencers?
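For the pooling example, the expected histogram can be worked out by counting alleles. The sketch below assumes every specimen contributes an equal amount of DNA and that sequencing is error-free; neither assumption is stated on the slide, and the function name is illustrative.

```python
from fractions import Fraction


def expected_mutant_read_fraction(n_normal, n_carriers):
    """Expected fraction of mutant reads in a pool.

    Assumes equal DNA contribution per specimen and no sequencing
    errors. Every specimen carries two alleles; a carrier has
    exactly one mutant allele.
    """
    total_alleles = 2 * (n_normal + n_carriers)
    mutant_alleles = n_carriers
    return Fraction(mutant_alleles, total_alleles)


fraction = expected_mutant_read_fraction(n_normal=4, n_carriers=1)
print(fraction)  # 1/10
```

Under these assumptions one carrier among five specimens shows up as only 10% of the reads, and the histogram alone does not say which specimen it came from.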
Multiplexing - the compressed sensing approach
CS principle: when x is sparse, very few measurements are sufficient for faithful reconstruction.
y = Φx
• y (T pools): the ratio of carrier reads
• Φ (T × N): pooling design, a 0-1 matrix
• x (N specimens): the carriers
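A toy instance of the measurement model may help. The 4 × 6 design and the single carrier below are made-up values. Dividing the carrier count by twice the pool size (one mutant allele per carrier) is an assumed normalization for turning Φx into a read ratio; the slide only states y = Φx.

```python
from fractions import Fraction


def pool_carrier_counts(phi, x):
    """Number of carriers in each pool, i.e. the product Φx."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


def expected_read_ratios(phi, x):
    """Expected ratio of carrier reads per pool (assumed normalization)."""
    ratios = []
    for row, carriers in zip(phi, pool_carrier_counts(phi, x)):
        pool_size = sum(row)
        ratios.append(Fraction(carriers, 2 * pool_size))
    return ratios


# Hypothetical design: T = 4 pools (rows), N = 6 specimens (columns).
phi = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
x = [0, 0, 1, 0, 0, 0]  # sparse signal: only specimen 2 is a carrier

counts = pool_carrier_counts(phi, x)
ratios = expected_read_ratios(phi, x)
print(counts)
print(ratios)
```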
Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Signal domain
On a budget compressed sensing
[figure: Φ = random matrix with p = 0.5; labels: specimens (N), pools (t), compression level, weight (w)]
• Heavy weight design requires long pooling steps and higher material consumption
• Higher compression level is more prone to technical difficulties
• We want a very sparse sensing matrix
Light Chinese Design
Inputs:
  - N (number of specimens in the experiment)
  - Weight w (pooling efforts)
Algorithm:
1. Find w numbers {x_1, x_2, …, x_w} such that they are:
   • Bigger than [bound not preserved]
   • Pairwise coprime
2. Generate w modular equations: [equations not preserved]
3. Construct the pooling design upon the modular equations
Output: sparse pooling design with [expression not preserved] [figure labels: mod 6, mod 7]
Advantages:
• (w-1)-disjunct matrix
• The weight does not explicitly depend on the number of specimens
• The compression level is [expression not preserved]
• Easy to debug
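The three algorithm steps can be sketched as follows. Two details are assumptions, because the transcript lost the formulas: each modulus is taken to be at least √N, and window i is taken to hold one pool per residue class n ≡ r (mod x_i). Function names are illustrative.

```python
from itertools import combinations
from math import gcd, isqrt


def choose_moduli(n_specimens, weight):
    """Step 1: pick `weight` pairwise-coprime integers.

    Assumption: each modulus is >= sqrt(n_specimens).
    """
    candidate = isqrt(n_specimens)
    if candidate * candidate < n_specimens:
        candidate += 1  # round up to ceil(sqrt(N))
    moduli = []
    while len(moduli) < weight:
        if all(gcd(candidate, m) == 1 for m in moduli):
            moduli.append(candidate)
        candidate += 1
    return moduli


def light_chinese_design(n_specimens, weight):
    """Steps 2-3: one pool per residue class of every modulus.

    Returns (moduli, rows); rows[k][n] == 1 when specimen n is in pool k.
    """
    moduli = choose_moduli(n_specimens, weight)
    rows = []
    for modulus in moduli:
        for residue in range(modulus):
            rows.append(
                [1 if n % modulus == residue else 0 for n in range(n_specimens)]
            )
    return moduli, rows


moduli, design = light_chinese_design(n_specimens=40, weight=3)
weights = [sum(row[n] for row in design) for n in range(40)]
max_shared = max(
    sum(row[a] and row[b] for row in design)
    for a, b in combinations(range(40), 2)
)
print(moduli, len(design), set(weights), max_shared)
```

In this sketch every specimen lands in exactly w pools, and because the moduli are coprime and their pairwise products exceed N, any two specimens share at most one pool.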
Not all pools were born equal
• The sequencer does not report the absolute number of carriers in the pool
• Instead: (# carrier reads) / (# total sequence reads) ≈ (fraction of carriers in the pool) / 2
• Pools with more (↑) sequence reads and fewer (↓) carriers provide more reliable information.
• The noise is not additive; it is correlated with the content of the pool.
• We need a reconstruction algorithm that takes into account the reliability of the data from each pool.
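One way to see why reliability varies between pools is to model the carrier reads as binomial draws. This is an assumed noise model chosen for illustration; the slides say only that the noise depends on the pool content. The function name is hypothetical.

```python
from math import sqrt


def read_ratio_std(total_reads, carriers_in_pool, pool_size):
    """Standard deviation of the observed carrier-read ratio.

    Assumes carrier reads ~ Binomial(total_reads, p) with
    p = carriers_in_pool / (2 * pool_size).
    """
    p = carriers_in_pool / (2 * pool_size)
    return sqrt(p * (1 - p) / total_reads)


few_reads = read_ratio_std(total_reads=1_000, carriers_in_pool=1, pool_size=20)
many_reads = read_ratio_std(total_reads=10_000, carriers_in_pool=1, pool_size=20)
more_carriers = read_ratio_std(total_reads=1_000, carriers_in_pool=3, pool_size=20)

print(f"1 carrier, 1,000 reads:   {few_reads:.4f}")
print(f"1 carrier, 10,000 reads:  {many_reads:.4f}")
print(f"3 carriers, 1,000 reads:  {more_carriers:.4f}")
```

Under this model the spread shrinks with more reads and grows with more carriers in the pool, which matches the arrows in the third bullet.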
Signal Domain
In traditional CS: the signal x is real-valued
Traditional CS decoder solves: [equation not preserved]
In compressed carrier screen: the signal x is binary (carrier or non-carrier)
• What are the implications of using a traditional decoder and employing a rounding procedure?
• Can we find a reconstruction procedure that directly finds a binary x?
Bayesian reconstruction algorithm
• Biological data (biological expectations): biologically, the genotype of one specimen does not depend on the genotype of another (unless they are relatives)
• Pooling data (pooling model Φ and sequencing): only the specimens in a pool affect that pool's results
Approximation by loopy Belief Propagation…
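The two ingredients, an independent prior per specimen and a likelihood per pool, can be made concrete on a toy problem. The sketch below is not belief propagation: it computes exact posterior marginals by brute-force enumeration, which is the quantity loopy BP approximates and is feasible only for a handful of specimens. The binomial read model, the `error_rate` parameter, the design and the read counts are all illustrative assumptions.

```python
from itertools import product
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_marginals(phi, carrier_reads, total_reads, carrier_rate,
                        error_rate=0.001):
    """Exact P(x_i = 1 | data) by enumerating all 2^N genotype vectors."""
    n = len(phi[0])
    marginals = [0.0] * n
    evidence = 0.0
    for x in product((0, 1), repeat=n):
        # Prior: specimens are independent (no relatives assumed).
        weight = 1.0
        for xi in x:
            weight *= carrier_rate if xi else 1 - carrier_rate
        # Likelihood: each pool depends only on its own members.
        for row, k_reads, n_reads in zip(phi, carrier_reads, total_reads):
            carriers = sum(r * xi for r, xi in zip(row, x))
            p = carriers / (2 * sum(row))
            # Assumed symmetric read error keeps p strictly positive.
            p = p * (1 - error_rate) + (1 - p) * error_rate
            weight *= binomial_pmf(k_reads, n_reads, p)
        evidence += weight
        for i, xi in enumerate(x):
            if xi:
                marginals[i] += weight
    return [m / evidence for m in marginals]


# Hypothetical data: 4 pools, 6 specimens, specimen 2 is the true carrier.
phi = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
marginals = posterior_marginals(
    phi,
    carrier_reads=[35, 0, 1, 48],
    total_reads=[200, 200, 200, 200],
    carrier_rate=1 / 30,
)
print([round(m, 4) for m in marginals])
```

Enumeration costs 2^N evaluations, which is why an approximation such as loopy BP is needed for thousands of specimens.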
Advantages of Belief Propagation
• Bottom-up approach – weighs the reliability of each individual pool
• Bayesian – everything speaks the same language. Can incorporate a-priori medical information and familial connections.
• Encoding advantage – Chinese pooling ensures that there are no short cycles
• Binary results directly – no rounding procedure at the end
[figure labels: Biological data, Pooling data]
Simulations of compressed carrier screen in Ashkenazi Jews
• Chinese pooling design
• Comparing GPSR (traditional solver) and BP
• Evaluating Nmax – the largest number of specimens for which at least 48 out of 50 runs give 100% accuracy.
• Finding carriers for two Ashkenazi Jewish diseases: Tay-Sachs and Bloom syndrome.
Genetic disorder          Carrier rate
Tay-Sachs                 1:25
Cystic Fibrosis           1:30
Familial Dysautonomia     1:30
Usher Syndrome            1:40
Canavan                   1:40
Glycogen Storage          1:71
Fanconi Anemia C          1:80
Niemann-Pick              1:80
Mucolipidosis type 4      1:100
Bloom                     1:102
Nemaline Myopathy         1:108
Results
[figure: BP vs. GPSR for Bloom and Tay-Sachs; panel labels: Pools/Specimens = 6.5%, Pools/Specimens = 13%]
Conclusions
• The CS framework can be utilized for ultra-high throughput carrier screens.
• Our setting shows several unique features not found in the traditional framework
  - We suggest tailored encoding (light Chinese) and decoding (BP) procedures
• At least in our settings, a tailored decoder (BP) has an advantage over reconstruction with an off-the-shelf CS solver
• A CS carrier screen has the potential to dramatically reduce the cost of sequencing.
An ongoing study…
[figure labels: Introduction, Naïve Solutions, Chinese Pooling, Analysis, The real thing, Results]
Acknowledgements
Funding: Greg Hannon, Lindsay Goldberg Ph.D. Fellowship, ACM/IEEE-CS HPC Ph.D. Fellowship
Noam Shental, Or Zuk & Amnon Amir, Igor Carron (Nuit Blanche)
For more information: hannonlab.cshl.edu/labmembers/erlich
Loopy belief propagation is tricky
Damping is the key
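Damping usually means blending each newly computed message with its previous value so that the updates on a loopy graph oscillate less. The slide does not give the exact rule, so the linear blend and the `damping` parameter below are one common form, shown as an assumption.

```python
def damped_update(old_message, new_message, damping=0.5):
    """Blend the previous message with the new one, then renormalize.

    damping = 0 keeps the new message unchanged; values closer to 1
    move more slowly away from the old message.
    """
    mixed = [
        damping * old + (1 - damping) * new
        for old, new in zip(old_message, new_message)
    ]
    total = sum(mixed)
    return [value / total for value in mixed]


# A message over the two states (non-carrier, carrier) that flips sharply
# between iterations is pulled back toward its previous value.
previous = [0.9, 0.1]
proposed = [0.1, 0.9]
smoothed = damped_update(previous, proposed, damping=0.5)
print(smoothed)
```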
Pooling imperfections
• Background contamination
• Pooling failures (erasures)
[figure: data from a real experiment; labels: # Reads, Pools, mod 377, Pools not in use]
Distinctions from traditional CS
• ‘On a budget’ compressed sensing
• Not all pools were born equal
• Pooling imperfections
• Signal domain