Codes for Deletion and Insertion Channels with Segmented Errors
Codes for Deletion and Insertion Channels with Segmented Errors • Zhenming Liu, Michael Mitzenmacher • Harvard University, School of Engineering and Applied Sciences
The Most Basic Channels • Binary erasure channel. – Each bit is replaced by a ? with probability p. – Very well understood. • Binary symmetric channel. – Each bit flipped with probability p. – Very well understood. • Binary deletion channel. – Each bit deleted with probability p. – We don’t even know the capacity!!!
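The three channels above can be simulated in a few lines. This is a minimal sketch, not code from the talk: the function names, the string-of-bits representation, and the seeded `random.Random` argument are all illustrative assumptions.

```python
import random

def erasure_channel(bits, p, rng):
    """Binary erasure channel: each bit becomes '?' with probability p."""
    return "".join("?" if rng.random() < p else b for b in bits)

def symmetric_channel(bits, p, rng):
    """Binary symmetric channel: each bit is flipped with probability p."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p else b for b in bits)

def deletion_channel(bits, p, rng):
    """Binary deletion channel: each bit is dropped with probability p."""
    return "".join(b for b in bits if rng.random() >= p)

if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed only so runs are repeatable
    sent = "0010110111000101"   # hypothetical input
    print(erasure_channel(sent, 0.2, rng))
    print(symmetric_channel(sent, 0.2, rng))
    print(deletion_channel(sent, 0.2, rng))
```

Only the deletion channel changes the length of the output, so the receiver no longer knows which sent position each received bit came from.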
Motivation • Capacity/coding results for deletion/insertion channels are very hard. – Very little theory for practical coding schemes. – Huge gap between codes and capacity bounds. • Perhaps this is an artifact of the model. – Are independent deletions/insertions the right model for insertions/deletions in practice? – Do different models yield much better results? • If so, would highlight challenges of original model.
Model Motivation • Claim: Deletion/insertion errors occur because of timing mismatches. – Mechanisms running at slightly different speeds. – Clock drift. • After one deletion (or insertion), some time passes before the next.
Channel Model: Segmented Deletions • Input is divided into consecutive blocks of b bits. • Channel guarantee: at most one deletion per block. • No block markers at output. • Example: b = 8. 000101111 00001111 000101111 00010111001011
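A possible simulation of this channel model is sketched below. The slide only guarantees at most one deletion per block; the random choice of which blocks lose a bit, the parameter `p_delete`, and the input string are assumptions made for illustration.

```python
import random

def segmented_deletion_channel(bits, b, rng, p_delete=0.5):
    """Delete at most one bit from each consecutive block of b input bits.

    p_delete (hypothetical parameter) is the chance that a given block
    suffers its single allowed deletion; the deleted position is uniform.
    """
    if len(bits) % b != 0:
        raise ValueError("input length must be a multiple of b")
    out = []
    for start in range(0, len(bits), b):
        block = bits[start:start + b]
        if rng.random() < p_delete:
            i = rng.randrange(b)
            block = block[:i] + block[i + 1:]
        out.append(block)
    # Blocks are joined with no separator: the output has no block markers.
    return "".join(out)

if __name__ == "__main__":
    rng = random.Random(1)
    sent = "000101110000111100010111"   # three hypothetical 8-bit blocks
    print(segmented_deletion_channel(sent, 8, rng))
```

Because a deletion may hit the last bit of one block and the first bit of the next, two adjacent input bits can both be deleted under this model.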
Segmented Deletion Model • More general than models requiring a gap between deletions. – Two consecutive deletions can occur on the boundary. • Can define similar segmented insertion model.
Codes for Segmented Deletions: Our Approach • Create a codebook C with strings of b bits. • Codeword is concatenation of blocks from C. • Aim to decode blocks from left to right, without losing synchronization, regardless of errors. • Questions: – How can this be done? – What properties does C need? – How large can C be?
Notation • Let $D_1(u)$ be the set of all strings obtainable by deleting 1 bit from u. – And $D_1(S) = \bigcup_{u \in S} D_1(u)$ for a set of strings S. • Codebook C is 1-deletion correcting if $D_1(u) \cap D_1(v) = \emptyset$ for all distinct u, v ∈ C. – Fixed map from strings with 1 deletion to codeword. – Our C will have this property. • Let pref(u) be the first k – 1 bits of a k-bit string u, and suff(u) be the last k – 1 bits. – Similarly define pref(S), suff(S).
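The notation translates directly into set operations. The sketch below assumes bit strings are Python `str` values; the function names are illustrative, and the 1-deletion-correcting test uses the standard definition (distinct codewords have disjoint single-deletion sets).

```python
from itertools import combinations

def D1(u):
    """All strings obtainable by deleting exactly one bit from u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def D1_set(S):
    """Union of D1(u) over all u in S."""
    return set().union(*(D1(u) for u in S))

def pref(u):
    """First k - 1 bits of a k-bit string u."""
    return u[:-1]

def suff(u):
    """Last k - 1 bits of a k-bit string u."""
    return u[1:]

def pref_set(S):
    return {pref(u) for u in S}

def suff_set(S):
    return {suff(u) for u in S}

def is_one_deletion_correcting(C):
    """True if no two distinct codewords share a single-deletion result."""
    return all(D1(u).isdisjoint(D1(v)) for u, v in combinations(C, 2))

if __name__ == "__main__":
    print(sorted(D1("0010")))                            # ['000', '001', '010']
    print(is_one_deletion_correcting(["0000", "1111"]))  # True
    print(is_one_deletion_correcting(["0010", "0100"]))  # False: both yield '010'
```

When the sets are disjoint, every string with one deletion maps back to a unique codeword, which is the "fixed map" mentioned above.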
Intuition • At start of decoding, after reading first b – 1 bits, we know the first block. – Assuming C is 1-deletion correcting. • But don’t know if next block starts at bit b or bit b + 1 of received string. Sent: 00100100???? Received: 00100100… – Is the marked received 0 from the 1st block or the 2nd? – Can’t resolve ambiguity. • Need to make sure ambiguity does not grow. • Key invariant: each successive block starts in one of two positions.
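The ambiguity can be reproduced concretely. In this sketch the first block is the one from the slide; the second block is a hypothetical choice (the slide shows only "????"), picked to begin with 0 so that the two cases collide.

```python
def delete_at(s, i):
    """Return s with the bit at index i removed."""
    return s[:i] + s[i + 1:]

b = 8
first = "00100100"    # first block, as on the slide
second = "01101001"   # hypothetical second block

# Case 1: no deletion in the first block.
received_no_deletion = first + second
# Case 2: the last bit of the first block is deleted.
received_with_deletion = delete_at(first, b - 1) + second

# The first b received bits are identical in both cases ...
print(received_no_deletion[:b], received_with_deletion[:b])
# ... but the second block begins at a different offset (0-indexed 8 vs 7),
# i.e. at bit b + 1 or bit b of the received string.
print(received_no_deletion[b:] == second)
print(received_with_deletion[b - 1:] == second)
```

After b bits the decoder therefore knows the first block but not where the second one begins, which is why the invariant tracks two candidate start positions.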
Theorem Statement • For a segmented deletion channel with block length b, consider a codebook C of strings of length b satisfying: • Such a codebook allows linear-time left-to-right decoding.
Proof Sketch • Maintain invariant: suppose block starts at position k or k + 1 of received string R. To decode block: – Done if – Otherwise – and this determines the sent block. – As long as sent block not of form – next block starts at position k + b – 1 or k + b.
Finding Valid Codebooks • Restrictions lead to independent set problem. – Each possible b-bit codeword is a vertex. • Throw out vertices for restricted strings. – Edge between two vertices u, v if – Maximum independent set = largest codebook. • Can be found exhaustively for small b. • Use heuristics (greedy) for larger b.
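A greedy independent-set heuristic might look like the sketch below. It is deliberately partial: the slide's edge condition and its list of restricted strings are not preserved in this text, so the code uses only the 1-deletion-correcting requirement as the conflict relation. The resulting sets are therefore not claimed to satisfy the theorem or to reproduce the codebook sizes reported in the talk. The ordering rule (fewest distinct single-deletion results first) is also an assumption, not the authors' heuristic.

```python
from itertools import product

def D1(u):
    """All strings obtainable by deleting exactly one bit from u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def greedy_codebook(b):
    """Greedily pick b-bit strings whose single-deletion sets are disjoint.

    Two strings conflict (share an edge) here iff their D1 sets intersect.
    This is only one of the conditions a valid codebook needs.
    """
    candidates = ["".join(bits) for bits in product("01", repeat=b)]
    # Assumed heuristic: try strings with small D1 sets first, since they
    # block fewer later candidates; ties are broken lexicographically.
    candidates.sort(key=lambda u: (len(D1(u)), u))
    used = set()      # union of D1(c) over all chosen codewords c
    chosen = []
    for u in candidates:
        d = D1(u)
        if used.isdisjoint(d):
            chosen.append(u)
            used |= d
    return chosen

if __name__ == "__main__":
    for b in (4, 6, 8):
        print(b, len(greedy_codebook(b)))
```

Adding the remaining restrictions would amount to filtering `candidates` before the loop and extending the conflict test, which leaves the greedy structure unchanged.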
Results • Codes from exhaustive search: – 8 bit blocks, 12 codewords : rate > 44% – 9 bit blocks, 20 codewords : rate > 48% • Codes from heuristics: – 16 bit blocks, 740 codewords : rate > 59%. • Decoding simple – easily done in hardware.
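The quoted rates can be checked with the usual definition of code rate, log2 of the codebook size divided by the block length in bits. That definition is assumed here, and it is consistent with the figures on the slide.

```python
import math

def rate(num_codewords, block_bits):
    """Information bits carried per transmitted bit: log2(|C|) / b."""
    return math.log2(num_codewords) / block_bits

if __name__ == "__main__":
    # (block length b, codebook size |C|) as reported on the slide
    for b, size in [(8, 12), (9, 20), (16, 740)]:
        print(f"b = {b:2d}, |C| = {size:3d}: rate = {rate(size, b):.4f}")
```

The three values come out at roughly 0.448, 0.480 and 0.596, matching the stated bounds of 44%, 48% and 59%.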
Insertions • Can analyze segmented insertion channels the same way. • Surprising result: the codebooks for insertions and codebooks for deletions have the same properties! – Non-obvious symmetry!
Improvements • Extended scheme simulated in extended version of paper. • Ideas: – Increase C so that multiple decodings are locally possible (per block). – Use parity checks (local/global) to remove spurious decodings. – Use dynamic programming to enforce globally consistent decoding. • Results in higher rates, but slower, and currently no provable guarantees.
Conclusions and Open Questions • Codes ready for implementation. – Any users? • Theoretical limits. – Capacity bounds for segmented channels? – Time/capacity tradeoffs? • Possible improvements. – Analysis of more general dynamic-programming-based scheme?