Codes for Deletion and Insertion Channels with Segmented Errors
Zhenming Liu, Michael Mitzenmacher
Harvard University, School of Engineering and Applied Sciences

The Most Basic Channels
• Binary erasure channel.
  – Each bit is replaced by a ? with probability p.
  – Very well understood.
• Binary symmetric channel.
  – Each bit is flipped with probability p.
  – Very well understood.
• Binary deletion channel.
  – Each bit is deleted with probability p.
  – We don't even know the capacity!
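The three channels above can be simulated in a few lines. This is a minimal sketch, not taken from the slides: the function names, the test string, and the fixed seed are all illustrative assumptions.

```python
import random

def erasure_channel(bits, p, rng):
    """Replace each bit with '?' independently with probability p."""
    return "".join("?" if rng.random() < p else b for b in bits)

def symmetric_channel(bits, p, rng):
    """Flip each bit independently with probability p."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p else b for b in bits)

def deletion_channel(bits, p, rng):
    """Drop each bit independently with probability p."""
    return "".join(b for b in bits if rng.random() >= p)

rng = random.Random(0)  # arbitrary fixed seed, for repeatability only
sent = "0010110100101101"
print(erasure_channel(sent, 0.2, rng))
print(symmetric_channel(sent, 0.2, rng))
print(deletion_channel(sent, 0.2, rng))
```

The first two outputs always have the same length as the input, so the receiver knows which position each symbol came from. The deletion channel's output is shorter and carries no positional information, which is the source of the difficulty.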

Motivation
• Capacity/coding results for deletion/insertion channels are very hard.
  – Very little theory for practical coding schemes.
  – Huge gap between codes and capacity bounds.
• Perhaps this is an artifact of the model.
  – Are independent deletions/insertions the right model for insertions/deletions in practice?
  – Do different models yield much better results?
• If so, would highlight challenges of original model.

Model Motivation
• Claim: Deletion/insertion errors occur because of timing mismatches.
  – Mechanisms running at slightly different speeds.
  – Clock drift.
• After one deletion (or insertion), some time passes before the next.

Channel Model: Segmented Deletions
• Input is divided into consecutive blocks of b bits.
• Channel guarantee: at most one deletion per block.
• No block markers at output.
• Example: b = 8. 000101111 00001111 000101111 00010111001011
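A minimal simulation of this channel model might look as follows. The per-block deletion probability p is an assumption made for illustration; the model itself only guarantees at most one deletion per block and does not say how deletions are chosen.

```python
import random

def segmented_deletion_channel(bits, b, p, rng):
    """Delete at most one bit from each consecutive b-bit block.

    p is a hypothetical probability that a given block suffers a deletion.
    The output is a plain concatenation, so no block markers survive.
    """
    out = []
    for start in range(0, len(bits), b):
        block = bits[start:start + b]
        if block and rng.random() < p:
            i = rng.randrange(len(block))      # position of the deleted bit
            block = block[:i] + block[i + 1:]
        out.append(block)
    return "".join(out)

rng = random.Random(0)  # arbitrary fixed seed
sent = "00100100" "11011011" "00100100"
print(segmented_deletion_channel(sent, 8, 0.5, rng))
```

Because a deletion may hit the last bit of one block and the first bit of the next, two adjacent input bits can both be lost even though each block loses at most one.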

Segmented Deletion Model
• More general than models requiring a gap between deletions.
  – Two consecutive deletions can occur on the boundary.
• Can define similar segmented insertion model.

Codes for Segmented Deletions: Our Approach
• Create a codebook C with strings of b bits.
• Codeword is concatenation of blocks from C.
• Aim to decode blocks from left to right, without losing synchronization, regardless of errors.
• Questions:
  – How can this be done?
  – What properties does C need?
  – How large can C be?

Notation
• Let D_1(u) be all strings obtainable by deleting 1 bit from u.
  – And D_1(S) = ∪_{u ∈ S} D_1(u) for a set of strings S.
• Codebook C is 1-deletion correcting if D_1(u) ∩ D_1(v) = ∅ for all distinct u, v ∈ C.
  – Fixed map from strings with 1 deletion to codeword.
  – Our C will have this property.
• Let pref(u) be the first k – 1 bits of a k-bit string u, and suff(u) be the last k – 1 bits.
  – Similarly define pref(S), suff(S).
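The notation translates directly into set operations. The sketch below uses the standard reading of "1-deletion correcting" (distinct codewords have disjoint single-deletion sets); the function names and the toy inputs are illustrative, not from the slides.

```python
from itertools import combinations

def D1(u):
    """All strings obtainable by deleting exactly one bit from u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def pref(u):
    """First k - 1 bits of a k-bit string."""
    return u[:-1]

def suff(u):
    """Last k - 1 bits of a k-bit string."""
    return u[1:]

def pref_set(S):
    return {pref(u) for u in S}

def suff_set(S):
    return {suff(u) for u in S}

def is_one_deletion_correcting(C):
    """True if no two distinct codewords share a single-deletion result."""
    return all(D1(u).isdisjoint(D1(v)) for u, v in combinations(C, 2))

print(sorted(D1("0010")))
print(is_one_deletion_correcting(["0000", "1111"]))
print(is_one_deletion_correcting(["0010", "0100"]))
```

Note that pref(u) and suff(u) are always members of D1(u): they correspond to deleting the last and the first bit, respectively.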

Intuition
• At start of decoding, after reading first b – 1 bits, we know the first block.
  – Assuming C is 1-deletion correcting.
• But don't know if next block starts at bit b or bit b + 1 of received string.
    Sent:     00100100????
    Received: 00100100…
  – Is marked received 0 from 1st block or 2nd?
  – Can't resolve ambiguity.
• Need to make sure ambiguity does not grow.
• Key invariant: each successive block starts in one of two positions.
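The first step can be sketched as a table lookup. If the first block lost a bit, the first b – 1 received bits lie in D_1(u); if it lost none, they equal pref(u), which is also in D_1(u). The two-word codebook below is a hypothetical toy chosen only because it is 1-deletion correcting; it is not one of the codebooks from the talk.

```python
def D1(u):
    """All strings obtainable by deleting exactly one bit from u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def build_lookup(C):
    """Map every (b-1)-bit string in D1(u) to its codeword u."""
    table = {}
    for u in C:
        for s in D1(u):
            if s in table and table[s] != u:
                raise ValueError("C is not 1-deletion correcting")
            table[s] = u
    return table

def decode_first_block(received, C):
    """Recover the first sent block from the first b - 1 received bits."""
    b = len(C[0])
    return build_lookup(C)[received[:b - 1]]

C = ["00100100", "11011011"]           # hypothetical toy codebook, b = 8
sent = "00100100" + "11011011"
received_a = sent[:7] + sent[8:]       # last bit of block 1 deleted
received_b = sent                      # no deletion in block 1
print(decode_first_block(received_a, C))
print(decode_first_block(received_b, C))
```

Both received strings decode to the same first block, yet the second block begins one position earlier in received_a than in received_b. That one-position uncertainty is the ambiguity the invariant has to keep from growing.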

Theorem Statement
• For a segmented deletion channel with blocklength b, consider a codebook C of strings of length b satisfying: [conditions missing]
• Such a codebook allows linear time left-to-right decoding.

Proof Sketch
• Maintain invariant: suppose block starts at position k or k + 1 of received string R. To decode block:
  – Done if [formula missing]
  – Otherwise [formula missing]
  – and this determines the sent block.
  – As long as sent block not of form [formula missing]
  – next block starts at position k + b – 1 or k + b.

Finding Valid Codebooks
• Restrictions lead to independent set problem.
  – Each possible b-bit codeword is a vertex.
    • Throw out vertices for restricted strings.
  – Edge between two vertices u, v if [formula missing]
  – Maximum independent set = largest codebook.
• Can be found exhaustively for small b.
• Use heuristics (greedy) for larger b.
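A greedy independent-set heuristic can be sketched as below. The edge rule and the restricted strings are not given in this transcript, so the sketch assumes only the 1-deletion-correcting conflict (two vertices are adjacent when their single-deletion sets overlap) and throws out no vertices. The resulting set therefore illustrates the greedy idea only and need not be a valid codebook in the sense of the theorem.

```python
from itertools import product

def D1(u):
    """All strings obtainable by deleting exactly one bit from u."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def greedy_codebook(b):
    """Scan all b-bit strings in lexicographic order and keep each one
    whose single-deletion set is disjoint from those already chosen.

    Assumed edge rule: u ~ v iff D1(u) and D1(v) intersect.
    """
    chosen = []
    used = set()                 # union of D1(u) over chosen codewords
    for bits in product("01", repeat=b):
        u = "".join(bits)
        d = D1(u)
        if used.isdisjoint(d):
            chosen.append(u)
            used |= d
    return chosen

C = greedy_codebook(8)
print(len(C), C[:4])
```

Greedy selection gives a maximal independent set, not necessarily a maximum one; the scan order (lexicographic here, an arbitrary choice) can change the size of the result.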

Results
• Codes from exhaustive search:
  – 8-bit blocks, 12 codewords: rate > 44%
  – 9-bit blocks, 20 codewords: rate > 48%
• Codes from heuristics:
  – 16-bit blocks, 740 codewords: rate > 59%
• Decoding simple – easily done in hardware.
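The quoted rates are consistent with the usual definition of block-code rate, log2(|C|) / b. The short check below assumes that definition; the helper name is illustrative.

```python
import math

def block_rate(b, num_codewords):
    """Information bits carried per transmitted bit: log2(|C|) / b."""
    return math.log2(num_codewords) / b

for b, m in [(8, 12), (9, 20), (16, 740)]:
    print(f"b = {b:2d}, |C| = {m:3d}: rate = {block_rate(b, m):.4f}")
```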

Insertions
• Can analyze segmented insertion channels the same way.
• Surprising result: the codebooks for insertions and codebooks for deletions have the same properties!
  – Non-obvious symmetry!

Improvements
• Extended scheme simulated in extended version of paper.
• Ideas:
  – Increase C so that multiple decodings are locally possible (per block).
  – Use parity checks (local/global) to remove spurious decodings.
  – Use dynamic programming to enforce globally consistent decoding.
• Results in higher rates, but slower, and currently no provable guarantees.

Conclusions and Open Questions
• Codes ready for implementation.
  – Any users?
• Theoretical limits.
  – Capacity bounds for segmented channels?
  – Time/capacity tradeoffs?
• Possible improvements.
  – Analysis of more general dynamic-programming-based scheme?