Lower bounds for approximate membership dynamic data structures
Shachar Lovett (IAS) and Ely Porat (Bar-Ilan University) • Synergies in lower bounds, June 2011
Information theoretic lower bounds • Information theory is a powerful tool for proving lower bounds, e.g. in data structures • Study the size of the data structure (unlimited access) • Static data structures: pure information theory • Dynamic data structures: communication game
Talk overview • Approximate set membership problem • Bloom filters (simple near-optimal solution) • Lower bounds – static case • New dynamic lower bounds
Approximate set membership • Large universe $U$ • Represent a subset $S \subseteq U$ • Query: is $x \in S$? • Data structure representing $S$ approximately, as a superset $\tilde{S} \supseteq S$: – If $x \in S$: answer YES always – If $x \notin S$: answer NO with high probability • Why approximately? To save space
Applications • Storage (or communication) is costly, but a small false positive error can be tolerated • Original applications (70’s): dictionaries, databases – Bloom filters • Nowadays: mainly network applications
Bloom filters • Hash function $h: U \to \{1, \dots, m\}$ • $S = \{x_1, x_2, \dots, x_n\}$ • Bit array of length $m$, initially all zeros • Insert: $h(x_1) = 4$ sets bit 4; $h(x_2) = 1$ sets bit 1; $h(x_3) = 4$ finds bit 4 already set • Query: $y \in S$? $h(y) = 3$ and bit 3 is 0, so the answer is NO
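The insert and query steps above can be sketched in a few lines of Python. This is a minimal illustration only: the class name `OneHashBloomFilter`, the `salt` parameter, and the use of SHA-256 as a stand-in for the hash function $h$ are assumptions made for the example, not something the slides specify. Bits are indexed from 0 rather than from 1.

```python
import hashlib


class OneHashBloomFilter:
    """Bit array of length m with a single hash function (illustrative sketch)."""

    def __init__(self, m: int, salt: bytes = b"") -> None:
        self.m = m
        self.salt = salt          # hypothetical: different salts give different hash functions
        self.bits = [0] * m       # all zeros before any insertion

    def _h(self, x: str) -> int:
        # Stand-in for h: U -> {0, ..., m-1}; SHA-256 is an assumption of this sketch.
        digest = hashlib.sha256(self.salt + x.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x: str) -> None:
        self.bits[self._h(x)] = 1

    def query(self, y: str) -> bool:
        # True means "YES (possibly in S)"; False means "NO (certainly not in S)".
        return self.bits[self._h(y)] == 1


bf = OneHashBloomFilter(m=64)
for x in ["x1", "x2", "x3"]:
    bf.insert(x)

# No false negatives: every inserted element is accepted.
assert all(bf.query(x) for x in ["x1", "x2", "x3"])
```

A query for an element outside the set returns NO unless its hash happens to land on a bit that some inserted element already set, which is exactly the false positive event.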
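Bloom filters: analysis • $S = \{x_1, x_2, \dots, x_n\}$, hashed into a bit array of length $m$ • Query: $y \in S$? – If $y \in S$: returns YES always – If $y \notin S$: returns NO with probability $(1 - 1/m)^n \approx e^{-n/m}$ • Error $1/2$: $m = n / \ln 2 \approx 1.44\,n$ • Error $\varepsilon$: $m \approx 1.44\,n \log_2(1/\varepsilon)$ (repetition)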
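The arithmetic behind this analysis can be checked numerically. The sketch below assumes an ideal (uniformly random) hash function and the standard repetition scheme in which $\log_2(1/\varepsilon)$ independent error-$1/2$ filters are queried together. The function names are hypothetical.

```python
import math


def false_positive_rate(n: int, m: int) -> float:
    """P[bit h(y) is set] for y not in S, after n insertions with one ideal hash."""
    return 1.0 - (1.0 - 1.0 / m) ** n


def bits_for_error(n: int, eps: float) -> float:
    """Total bits when log2(1/eps) independent error-1/2 filters are combined."""
    bits_per_copy = n / math.log(2)      # solves exp(-n/m) = 1/2 for m
    copies = math.log2(1.0 / eps)        # each copy halves the error
    return bits_per_copy * copies


n = 1000
m_half = round(n / math.log(2))          # roughly 1.44 * n bits
print(m_half, false_positive_rate(n, m_half))
print(bits_for_error(n, 1 / 1024) / n)   # bits per element for eps = 2^-10
```

Under these assumptions the cost per element is about $1.44 \log_2(1/\varepsilon)$ bits, a constant factor $\log_2 e$ above $\log_2(1/\varepsilon)$.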
Known bounds • Upper bounds (e.g. algorithms) – Bloom filter: $m = \log_2 e \cdot n \log_2(1/\varepsilon) \approx 1.44\,n \log_2(1/\varepsilon)$ – Improvements: [Porat-Matthias ’03, Arbitman-Naor-Segev ’10] • Lower bounds: – Information theoretic: $m \ge n \log_2(1/\varepsilon)$ – Can be matched by static data structures [Charles-Chellapilla ’08, Dietzfelbinger-Pagh ’08, Porat ’08] – This work: dynamic data structures
Static lower bounds • Static setting: insert + query (diagram: insert $x_1, \dots, x_n$ → $m$ bits → query $y$) • Yao’s min-max principle: prove the lower bound for a deterministic data structure on randomized inputs
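Static lower bounds • Deterministic data structure: compression maps all sets to a small family of sets – Input: random set $S \subseteq U$ of size $n$ – Accept set: $A(S) = \{y \in U : \text{query } y \text{ answers YES}\}$ • Properties: – Small memory: at most $2^m$ distinct sets $A(S)$ – No false negatives: $S \subseteq A(S)$ – Few false positives: $|A(S) \setminus S| \le \varepsilon |U|$ on average • Optimal setting: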
Static lower bounds • Set $S \subseteq U$ • Represented by its accepting set $A(S)$, with $S \subseteq A(S) \subseteq U$ • Goal: show that the number of distinct sets $A(S)$ is large
Static lower bounds • Properties: • Assume that – If – then • General case: convexity
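One standard way to carry out such a counting argument, which may differ in detail from the one presented in the talk, assumes that every accepting set has the same size $a = \varepsilon |U|$. Each accepting set then contains $\binom{a}{n}$ sets $S$ of size $n$, and all $\binom{|U|}{n}$ sets must be covered, so at least $\binom{|U|}{n} / \binom{a}{n} \ge (1/\varepsilon)^n$ accepting sets are needed, giving $m \ge n \log_2(1/\varepsilon)$. The sketch below checks this numerically. The function name and the parameter values are hypothetical.

```python
import math


def counting_lower_bound_bits(u: int, n: int, eps: float) -> float:
    """log2 of the number of accepting sets needed to cover all n-subsets of a universe of size u."""
    a = int(eps * u)   # assumed common size of every accepting set
    # Each accepting set covers C(a, n) of the C(u, n) possible input sets.
    return math.log2(math.comb(u, n)) - math.log2(math.comb(a, n))


u, n, eps = 2 ** 20, 100, 1 / 16
bound = counting_lower_bound_bits(u, n, eps)
simple = n * math.log2(1 / eps)          # n * log2(1/eps)
print(bound, simple)
```

The exact count is slightly larger than $n \log_2(1/\varepsilon)$ because $(u - i)/(a - i) \ge u/a$ for every $i < n$ when $a < u$.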
Dynamic lower bounds • Basic dynamic setting: two inserts + query • Break the input into chunks of sizes $k$ and $n - k$ (diagram: insert $x_1, \dots, x_k$ → $m$ bits → insert $x_{k+1}, \dots, x_n$ → $m$ bits → query $y$)
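The two-insert setting can be made concrete with a small sketch. The key restriction is that the second insert sees only the $m$-bit state left by the first insert, not the elements $x_1, \dots, x_k$ themselves. The example below uses a one-hash Bloom filter stored as an integer bitmask. The budget `M`, the SHA-256 stand-in for the hash function, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import List

M = 64  # memory budget in bits (hypothetical value)


def h(x: str) -> int:
    # Stand-in hash into {0, ..., M-1}; SHA-256 is an assumption of this sketch.
    digest = hashlib.sha256(x.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % M


def insert_chunk(state: int, chunk: List[str]) -> int:
    """Compute the new M-bit state from the previous state and the new chunk only."""
    for x in chunk:
        state |= 1 << h(x)
    return state


def query(state: int, y: str) -> bool:
    return (state >> h(y)) & 1 == 1


xs = [f"x{i}" for i in range(1, 11)]    # n = 10 elements
k = 4
state1 = insert_chunk(0, xs[:k])        # first insert: x1..xk
state2 = insert_chunk(state1, xs[k:])   # second insert: sees only state1 and x_{k+1}..x_n

assert state2 < 2 ** M                  # the state fits in M bits
assert all(query(state2, x) for x in xs)  # no false negatives after both inserts
```

A static data structure, by contrast, may look at all of $x_1, \dots, x_n$ at once before committing to its $m$ bits.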
Dynamic lower bounds • Accepting sets: $A(x_1, \dots, x_k)$ and $A(A(x_1, \dots, x_k), x_{k+1}, \dots, x_n)$ • Properties: – – • General approach: analyze the sizes of the accepting sets – Sets $A(x_1, \dots, x_k)$ can’t be too small (covering) – Sets $A(A(x_1, \dots, x_k), x_{k+1}, \dots, x_n)$ can’t be too large (error) – These yield the trivial lower bound again…
Dynamic lower bounds • Method of typical inputs • On a typical input: – $A(x_1, \dots, x_k)$ is not too small – $A(A(x_1, \dots, x_k), x_{k+1}, \dots, x_n)$ is not too large – Inputs uncorrelated with data structure: • Yields an improved lower bound (note: “typical” can be 1% of inputs)
Dynamic lower bounds • Functional inequality: • Free parameter: $k$ – how to break the input • Optimal choice: • Extension: break the input into more parts – Doesn’t seem to help much
Summary • Approximate membership problem • Static algorithms match the static information theoretic lower bound: $m \approx n \log_2(1/\varepsilon)$ • This work: new dynamic information theoretic lower bound • THANK YOU!