Filters Bloom Quotient Cuckoo CSCI 333 Bloom Filters
Filters (Bloom, Quotient, & Cuckoo) CSCI 333
Bloom Filters • Are there any problems with Bloom filters? • What operations do they support/not support? • How do you grow a Bloom filter? • What if your filter itself exceeds RAM (how bad is locality)?
Quotient Filters • Based on a technique from a homework question in Donald Knuth’s “The Art of Computer Programming: Sorting and Searching, volume 3” (Section 6. 4, exercise 13) • Quotienting Idea: Hash: 1 0 1 1 0 1 1 1 0 0 1
Quotient Filters • Based on a technique from a homework question in Donald Knuth’s “The Art of Computer Programming: Sorting and Searching, volume 3” (Section 6. 4, exercise 13) Remaining bits are discarded/lost • Quotienting Idea: Hash: 1 0 1 1 0 1 1 1 0 0 1 Quotient: q most significant bits Remainder: r least significant bits
Building a Quotient Filter • The quotient is used as an index into an m-bucket array, where the remainder is stored. • Essentially, a hashtable that stores the remainders as a value • Collisions are resolved using linear probing and 3 extra bits per bucket • is_occupied: whether a slot is the canonical slot for some value currently stored in the filter • is_continuation: whether a slot holds a remainder that is part of a run (but not the first) • is_shifted: whether a slot holds a remainder that is not in its canonical slot
Quotient Filter Example Hash table with external chaining Hash table with linear probing + bits [https: //www. usenix. org/conference/hotstorage 11/dont-thrash-how-cache-your-hash-flash] Table of objects with quotients/ remainders for reference
Quotient Filter Example [https: //www. usenix. org/conference/hotstorage 11/dont-thrash-how-cache-your-hash-flash]
Quotient Filter Concept-check • What are the possible reasons for a collision? • Which collisions are treated as “false positives” • What parameters does the QF give the user? In other words: • What knobs can you turn to control the size of the filter? • What knobs can you turn to control the false positive rate of the filter?
Why QF over BF? • Supports deletes • Supports “merges” • Good cache locality • How many locations accessed per operation? • Some math can show that runs/clusters are expected to be small • Don’t Thrash, How to Cache Your Hash on Flash also introduces the Cascade filter, a write-optimized filter made up of increasingly large QFs that spill over to disk. • Similar idea to Log-structured merge trees, which we will discuss soon!
Cascade Filter [https: //www. usenix. org/conference/hotstorage 11/dont-thrash-how-cache-your-hash-flash]
- Slides: 10