History-Independent Cuckoo Hashing
Moni Naor, Gil Segev (Weizmann Institute, Israel)
Udi Wieder (Microsoft Research Silicon Valley)

Election Day
  • Elections for class president
  • Each student whispers in Mr. Drew’s ear
  • Mr. Drew writes down the votes
  [Figure: Mr. Drew’s notebook listing the votes: Carol, Alice, Bob, …]
  • Problem: Mr. Drew’s notebook leaks sensitive information
    • First student voted for Carol
    • Second student voted for Alice
    • …
  • May compromise the privacy of the elections

Election Day
  • A simple solution:
    • Lexicographically sorted list of candidates
    • Unary counters
  [Figure: sorted tally with unary counters: Alice 11, Bob 1, Carol]
  • What about more involved applications?
    • Write-in candidates
    • Votes which are subsets or rankings
    • …

Learning From History
  • The two levels of a data structure
    • “Legitimate” interface
    • Memory representation
  • History independence: the memory representation should not reveal information that cannot be obtained using the legitimate interface
  • A simple example: sorted list
    • Canonical memory representation
    • Not really efficient...
  [Figure: sorted list holding Alice, Bob, Carol]
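The sorted-list example can be made concrete with a minimal Python sketch. The names `AppendLog`, `SortedSet` and `layout_after` are hypothetical and chosen only for illustration; they do not come from the talk. The sketch contrasts a structure whose memory layout depends on insertion order with one whose layout depends only on its content.

```python
import bisect

class AppendLog:
    """Stores items in arrival order, so the layout reveals the history."""
    def __init__(self):
        self.memory = []

    def insert(self, x):
        if x not in self.memory:
            self.memory.append(x)

class SortedSet:
    """Keeps items sorted: exactly one layout for each set of contents."""
    def __init__(self):
        self.memory = []

    def insert(self, x):
        i = bisect.bisect_left(self.memory, x)
        if i == len(self.memory) or self.memory[i] != x:
            self.memory.insert(i, x)   # O(n) shift, hence "not really efficient"

    def delete(self, x):
        i = bisect.bisect_left(self.memory, x)
        if i < len(self.memory) and self.memory[i] == x:
            self.memory.pop(i)

def layout_after(cls, items):
    """Insert the items in the given order and return the memory layout."""
    d = cls()
    for x in items:
        d.insert(x)
    return d.memory
```

Inserting Carol, Alice, Bob or Bob, Carol, Alice into `SortedSet` leaves the same list in memory, whereas `AppendLog` ends up with two different layouts that expose who was inserted first.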

Typical Applications
  • Incremental cryptography [BGG94, Mic97]
  • Voting [MKSW06, MNS07]
  • Set comparison & reconciliation [MNS08]
  • Computational geometry [BGV08]
  • ...

Our Contribution
A HI dictionary that simultaneously achieves the following:
  • Efficiency:
    • Lookup time: O(1) worst case
    • Update time: O(1) expected amortized
    • Memory utilization: 50% (25% with deletions)
  • Strongest notion of history independence
  • Simple and fast

Notions of History Independence
Naor and Teague (2001), following Micciancio (1997)
  • Weak history independence
    • Memory revealed at the end of an activity period
    • Any two sequences of operations $S_1$ and $S_2$ that lead to the same content induce the same distribution on the memory representation
  • Strong history independence
    • Memory revealed several times during an activity period
    • Any two sets of breakpoints along $S_1$ and $S_2$ with the same content at each breakpoint induce the same distributions on the memory representation at all these points
    • Completely randomizing memory after each operation is not good enough
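The two notions can be written slightly more formally. In the sketch below, the notation is an assumption made here and does not appear on the slides: $C(S)$ is the content after running the sequence $S$, $\mathrm{Mem}(S)$ is the (random) memory representation after running $S$, $S^{i}$ is the prefix consisting of the first $i$ operations of $S$, and $\equiv$ means "identically distributed".

```latex
% Weak history independence
\[
  C(S_1) = C(S_2)
  \;\Longrightarrow\;
  \mathrm{Mem}(S_1) \equiv \mathrm{Mem}(S_2)
\]

% Strong history independence, for breakpoints
% i_1 < \dots < i_\ell in S_1 and j_1 < \dots < j_\ell in S_2
\[
  C(S_1^{i_k}) = C(S_2^{j_k}) \ \text{for all } k
  \;\Longrightarrow\;
  \bigl(\mathrm{Mem}(S_1^{i_1}), \dots, \mathrm{Mem}(S_1^{i_\ell})\bigr)
  \equiv
  \bigl(\mathrm{Mem}(S_2^{j_1}), \dots, \mathrm{Mem}(S_2^{j_\ell})\bigr)
\]
```

The strong notion compares joint distributions over all breakpoints, which is why re-randomizing memory after every operation does not suffice: two snapshots taken with the same content can be compared with each other, and a change between them shows that operations occurred in the meantime.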

Notions of History Independence
  • We consider strong history independence
    • Canonical representation (up to initial randomness) implies SHI
    • Other direction shown to hold for reversible data structures [HHMPR05]
  • Weak & strong are not equivalent
    • WHI for reversible data structures is possible without a canonical representation
    • Provable efficiency gaps [BP06] (in restricted models)

SHI Dictionaries

                          Memory utilization   Update time     Lookup time
Naor & Teague '01         99%                  O(1) expected   O(1) worst case
Blelloch & Golovin '07    99%                  O(1) expected
Blelloch & Golovin '07    < 9%                 O(1) expected   O(1) worst case
This work                 < 25% (< 50%)        O(1) expected   O(1) worst case

[Columns "Deletions" and "Practical?": the marks did not survive extraction; remaining annotations: "(mem. util. < 50%)", "?"]

Our Approach
  • Cuckoo hashing [PR01]: a simple & practical scheme with worst-case constant lookup time
  • Force a canonical representation on cuckoo hashing
    • No significant loss in efficiency
  • Avoid rehashing by using a small stash
    • What happens when hash functions fail?
    • Rehashing is problematic in SHI data structures
      • All hash functions need to be sampled in advance (theoretical problem)
      • When an item is deleted, may need to roll back to previous functions
    • We use a secondary storage to reduce the failure probability exponentially [KMW08]

Cuckoo Hashing
  • Tables $T_1$ and $T_2$ with hash functions $h_1$ and $h_2$
  • Store $x$ in one of $T_1[h_1(x)]$ and $T_2[h_2(x)]$
  • Insert($x$):
    • Greedily insert in $T_1$ or $T_2$
    • If both are occupied then store $x$ in $T_1$
    • Repeat in other table with the previous occupant
  [Figure: tables $T_1$ and $T_2$ holding V, W, X, Y, Z; successful insertion]

Cuckoo Hashing (same scheme, failure case)
  [Figure: tables $T_1$ and $T_2$ holding U, V, X, Y, Z; failure, rehash required]
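The insertion procedure on these slides can be sketched in Python as follows. This is plain (not history-independent) cuckoo hashing, written as a minimal illustration: the class names `CuckooTable` and `RehashRequired`, the 0-based cell indices, and the cap on the number of evictions are assumptions of this sketch, and the hash functions are passed in by the caller.

```python
class RehashRequired(Exception):
    """Raised when the eviction walk does not finish within the cap."""

class CuckooTable:
    def __init__(self, r, h1, h2):
        self.r = r
        self.h = (h1, h2)                    # h[0] indexes T1, h[1] indexes T2
        self.t = ([None] * r, [None] * r)    # the two tables T1 and T2

    def lookup(self, x):
        # Worst-case constant time: only two cells can ever hold x.
        return self.t[0][self.h[0](x)] == x or self.t[1][self.h[1](x)] == x

    def insert(self, x):
        if self.lookup(x):
            return
        # Greedy step: take an empty candidate cell if one exists.
        for side in (0, 1):
            pos = self.h[side](x)
            if self.t[side][pos] is None:
                self.t[side][pos] = x
                return
        # Both cells are occupied: store x in T1, then keep moving the
        # previous occupant to its cell in the other table.
        side = 0
        for _ in range(4 * self.r + 2):      # cap chosen for this sketch only
            pos = self.h[side](x)
            x, self.t[side][pos] = self.t[side][pos], x
            if x is None:
                return
            side = 1 - side
        raise RehashRequired(x)              # x is the item left without a cell
```

With the toy functions `h1 = lambda x: x % 4` and `h2 = lambda x: (x // 4) % 4`, inserting 0, 4, 5, 20 succeeds after one eviction, while inserting 0, 4, 16, 20 fails because four keys compete for only three cells.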

The Cuckoo Graph
  • Set $S \subseteq U$ containing $n$ keys
  • $h_1, h_2 : U \to \{1, \dots, r\}$
  • Bipartite graph with sets of size $r$
  • Edge $(h_1(x), h_2(x))$ for every $x \in S$
  • $S$ is successfully stored $\iff$ every connected component has at most one cycle
  • Main theorem: if $r \ge (1 + \epsilon)n$ and $h_1, h_2$ are $\log(n)$-wise independent, then the failure probability is $\Theta(1/n)$
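The storability condition can be checked directly on the cuckoo graph. The following Python sketch uses a union-find structure and tracks, for each component, whether it already contains a cycle. The function name `can_store` and the 0-based cell range `0..r-1` (the slide uses `1..r`) are assumptions of this sketch; `keys` is assumed to contain distinct keys.

```python
def can_store(keys, h1, h2, r):
    """True iff every component of the cuckoo graph has at most one cycle."""
    parent = list(range(2 * r))   # vertices 0..r-1: T1 cells, r..2r-1: T2 cells
    cyclic = [False] * (2 * r)    # per component root: does it contain a cycle?

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a == b:
            # The edge closes a cycle inside one component.
            if cyclic[a]:
                return False      # this would be a second cycle
            cyclic[a] = True
        else:
            # The edge merges two components; two cycles cannot coexist.
            if cyclic[a] and cyclic[b]:
                return False
            parent[a] = b
            cyclic[b] = cyclic[a] or cyclic[b]
    return True
```

A component with at most one cycle has no more edges (keys) than vertices (cells), which is exactly what is needed to give every key its own cell.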

The Canonical Representation
  • Assume that $S$ can be stored using $h_1$ and $h_2$
  • We force a canonical representation on the cuckoo graph
    • Suffices to consider a single connected component
  • Assume that $S$ forms a tree in the cuckoo graph (the typical case)
    • One location must be empty
    • The choice of the empty location uniquely determines the location of all elements
    • Rule: $h_1$(minimal element) is empty
  [Figure: tree component with elements a, b, c, d, e]

The Canonical Representation
  • Assume that $S$ can be stored using $h_1$ and $h_2$
  • We force a canonical representation on the cuckoo graph
    • Suffices to consider a single connected component
  • Assume that $S$ has one cycle
    • Two ways to assign elements in the cycle
    • Each choice uniquely determines the location of all elements
    • Rule: minimal element in cycle lies in $T_1$
  [Figure: component with one cycle, elements a, b, c, d, e]
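The two rules (tree: the $T_1$ cell of the minimal element stays empty; cycle: the minimal element on the cycle lies in $T_1$) can be turned into a small procedure that computes the canonical layout from scratch. The Python sketch below is an illustration under stated assumptions, not the incremental update algorithm of the talk: the function name `canonical_layout`, the `(table, cell)` encoding and the leaf-peeling strategy are choices made here.

```python
from collections import defaultdict

def canonical_layout(keys, h1, h2):
    """Map each key to (table, cell); table 0 is T1 and table 1 is T2."""
    ends = {x: ((0, h1(x)), (1, h2(x))) for x in keys}  # key = edge of the graph
    incident = defaultdict(set)                          # cell -> keys that may use it
    for x, (u, v) in ends.items():
        incident[u].add(x)
        incident[v].add(x)

    def other_end(x, v):
        a, b = ends[x]
        return b if v == a else a

    place, seen = {}, set()
    for start in list(incident):
        if start in seen:
            continue
        # Collect one connected component: its cells and its keys.
        seen.add(start)
        cells, comp, todo = [start], set(), [start]
        while todo:
            for x in incident[todo.pop()]:
                comp.add(x)
                for w in ends[x]:
                    if w not in seen:
                        seen.add(w)
                        cells.append(w)
                        todo.append(w)
        if len(comp) > len(cells):
            raise ValueError("component has more than one cycle")
        # Tree rule: the T1 cell of the minimal element stays empty.
        empty = (0, h1(min(comp))) if len(comp) < len(cells) else None
        left = {v: set(incident[v]) for v in cells}      # unplaced keys per cell

        def put(x, v):
            place[x] = v
            for w in ends[x]:
                left[w].discard(x)

        # Peel leaves: a non-empty cell with one unplaced key must hold it.
        leaves = [v for v in cells if len(left[v]) == 1 and v != empty]
        while leaves:
            v = leaves.pop()
            if len(left[v]) != 1:
                continue
            (x,) = left[v]
            u = other_end(x, v)
            put(x, v)
            if len(left[u]) == 1 and u != empty:
                leaves.append(u)
        # Cycle rule: the minimal key on the cycle lies in T1; walk from there.
        cycle = [x for x in comp if x not in place]
        if cycle:
            x = min(cycle)
            v = ends[x][0]
            while True:
                put(x, v)
                v = other_end(x, v)
                if not left[v]:
                    break
                (x,) = left[v]
    return place
```

Because the result depends only on the key set and on the two hash functions, any insertion order yields the same layout, which is the sense in which the representation is canonical.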

The Canonical Representation
  • Updates efficiently maintain the canonical representation
  • Insertions:
    • New leaf: check if new element is smaller than current min
    • New cycle:
      • Same component…
      • Merging two components…
    • All cases straightforward
  • Deletions:
    • Find the new min, split component, …
    • Requires connecting all elements in the component with a sorted cyclic list
      • Memory utilization drops to 25%
    • All cases straightforward
  • Update time < size of component = expected (small) constant

Rehashing
  • What if $S$ cannot be stored using $h_1$ and $h_2$?
    • Happens with probability $\Theta(1/n)$
  • Can we simply pick new functions?
    • Rare, but very bad worst-case performance
    • Canonical memory implies we need to sample all hash functions in advance (theoretical problem)
    • Whenever an item is deleted, need to check whether we must roll back to previous hash functions
      • A bad item which is repeatedly inserted and deleted would cause a rehash on every operation!

Using a Stash
  • Whenever an insert fails, put a ‘bad’ item in a secondary data structure
    • Bad item: smallest item that belongs to a cycle
    • Secondary data structure must be SHI in itself
  • Theorem [KMW08]: $\Pr[|\text{stash}| > s] < n^{-s}$
  • In practice keeping the stash as a sorted list is probably the best solution
    • Effectively the query time is constant with (very) high probability
  • In theory the stash could be any SHI data structure with constant lookup time
    • A deterministic hashing scheme, where the elements are rehashed whenever the content changes [AN96, HMP01]

Conclusions and Problems
  • Cuckoo hashing is a robust and flexible hashing scheme
    • Easily ‘molded’ into a history-independent data structure
  • We don’t know how to do this for CH with more than 2 hash functions and/or more than 1 element per bucket
    • Better memory utilization, better performance, but...
    • Expected size of connected component is not constant
  • Full performance analysis