HAMT Hash Array Mapped Trie Ideal Hash Trees

HAMT Hash Array Mapped Trie Ideal Hash Trees (Bagwell, 2001)

[Figure: hash table hash+ with entries key3 (0x3b) → obj3, key1 (0x3b) → obj1, key0 (0x0e) → obj0, key2 (0x1a) → obj2, key4 (0x22) → obj4, …; hash+ is the result of (hash-set hash key4 obj4).]

Persistent hashes
• Instead, we want an immutable linked structure that can be partially updated while portions of the old hash persist.
• Hash Array Mapped Trie (HAMT), from “Ideal Hash Trees” (Bagwell, 2001).
• Traditional balanced trees (AVL, RB, B-trees, etc.) use mutability to dynamically re-balance when necessary.
• A HAMT first hashes its keys to obtain good dispersion characteristics, as a hash table would, but then uses a linked trie (prefix tree) structure.

[Figure: a binary trie. Each NODE has a 0 branch and a 1 branch, each leading to another NODE.]

[Figure: the same binary trie, except that a branch leads to a PAIR (Key*, Value*) rather than a NODE if this subtree has no other key/value pair.]
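The binary trie with PAIR leaves, together with the persistence requirement, can be sketched roughly as follows. This is a minimal illustration, not the deck's implementation: the names Trie, trie_get, and trie_set are hypothetical, keys are stood in for by their 32-bit hashes (so full-hash collisions are ignored), bit 0 of the hash picks the branch at the root, and nothing is ever freed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a Trie is either a PAIR (leaf) or a NODE with
   a 0 branch and a 1 branch. Keys are represented by their hashes. */
typedef struct Trie Trie;
struct Trie {
    int is_pair;           /* 1: PAIR, 0: NODE */
    uint32_t hash;         /* PAIR only */
    int value;             /* PAIR only */
    const Trie *child[2];  /* NODE only; NULL means an empty branch */
};

static const Trie *make_pair(uint32_t hash, int value) {
    Trie *t = calloc(1, sizeof *t);
    if (t == NULL) abort();
    t->is_pair = 1;
    t->hash = hash;
    t->value = value;
    return t;
}

static const Trie *make_node(const Trie *c0, const Trie *c1) {
    Trie *t = calloc(1, sizeof *t);
    if (t == NULL) abort();
    t->child[0] = c0;
    t->child[1] = c1;
    return t;
}

/* Follows one hash bit per level; returns a pointer to the value or NULL. */
const int *trie_get(const Trie *t, uint32_t hash) {
    for (int depth = 0; t != NULL; depth++) {
        if (t->is_pair)
            return t->hash == hash ? &t->value : NULL;
        t = t->child[(hash >> depth) & 1];
    }
    return NULL;
}

/* Persistent insert: returns a new root and never modifies `t`.
   Only the nodes on the path to the key are copied; every other
   subtree is shared between the old and the new trie. */
const Trie *trie_set(const Trie *t, uint32_t hash, int value, int depth) {
    if (t == NULL || (t->is_pair && t->hash == hash))
        return make_pair(hash, value);

    const Trie *c[2] = { NULL, NULL };
    if (t->is_pair) {
        /* A lone PAIR sits here: push it one level down. */
        c[(t->hash >> depth) & 1] = t;
    } else {
        c[0] = t->child[0];
        c[1] = t->child[1];
    }
    int bit = (hash >> depth) & 1;
    c[bit] = trie_set(c[bit], hash, value, depth + 1);
    return make_node(c[0], c[1]);
}
```

Calling trie_set(old, h, v, 0) yields a new root while old remains a valid, unchanged trie, which is the property the deck calls persistence. A real implementation would also need garbage collection or reference counting for the shared subtrees.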

Increase the branching factor. Use the value 1 (which can't be an aligned pointer) to tag non-terminal nodes.
[Figure: wide nodes whose slots are mostly 0 (empty); occupied slots hold either a Key*/Value* pair or a 1 tag followed by a link to a child node.]
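One way to read the tagging trick is sketched below. It assumes a two-word slot in which the first word is a Key*, 0 for an empty slot, or the tag value 1, and the second word is a Value* or, when tagged, a pointer to the child node. The Slot type and helper names are hypothetical, and the scheme relies on keys being at least 2-byte aligned so that a real Key* never has its low bit set.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical two-word slot. */
typedef struct {
    uintptr_t key;  /* Key*, 0 if empty, or the tag value 1 */
    uintptr_t val;  /* Value*, or the child node pointer when tagged */
} Slot;

enum SlotKind { SLOT_EMPTY, SLOT_PAIR, SLOT_INNER };

/* Builds a terminal slot holding a key/value pair. */
static Slot slot_pair(const void *key, const void *value) {
    Slot s = { (uintptr_t)key, (uintptr_t)value };
    return s;
}

/* Builds a non-terminal slot: the key word is the tag 1. */
static Slot slot_inner(const void *child) {
    Slot s = { 1, (uintptr_t)child };
    return s;
}

/* Classifies a slot by looking only at the key word. An aligned
   pointer always has a 0 low bit, so an odd key word must be a tag. */
static enum SlotKind slot_kind(Slot s) {
    if (s.key == 0) return SLOT_EMPTY;
    if (s.key & 1)  return SLOT_INNER;
    return SLOT_PAIR;
}
```

Testing the low bit rather than comparing against exactly 1 leaves the remaining bits of the key word free for other data.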

Then, we can compress these nodes using a bitmap.
Bitmap: 0000001100
These 1s indicate that the 2nd and 3rd indices were non-null and contain either a Key/Value or a pointer to an inner node.
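As a rough illustration of the compression step, the sketch below packs the non-null slots of a sparse node into a dense array and records their positions in a bitmap. It assumes a branching factor of 32 and that bit i of the bitmap stands for index i; the function name compress_node is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies the non-NULL entries of a sparse 32-slot node into `dense`
   (in index order), stores how many there were in `*count`, and
   returns a bitmap with bit i set for every occupied index i. */
uint32_t compress_node(void *const sparse[32], void *dense[32], int *count) {
    uint32_t bitmap = 0;
    int n = 0;
    for (int i = 0; i < 32; i++) {
        if (sparse[i] != NULL) {
            bitmap |= (uint32_t)1 << i;
            dense[n++] = sparse[i];
        }
    }
    *count = n;
    return bitmap;
}
```

With only indices 2 and 3 occupied, the returned bitmap is 0xC, whose low ten bits are 0000001100 as on the slide, and the dense array holds just those two entries.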

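Compiler builtins (GCC/Clang): __builtin_popcount(u32), __builtin_popcountll(u64)

    /* u32 is an unsigned 32-bit integer, e.g. uint32_t from <stdint.h>. */
    u32 popcount(u32 x)
    {
        x = x - ((x >> 1) & 0x55555555);                 /* counts per 2-bit field */
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* counts per 4-bit field */
        x = (x + (x >> 4)) & 0x0F0F0F0F;                 /* counts per byte */
        x = x + (x >> 8);                                /* fold the byte counts... */
        x = x + (x >> 16);                               /* ...into the low byte */
        return x & 0x0000003F;                           /* result is 0..32 */
    }

This technique is called SIMD within a register (SWAR).
See also: Faster population counts using AVX2 instructions (Mula, et al. 2017)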

__builtin_popcountll(bm << (max - hindex)), where max is the length of the bitmap.
Returns the number of 1s preceding the hash-index we want, which is the same as the compressed index.
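The compressed-index computation can be sketched as below. Note that this uses the common mask-then-count formulation rather than the shift shown on the slide, and it assumes a 32-bit bitmap in which bit 0 corresponds to hash-index 0. The helper names are hypothetical; __builtin_popcount is the GCC/Clang builtin.

```c
#include <stdint.h>

/* Is hash-index h (0..31) occupied in this node? */
static int bitmap_has(uint32_t bm, unsigned h) {
    return (int)((bm >> h) & 1u);
}

/* Number of 1 bits strictly below bit h. Because the dense array
   stores occupied slots in index order, this count is the position
   of hash-index h in the dense array. */
static unsigned compressed_index(uint32_t bm, unsigned h) {
    uint32_t below = ((uint32_t)1 << h) - 1;  /* mask of bits 0..h-1 */
    return (unsigned)__builtin_popcount(bm & below);
}
```

For the bitmap 0000001100, hash-index 2 maps to dense position 0 and hash-index 3 to dense position 1. A lookup would first call bitmap_has and only consult the dense array when the bit is set.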

So this trie would now compress to…
[Figure: the uncompressed trie, with mostly 0 (empty) slots, Key*/Value* pairs, and 1-tagged links to child nodes.]

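This trie, with 31- or 63-bit bitmaps in the parent node.
[Figure: the compressed trie. Slots that link to a child node store BM | 1 (the child's bitmap, e.g. 0000101, 0111011, 0011001, with the low tag bit set) in place of a key, and each child node holds only its occupied Key*/Value* entries.]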
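Putting the pieces together, a lookup in a bitmap-compressed trie might look like the sketch below. It simplifies the layout on the slide: instead of packing the bitmap and the tag bit into the parent's key word (BM | 1), each node carries its own full 32-bit bitmap and each entry has an explicit is_node flag. Keys are again represented by their 32-bit hashes, 5 hash bits are consumed per level, and the names Node, Entry, and hamt_get are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical entry: either a key/value pair or a link to a child. */
typedef struct {
    int is_node;       /* 1: `node` is valid, 0: `hash`/`value` are valid */
    uint32_t hash;
    int value;
    const Node *node;
} Entry;

struct Node {
    uint32_t bitmap;       /* bit i set: hash-index i is occupied */
    const Entry *entries;  /* dense array, one entry per set bit */
};

/* Returns a pointer to the stored value, or NULL if absent. */
const int *hamt_get(const Node *n, uint32_t hash) {
    for (unsigned shift = 0; n != NULL && shift < 32; shift += 5) {
        unsigned h = (hash >> shift) & 31u;        /* next 5 hash bits */
        if (((n->bitmap >> h) & 1u) == 0)
            return NULL;                           /* slot is empty */
        uint32_t below = ((uint32_t)1 << h) - 1;
        const Entry *e = &n->entries[__builtin_popcount(n->bitmap & below)];
        if (!e->is_node)
            return e->hash == hash ? &e->value : NULL;
        n = e->node;                               /* descend one level */
    }
    return NULL;
}
```

Each level costs one bit test, one population count, and one array access, and empty slots take no space in the dense array.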