MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs)
Presenter: Shouqian Shi
Credit to: Son Nguyen
Memcached
• DRAM-based key-value store to alleviate database load
  – Set(key, value)
  – Get(key) -> value
Memcached
• LRU (Least Recently Used) eviction
• Often used for small objects (Facebook [Atikoglu 12])
  – 90% of keys < 31 bytes
• Tens of millions of queries per second (Facebook [Atikoglu 12])
Memcached internals
• Chaining hashtable
• LRU caching using a doubly linked list
• Drawbacks: global lock, huge space overhead
Goals
• Target: read-intensive workloads with small objects
• Reduce space overhead (bytes/key)
• Improve throughput (queries/sec)
  – Result: 3x throughput, 30% more objects
Memcached: Chaining Hashtable
• Uses linked lists: costly space overhead for pointers
• Pointer dereferencing is slow (no advantage from the CPU cache)
• Reads are not constant time (chains can grow long)
Cuckoo Hashing
• Use two hash functions, HASH1 and HASH2, to pick two candidate buckets
• Each bucket has exactly 4 slots (fits in the CPU cache)
• Each (key, value) object can therefore reside in one of 8 possible slots
• Figure: (ka, va) maps to buckets HASH1(ka) and HASH2(ka)
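The slot layout described above can be sketched as follows. This is a minimal illustration, not the MemC3 implementation: the table size, the `_hash` helper (a salted BLAKE2b digest standing in for HASH1 and HASH2), and all function names are assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 8        # hypothetical table size, chosen only for illustration
SLOTS_PER_BUCKET = 4   # 4 slots per bucket, as on the slide


def _hash(key, salt):
    """Hypothetical stand-in for HASH1/HASH2: a salted BLAKE2b digest."""
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def candidate_buckets(key):
    """Return the two bucket indexes where `key` is allowed to live."""
    return (_hash(key, b"h1") % NUM_BUCKETS, _hash(key, b"h2") % NUM_BUCKETS)


def candidate_slots(key):
    """Enumerate the 2 x 4 = 8 (bucket, slot) positions for `key`."""
    return [(b, s) for b in candidate_buckets(key)
            for s in range(SLOTS_PER_BUCKET)]


def lookup(table, key):
    """Scan only the two candidate buckets; return the value or None."""
    for b in candidate_buckets(key):
        for entry in table[b]:
            if entry is not None and entry[0] == key:
                return entry[1]
    return None


# Usage: place (ka, va) by hand in its second candidate bucket, then read it.
table = [[None] * SLOTS_PER_BUCKET for _ in range(NUM_BUCKETS)]
_, second = candidate_buckets(b"ka")
table[second][0] = (b"ka", b"va")
print(lookup(table, b"ka"))
```

A read never touches more than two buckets, whatever the table size, which is what bounds the lookup cost.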
Cuckoo Hashing (figure, three animation steps)
• Insert a: (ka, va) maps to HASH1(ka) and HASH2(ka); a takes the slot held by b, and b is kicked out
• Insert b: (kb, vb) moves to its other candidate bucket and kicks out c
• Insert c: (kc, vc) moves to its other candidate bucket, which has an empty slot. Done!
Cuckoo Hashing
• Read: 4 lookups on average
• Write: write(ka, va)
  – Find an empty slot among the 8 possible slots of ka
  – If all are full, randomly kick some (kb, vb) out
  – Now find an empty slot for (kb, vb)
  – Repeat up to 500 times or until an empty slot is found
  – If none is found, expand the table
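The write procedure above can be sketched as a single-threaded Python class. This is a sketch under stated assumptions, not the authors' code: the class and method names, the BLAKE2b-based hashing, and the "double the table and reinsert everything" expansion policy are all hypothetical, and keys are assumed not to be present already. Note that this naive version kicks an item out before it has a new home.

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4
MAX_KICKS = 500  # displacement limit from the slide


class CuckooTable:
    def __init__(self, num_buckets, seed=0):
        self.buckets = [[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def _candidates(self, key):
        # Hypothetical hash pair: two halves of one BLAKE2b digest.
        h = hashlib.blake2b(key, digest_size=16).digest()
        n = len(self.buckets)
        return (int.from_bytes(h[:8], "little") % n,
                int.from_bytes(h[8:], "little") % n)

    def get(self, key):
        for b in self._candidates(key):
            for entry in self.buckets[b]:
                if entry is not None and entry[0] == key:
                    return entry[1]
        return None

    def _try_place(self, key, value):
        """Put (key, value) in the first empty candidate slot, if any."""
        for b in self._candidates(key):
            for s, entry in enumerate(self.buckets[b]):
                if entry is None:
                    self.buckets[b][s] = (key, value)
                    return True
        return False

    def put(self, key, value):
        """Insert a key that is assumed not to be in the table yet."""
        for _ in range(MAX_KICKS):
            if self._try_place(key, value):
                return
            # All 8 slots are full: evict a random victim and carry it on.
            b = self.rng.choice(self._candidates(key))
            s = self.rng.randrange(SLOTS_PER_BUCKET)
            victim = self.buckets[b][s]
            self.buckets[b][s] = (key, value)
            key, value = victim
        self._expand(pending=(key, value))

    def _expand(self, pending):
        """Assumed policy: double the bucket count and reinsert everything."""
        items = [e for bucket in self.buckets for e in bucket if e is not None]
        items.append(pending)
        self.buckets = [[None] * SLOTS_PER_BUCKET
                        for _ in range(2 * len(self.buckets))]
        for k, v in items:
            self.put(k, v)


# Usage: 40 keys cannot fit in 4 buckets x 4 slots, so the table must grow.
t = CuckooTable(num_buckets=4)
for i in range(40):
    t.put(b"k%d" % i, i)
print(len(t.buckets), t.get(b"k7"))
```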
Cuckoo’s advantages
• Concurrency: multiple readers / single writer
• Read-optimized (entries fit in the CPU cache)
• Still O(1) amortized time for writes
• 30% less space overhead
• 95% table occupancy
Floating Problem
• During an insertion there is always one item outside the table, so a concurrent read of that item sees a false cache miss
• Solution: compute the kick-out path (cuckoo path) first, then move items backward along it
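Computed Cuckoo path
• Figure: Insert a, with candidate buckets HASH1(ka) and HASH2(ka); the path through b and c to an empty slot is computed first, and no item has moved yet
Cuckoo path backward insert
• Figure: Insert a; items move backward along the computed path, and (ka, va) is written last, into the slot b occupied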
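The two-phase insert (find the cuckoo path, then move items backward) might look like the sketch below. Several choices here are assumptions rather than anything stated on the slides: the path is found with a breadth-first search, the hash pair is derived from BLAKE2b, and `on_step` is a hypothetical hook used only to observe the table between moves.

```python
from collections import deque
import hashlib

SLOTS = 4


def candidates(key, n):
    """Hypothetical hash pair: two halves of one BLAKE2b digest."""
    h = hashlib.blake2b(key, digest_size=16).digest()
    return (int.from_bytes(h[:8], "little") % n,
            int.from_bytes(h[8:], "little") % n)


def lookup(buckets, key):
    for b in candidates(key, len(buckets)):
        for entry in buckets[b]:
            if entry is not None and entry[0] == key:
                return entry[1]
    return None


def find_cuckoo_path(buckets, key):
    """Phase 1: search for a list of (bucket, slot) positions ending in an
    empty slot. Nothing is moved. Returns None if no path exists."""
    n = len(buckets)
    queue = deque([(b, s)] for b in candidates(key, n) for s in range(SLOTS))
    expanded = set()
    while queue:
        path = queue.popleft()
        b, s = path[-1]
        occupant = buckets[b][s]
        if occupant is None:
            return path
        if (b, s) in expanded:
            continue
        expanded.add((b, s))
        # The occupant could move to any slot of its other candidate bucket.
        for nb in candidates(occupant[0], n):
            if nb != b:
                for ns in range(SLOTS):
                    queue.append(path + [(nb, ns)])
    return None


def insert(buckets, key, value, on_step=None):
    """Phase 2: move items backward, starting from the empty tail slot."""
    path = find_cuckoo_path(buckets, key)
    if path is None:
        return False
    for i in range(len(path) - 1, 0, -1):
        (src_b, src_s), (dst_b, dst_s) = path[i - 1], path[i]
        # Copy first: the item is briefly in two slots, never in none.
        buckets[dst_b][dst_s] = buckets[src_b][src_s]
        if on_step is not None:
            on_step()
    b, s = path[0]
    buckets[b][s] = (key, value)  # the new item is written last
    if on_step is not None:
        on_step()
    return True
```

Because each item is copied to its destination before its old slot is overwritten, a reader scanning both candidate buckets finds every stored key at every intermediate step; the price is a transient duplicate rather than a transient miss.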
Cuckoo and optimistic lock
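The slide gives only a title here, so the following is an assumption about what "optimistic lock" refers to: a version-counter scheme in which the single writer bumps a counter before and after changing an entry, and readers retry if the counter was odd or changed while they were reading. The class, its methods, and the `between_reads` hook (which simulates a concurrent writer in single-threaded code) are hypothetical; a real implementation would also need atomic operations and memory barriers.

```python
class VersionedSlot:
    """One entry guarded by a version counter instead of a reader lock."""

    def __init__(self):
        self.version = 0   # even: stable; odd: a write is in progress
        self.entry = None

    def write(self, entry):
        self.version += 1  # becomes odd: readers must not trust the entry
        self.entry = entry
        self.version += 1  # becomes even again: the entry is stable

    def read(self, between_reads=None):
        """Return (entry, attempts). Retries until a consistent snapshot."""
        attempts = 0
        while True:
            attempts += 1
            before = self.version
            entry = self.entry
            if between_reads is not None:
                between_reads(attempts)  # hook: simulate a racing writer
            after = self.version
            if before == after and before % 2 == 0:
                return entry, attempts


# Usage: a write lands in the middle of the first read attempt,
# so the reader notices the version change and retries once.
slot = VersionedSlot()
slot.write(("ka", "va"))


def interfere(attempt):
    if attempt == 1:
        slot.write(("ka", "va2"))


print(slot.read(between_reads=interfere))
```

Readers never write shared state in this scheme, which is what lets many of them proceed in parallel with the single writer.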
Memcached: Doubly Linked List
• At least two pointers per item: expensive for small key-value pairs
• Both reads and writes update an item's position
  – changes the list’s structure
  – needs locking between threads (no concurrency)
Solution: CLOCK-based LRU
• Only for multiple readers / single writer
• Approximates LRU
• Circular queue instead of a linked list
  – less space overhead
  – 1 bit per entry vs. 16 bytes
CLOCK example (each cell: entry, recency bit)
slot   Originally    Read(kd)      Write(kf, vf)   Write(kg, vg)
1      (ka, va) 0    (ka, va) 0    (ka, va) 0      (kg, vg) 1
2      (kb, vb) 1    (kb, vb) 1    (kb, vb) 0      (kb, vb) 0
3      (kc, vc) 0    (kc, vc) 0    (kf, vf) 1      (kf, vf) 1
4      (kd, vd) 0    (kd, vd) 1    (kd, vd) 1      (kd, vd) 0
5      (ke, ve) 1    (ke, ve) 1    (ke, ve) 1      (ke, ve) 0
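The example above can be replayed with a small CLOCK sketch. The class and method names are hypothetical, and two things are assumed rather than taken from the slides: the clock hand starts at (kb, vb), which is the starting position that reproduces the example's evictions, and every write inserts a new key into a full cache.

```python
class ClockCache:
    """Fixed-size circular buffer with one recency bit per entry."""

    def __init__(self, entries, recency, hand=0):
        self.keys = [k for k, _ in entries]
        self.values = [v for _, v in entries]
        self.recency = list(recency)
        self.hand = hand  # index the eviction scan starts from

    def read(self, key):
        """A hit only sets the recency bit; nothing is moved or relinked."""
        if key not in self.keys:
            return None
        i = self.keys.index(key)
        self.recency[i] = 1
        return self.values[i]

    def write(self, key, value):
        """Insert a new key into a full cache; return the evicted key."""
        n = len(self.keys)
        # Sweep: entries with bit 1 get a second chance (bit cleared).
        while self.recency[self.hand] == 1:
            self.recency[self.hand] = 0
            self.hand = (self.hand + 1) % n
        victim = self.keys[self.hand]
        self.keys[self.hand] = key
        self.values[self.hand] = value
        self.recency[self.hand] = 1
        self.hand = (self.hand + 1) % n
        return victim


# Replay the slide: five entries with recency bits 0 1 0 0 1.
cache = ClockCache(
    entries=[("ka", "va"), ("kb", "vb"), ("kc", "vc"),
             ("kd", "vd"), ("ke", "ve")],
    recency=[0, 1, 0, 0, 1],
    hand=1,  # assumption: the hand starts at kb
)
cache.read("kd")               # sets kd's bit
print(cache.write("kf", "vf"))  # clears kb's bit, evicts kc
print(cache.write("kg", "vg"))  # clears kd's and ke's bits, evicts ka
print(cache.keys, cache.recency)
```

A read costs one bit flip with no pointer updates, which is why CLOCK suits the multiple-reader / single-writer design better than a linked-list LRU.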
Eviction and lock
Evaluation
• 68% throughput improvement in the all-hit case
• 235% in the all-miss case
Evaluation
• 3x throughput on a “real” workload
Discussion
• Single-machine, multi-core optimization
  – no cooperation between machines
  – needs a load balancer
  – cannot address hotspots at the cluster level
• Atomic insertion
  – lock along the path
• What is the impact of hotspot false evictions?