SILT: A Memory-Efficient, High-Performance Key-Value Store (Hyeontaek Lim)

SILT: A Memory-Efficient, High-Performance Key-Value Store
Hyeontaek Lim, Bin Fan, David G. Andersen, Michael Kaminsky†
Carnegie Mellon University, †Intel Labs
2011-10-24

Key-Value Store
Clients issue PUT(key, value), value = GET(key), and DELETE(key) requests to a key-value store cluster. Example uses:
• E-commerce (Amazon)
• Web server acceleration (Memcached)
• Data deduplication indexes
• Photo storage (Facebook)

• Many projects have examined flash memory-based key-value stores
  – Faster than disk, cheaper than DRAM
• This talk introduces SILT, which uses drastically less memory than previous systems while retaining high performance.

Flash Must Be Used Carefully
• Random reads/sec: 48,000 (fast, but not THAT fast)
• $/GB: 1.83 (space is precious)
• Another long-standing problem: random writes are slow and bad for flash life (wearout)

DRAM Must Be Used Efficiently
DRAM is used for an index that locates items on flash. With 1 TB of data to store on flash and 4 bytes of DRAM per key-value pair (previous state of the art):
• 32 B pairs (data deduplication) => 125 GB of index!
• 168 B pairs (tweets) => 24 GB
• 1 KB pairs (small images) => 4 GB
[Plot: index size (GB) vs. key-value pair size (bytes)]
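The slide's figures follow from a simple calculation: index DRAM = (flash capacity / pair size) x index bytes per entry. A minimal sketch reproducing them in Python (decimal GB, as on the slide):

```python
# DRAM needed to index 1 TB of flash at 4 bytes of index per key-value pair.
FLASH_BYTES = 10**12          # 1 TB of data on flash
INDEX_BYTES_PER_ENTRY = 4     # previous state of the art

for label, pair_size in [("dedup chunk", 32), ("tweet", 168), ("small image", 1024)]:
    entries = FLASH_BYTES // pair_size
    index_gb = entries * INDEX_BYTES_PER_ENTRY / 10**9
    print(f"{label:12s} ({pair_size:4d} B/pair): {index_gb:5.1f} GB of index DRAM")
```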

Three Metrics to Minimize
• Memory overhead = index size per entry
  – Ideally 0 (no memory overhead)
• Read amplification = flash reads per query
  – Limits query throughput
  – Ideally 1 (no wasted flash reads)
• Write amplification = flash writes per entry
  – Limits insert throughput
  – Also reduces flash life expectancy
  – Must be small enough for flash to last a few years
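For concreteness, a tiny sketch of how these three metrics could be measured from raw counters; the counter names are illustrative and not part of SILT:

```python
def silt_metrics(index_bytes, entries_stored, flash_reads, queries,
                 flash_writes, entries_inserted):
    """Compute the three metrics from hypothetical counters."""
    return {
        "memory_overhead_B_per_entry": index_bytes / entries_stored,   # ideally 0
        "read_amplification": flash_reads / queries,                   # ideally 1
        "write_amplification": flash_writes / entries_inserted,        # must stay small for flash lifetime
    }
```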

Landscape: Where We Were
[Plot: read amplification vs. memory overhead (bytes/entry) for SkimpyStash, HashCache, BufferHash, FlashStore, and FAWN-DS, with a "?" marking the still-unexplored low-overhead, low-read-amplification region]

Seesaw Game?
[Diagram: a seesaw with memory efficiency (SkimpyStash) on one side and high performance (FAWN-DS, FlashStore, HashCache, BufferHash) on the other] How can we improve?

Solution Preview: (1) Three Stores with (2) New Index Data Structures
• Queries look up the stores in sequence (from new to old)
• Inserts only go to the log
• Data are moved in the background
[Diagram: in memory, the SILT Sorted Index (memory efficient), the SILT Filter, and the SILT Log Index (write friendly); the data themselves live on flash]
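A minimal sketch of that query and insert path, assuming hypothetical log_store, hash_stores, and sorted_store objects whose get(key) returns None on a miss (the names and interfaces are illustrative, not SILT's actual API):

```python
def silt_get(key, log_store, hash_stores, sorted_store):
    """Check the stores from newest to oldest and return the first hit."""
    value = log_store.get(key)            # newest data: in-memory index over an on-flash log
    if value is not None:
        return value
    for hs in hash_stores:                # read-only stores awaiting merge
        value = hs.get(key)
        if value is not None:
            return value
    return sorted_store.get(key)          # oldest data: compact index over a sorted array

def silt_put(key, value, log_store):
    """Inserts only ever touch the LogStore; background jobs migrate data later."""
    log_store.put(key, value)
```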

LogStore: No Control over Data Layout
• Inserted entries are appended to an on-flash log (older to newer)
• A naive hashtable index needs 48+ B/entry; the SILT log index needs 6.5+ B/entry
• It still needs pointers into the log: size ≥ log N bits/entry
• Memory overhead: 6.5+ bytes/entry; write amplification: 1
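A minimal sketch of the LogStore idea, under these assumptions: an in-memory index maps a hashed key to the offset of its entry in an append-only on-flash log. A plain dict stands in for SILT's partial-key cuckoo hash index, and the record layout is made up for illustration:

```python
import hashlib

class LogStoreSketch:
    """Append-only data file plus an in-memory map of hashed key -> file offset."""

    def __init__(self, path):
        self.log = open(path, "ab+")
        self.index = {}                            # SILT uses partial-key cuckoo hashing here

    def put(self, key: bytes, value: bytes):
        offset = self.log.seek(0, 2)               # append at the end: write amplification ~1
        self.log.write(len(key).to_bytes(4, "little"))
        self.log.write(len(value).to_bytes(4, "little"))
        self.log.write(key + value)
        self.index[hashlib.sha1(key).digest()] = offset

    def get(self, key: bytes):
        offset = self.index.get(hashlib.sha1(key).digest())
        if offset is None:
            return None
        self.log.seek(offset)                      # one flash read per query
        klen = int.from_bytes(self.log.read(4), "little")
        vlen = int.from_bytes(self.log.read(4), "little")
        stored_key = self.log.read(klen)
        return self.log.read(vlen) if stored_key == key else None
```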

SortedStore: Space-Optimized Layout
• In memory: SILT sorted index (0.4 B/entry); on flash: sorted array
• Need to perform bulk-inserts to amortize their cost
• Memory overhead: 0.4 bytes/entry; write amplification: high
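A minimal sketch of the SortedStore read path under simple assumptions: fixed-size records in the on-flash sorted array and a hypothetical index.rank(key) call standing in for the entropy-coded trie described later:

```python
KEY_SIZE = 20                  # illustrative: matches the 20-byte keys in the evaluation setup
ENTRY_SIZE = KEY_SIZE + 1000   # illustrative fixed-size record in the sorted array

def sorted_store_get(flash_file, index, key: bytes):
    """The in-memory index returns the key's rank in the sorted array;
    the record itself is fetched with a single flash read."""
    rank = index.rank(key)                       # hypothetical trie index, ~0.4 B/entry
    flash_file.seek(rank * ENTRY_SIZE)
    record = flash_file.read(ENTRY_SIZE)
    stored_key, value = record[:KEY_SIZE], record[KEY_SIZE:]
    return value if stored_key == key else None
```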

Combining SortedStore and LogStore
[Diagram: the SortedStore (SILT sorted index over an on-flash sorted array) and the LogStore (SILT log index over an on-flash log) are merged into a new SortedStore]
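A minimal sketch of such a merge under simple assumptions: both inputs are iterables of (hashed_key, entry) pairs already in key order (the LogStore's entries would be sorted first), each key appears at most once per input, and newer entries shadow older ones; deletions are ignored for brevity:

```python
import heapq

def merge_stores(newer_items, older_items):
    """Bulk-merge two sorted streams of (key, entry) pairs into the next
    on-flash sorted array, keeping only the newest entry for each key."""
    merged = []
    last_key = None
    # Tag each source so that, for equal keys, the newer entry (tag 0) comes first.
    stream = heapq.merge(
        ((key, 0, entry) for key, entry in newer_items),
        ((key, 1, entry) for key, entry in older_items),
    )
    for key, _, entry in stream:
        if key != last_key:
            merged.append((key, entry))
            last_key = key
    return merged

# Example: the newer store's value for key 2 wins.
print(merge_stores([(2, "new"), (5, "e")], [(1, "a"), (2, "old"), (9, "z")]))
```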

Achieving Both Low Memory Overhead and Low Write Amplification
• SortedStore: low memory overhead, high write amplification
• LogStore: high memory overhead, low write amplification
Combining the SortedStore and the LogStore, we can now achieve both simultaneously:
• Write amplification = 5.4 => 3-year flash life
• Memory overhead = 1.3 B/entry
• With “HashStores”, memory overhead = 0.7 B/entry! (see paper)

SILT’s Design (Recap)
[Diagram: the LogStore (SILT log index over an on-flash log) is converted into HashStores (SILT filter over on-flash hashtables), which are merged into the SortedStore (SILT sorted index over an on-flash sorted array)]
• Memory overhead: 0.7 bytes/entry
• Read amplification: 1.01
• Write amplification: 5.4

Review of New Index Data Structures in SILT
• SILT sorted index: entropy-coded tries; for the SortedStore; highly compressed (0.4 B/entry)
• SILT filter & log index: partial-key cuckoo hashing; for the HashStore & LogStore; compact (2.2 & 6.5 B/entry), very fast (> 1.8 M lookups/sec)
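A minimal sketch of the partial-key cuckoo hashing idea: each in-memory slot stores only a short tag plus an offset rather than the full key, and the tag names the key's other candidate bucket so entries can be displaced without reading keys back from flash. The hash function, table size, and slot layout below are illustrative, not SILT's packed format:

```python
import hashlib

NUM_BUCKETS = 1 << 15          # illustrative; a tag is then small enough to double as a bucket index
MAX_KICKS = 128

def _buckets(key: bytes):
    """Two candidate buckets for a key (illustrative hash function)."""
    d = hashlib.blake2b(key, digest_size=8).digest()
    return (int.from_bytes(d[:4], "little") % NUM_BUCKETS,
            int.from_bytes(d[4:], "little") % NUM_BUCKETS)

class PartialKeyCuckooIndex:
    def __init__(self):
        self.slots = [None] * NUM_BUCKETS          # each slot: (tag, log_offset) or None

    def insert(self, key: bytes, offset: int) -> bool:
        h1, h2 = _buckets(key)
        bucket, entry = h1, (h2, offset)           # the tag is the *other* candidate bucket
        for _ in range(MAX_KICKS):
            if self.slots[bucket] is None:
                self.slots[bucket] = entry
                return True
            # Evict the resident entry; its tag already names its alternative bucket.
            victim_tag, victim_offset = self.slots[bucket]
            self.slots[bucket] = entry
            bucket, entry = victim_tag, (bucket, victim_offset)
        return False                               # table too full: SILT would flush/convert the store

    def lookup(self, key: bytes):
        h1, h2 = _buckets(key)
        for bucket, tag in ((h1, h2), (h2, h1)):
            slot = self.slots[bucket]
            if slot is not None and slot[0] == tag:
                return slot[1]                     # candidate offset; the full key is verified on flash
        return None
```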

Compression in Entropy-Coded Tries
[Figure: a binary trie built over hashed keys; the keys' bits are random]
• # red (or blue) leaves ~ Binomial(# all leaves, 0.5)
• Entropy coding (Huffman coding and more)
(More details of the new indexing schemes are in the paper)
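A minimal sketch of the idea, assuming hashed keys are given as sorted '0'/'1' bit strings: the trie is described by the size of each node's left subtrie, and because hashed key bits are random each size follows a Binomial distribution, which is what makes entropy coding (Huffman and related codes) pay off:

```python
def trie_counts(keys, depth=0):
    """keys: hashed keys as '0'/'1' bit strings, already sorted.
    Returns the left-subtrie sizes in depth-first order; entropy-coding this
    sequence yields a compressed index that maps each key to its rank."""
    if len(keys) <= 1:
        return []                                  # a leaf needs no further description
    left = [k for k in keys if k[depth] == "0"]
    right = [k for k in keys if k[depth] == "1"]
    return [len(left)] + trie_counts(left, depth + 1) + trie_counts(right, depth + 1)

# Four hashed keys, already sorted: the trie is fully described by three counts.
print(trie_counts(["0001", "0101", "1000", "1110"]))   # -> [2, 1, 1]
```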

Landscape: Where We Are
[Plot: read amplification vs. memory overhead (bytes/entry) for SkimpyStash, HashCache, BufferHash, FlashStore, FAWN-DS, and SILT, which now fills the previously open low-overhead, low-read-amplification region]

Evaluation
1. Various combinations of indexing schemes
2. Background operations (merge/conversion)
3. Query latency
Experiment setup:
• CPU: 2.80 GHz (4 cores)
• Flash drive: SATA, 256 GB (48 K random 1024-byte reads/sec)
• Workload size: 20-byte key, 1000-byte value, ≥ 50 M keys
• Query pattern: uniformly distributed (worst case for SILT)

LogStore Alone: Too Much Memory
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)

LogStore + SortedStore: Still Much Memory
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)

Full SILT: Very Memory Efficient
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)

Small Impact from Background Operations
Workload: 90% GET (100+ M keys) + 10% PUT
[Graph: query throughput over time, annotated at 40 K and 33 K queries/sec; the "Oops!" dip is caused by bursty TRIM from the ext4 filesystem]

Low Query Latency
Workload: 100% GET (100 M keys)
[Graph: latency vs. # of I/O threads; best throughput at 16 threads; median = 330 μs, 99.9th percentile = 1510 μs]

Conclusion
• SILT provides a memory-efficient, high-performance key-value store
  – Multi-store approach
  – Entropy-coded tries
  – Partial-key cuckoo hashing
• Full source code is available
  – https://github.com/silt

Thanks!