Oak: a Scalable Off-Heap Allocated Key-Value Map
Hagar Meir, Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Idit Keidar, Eran Meir, Gali Sheffi, Yoav Zuriel (Yahoo Research)
OAK (Off-heap Allocated Keys)
Concurrent in-memory off-heap key-value map for big data:
● Written in Java, but causes no JVM Garbage Collection (GC) activity
  ○ higher performance
  ○ lower memory consumption
● Open-source library: https://github.com/yahoo/Oak
Big Data goes Off-Heap
How can you use 128 GB of RAM?
[Figure: single-thread throughput of Java's ConcurrentSkipListMap (CSLM) with the G1 GC, pushing data in 1 KB quanta until OutOfMemory.]
CSLM holds only 60 GB of data out of 128 GB of RAM.
A Quick Look at Off-Heap Memory
Java process address space:
● Heap memory (managed by the JVM GC): holds the DirectByteBuffer object
● Off-heap memory (not managed by the JVM GC): holds the DirectByteBuffer bytes
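The split between the small on-heap buffer object and the off-heap bytes can be tried with the standard java.nio API. The following is a minimal sketch, not code from Oak; the class and method names (OffHeapDemo, storeOffHeap, loadFromOffHeap) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows where the bytes of a direct buffer live.
class OffHeapDemo {
    // Writes the string's UTF-8 bytes into freshly allocated off-heap memory.
    static ByteBuffer storeOffHeap(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip(); // switch the buffer from writing to reading
        return buf; // only this small wrapper object lives on the heap
    }

    // Copies the bytes back on-heap and decodes them (a deserialization step).
    static String loadFromOffHeap(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() leaves buf's own position untouched
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buf = storeOffHeap("oak");
        System.out.println("direct: " + buf.isDirect());
        System.out.println("value: " + loadFromOffHeap(buf));
    }
}
```

Reading the value back requires an explicit copy and decode, which is the (de)serialization cost that a zero-copy interface tries to avoid.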
Off-Heap Memory: Pros and Challenges
+ No JVM GC costs
+ No object headers
+ Quicker access
- How to reuse memory?
- Need to (de)serialize
- How to access it concurrently?
OAK: Contributions
● Keys & Values Off-Heap: rethinking the ordered map data structure and algorithm, to minimize on-heap and maximize off-heap usage
● New Zero-Copy API: rethinking the ordered map API, to minimize deserialization and give the user direct memory access
● Fast 2-way Scans: rethinking how an ordered map can be traversed backward, without complicating concurrency or using additional memory
● New Concurrent Algorithm: rethinking the internal concurrency of a map to suit new challenges, with a formal proof
Outcomes: fast performance, lean in memory, scales better, open-source library, real-world deployment
How can you use 128 GB of RAM?
[Figure: Oak vs. Java's ConcurrentSkipListMap (CSLM), pushing data until OutOfMemory.]
Big Data Map Design Approach
● Keep as little metadata as possible.
● Java objects and their headers are inefficient; arrays of primitives (e.g. int) are better.
● Maintain memory in batches:
  ○ preallocate off-heap
  ○ manage on-heap memory in chunks
● Big values: write and copy on demand only.
● Let the user access the raw data, but stay on guard.
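The first three points can be illustrated with a small sketch: one off-heap region is preallocated up front, and the only on-heap metadata is a plain int array instead of one Java object per item. This is an assumption-laden toy, not Oak's allocator; ArenaSketch and all of its methods are hypothetical names, and it never frees or reuses memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Oak's implementation.
class ArenaSketch {
    private final ByteBuffer arena; // preallocated once, off-heap
    private final int[] refs;       // two ints per item: offset and length
    private int used = 0;           // next free byte in the arena
    private int count = 0;          // number of stored items

    ArenaSketch(int arenaBytes, int maxItems) {
        arena = ByteBuffer.allocateDirect(arenaBytes);
        refs = new int[2 * maxItems];
    }

    // Copies the bytes into the arena (bump allocation) and returns the item's index.
    int add(byte[] data) {
        if (used + data.length > arena.capacity() || 2 * count >= refs.length) {
            throw new IllegalStateException("arena full");
        }
        for (int i = 0; i < data.length; i++) {
            arena.put(used + i, data[i]); // absolute write, no shared position
        }
        refs[2 * count] = used;
        refs[2 * count + 1] = data.length;
        used += data.length;
        return count++;
    }

    // Copies an item back on-heap.
    byte[] read(int index) {
        int offset = refs[2 * index];
        int length = refs[2 * index + 1];
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = arena.get(offset + i);
        }
        return out;
    }

    // String helpers for the demo below.
    int addString(String s) {
        return add(s.getBytes(StandardCharsets.UTF_8));
    }

    String readString(int index) {
        return new String(read(index), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ArenaSketch arena = new ArenaSketch(64, 8);
        int k = arena.addString("key");
        int v = arena.addString("value");
        System.out.println(arena.readString(k) + " -> " + arena.readString(v));
    }
}
```

Each stored item costs two ints of on-heap metadata regardless of its size, with no per-item object header.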
Oak's Data Organization
● Chunks are contiguous, ordered key ranges.
● Keys are variable-size, unique, and immutable.
● Each chunk keeps its entries in an ordered array, searched with binary search.
[Diagram: chunks A, B, and C with their entries arrays; ephemeral on-heap objects vs. off-heap data.]
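A lookup inside one chunk can be sketched as a binary search over an ordered array whose keys are variable-size byte sequences stored off-heap. The sketch below is illustrative only: ChunkSketch and its layout are hypothetical, the chunk is immutable, and keys are assumed to be supplied already sorted in unsigned byte order (which holds for sorted ASCII strings).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of one chunk; the names are not taken from Oak.
class ChunkSketch {
    private final ByteBuffer keys; // all key bytes, stored back to back off-heap
    private final int[] offsets;   // key i occupies [offsets[i], offsets[i + 1])

    // Assumes sortedKeys is in ascending unsigned-byte order.
    ChunkSketch(String[] sortedKeys) {
        offsets = new int[sortedKeys.length + 1];
        byte[][] encoded = new byte[sortedKeys.length][];
        for (int i = 0; i < sortedKeys.length; i++) {
            encoded[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            offsets[i + 1] = offsets[i] + encoded[i].length;
        }
        keys = ByteBuffer.allocateDirect(offsets[sortedKeys.length]);
        for (int i = 0; i < encoded.length; i++) {
            for (int j = 0; j < encoded[i].length; j++) {
                keys.put(offsets[i] + j, encoded[i][j]); // absolute write
            }
        }
    }

    // Lexicographic comparison of stored key i against the probe, byte by byte.
    private int compare(int i, byte[] probe) {
        int start = offsets[i];
        int length = offsets[i + 1] - start;
        int common = Math.min(length, probe.length);
        for (int j = 0; j < common; j++) {
            int a = keys.get(start + j) & 0xFF;
            int b = probe[j] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return length - probe.length;
    }

    // Returns the entry index of the key, or -1 if the chunk does not hold it.
    int find(String key) {
        byte[] probe = key.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = offsets.length - 2; // index of the last key
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(mid, probe);
            if (c == 0) {
                return mid;
            }
            if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChunkSketch chunk = new ChunkSketch(new String[] {"apple", "banana", "cherry"});
        System.out.println(chunk.find("banana"));
        System.out.println(chunk.find("blueberry"));
    }
}
```

The comparison reads key bytes in place, so a search does not deserialize any stored key.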
Oak's Data Organization
[Diagram: an on-heap index maps key → chunk (chunks A, B, and C, each with an entries array). Ephemeral on-heap OakRBuffer and OakWBuffer objects point into off-heap memory arenas that hold the keys ("A", "C", "E", "F") and the values ("B", "D", "G"). Each value is preceded by a header (H) used for synchronization and deletion.]
Zero-Copy API: OakMap<K, V>
ZeroCopy:
● OakRBuffer get(K)
● Set<OakRBuffer> keySet() / keyStreamSet()
● boolean putIfAbsentComputeIfPresent(K, V, CreateFunction(OakWBuffer), ComputeFunction(OakWBuffer))
● Returns a reference to shared memory.
ConcurrentNavigableMap (legacy):
● V get(K)
● Set<K> keySet()
● Returns either a ready object or its copy.
[Diagram: Thread 1 and Thread 2 calling get(K).]
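The difference between the two styles of get can be shown with a toy map that is not Oak and does not use Oak's classes. In this sketch, ZeroCopySketch, getCopy, and getView are hypothetical names; values sit in direct buffers, while keys stay on-heap in a ConcurrentSkipListMap purely to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy illustration of copy-style vs. view-style reads; not Oak's API.
class ZeroCopySketch {
    private final ConcurrentSkipListMap<String, ByteBuffer> values =
            new ConcurrentSkipListMap<>();

    // Serializes the value into its own off-heap buffer.
    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length);
        buf.put(bytes);
        buf.flip();
        values.put(key, buf);
    }

    // Legacy style: deserializes into a new on-heap object on every call.
    String getCopy(String key) {
        ByteBuffer buf = values.get(key);
        if (buf == null) {
            return null;
        }
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Zero-copy style: a read-only view over the same off-heap bytes.
    ByteBuffer getView(String key) {
        ByteBuffer buf = values.get(key);
        return buf == null ? null : buf.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ZeroCopySketch map = new ZeroCopySketch();
        map.put("k", "hello");
        ByteBuffer view = map.getView("k");
        System.out.println(map.getCopy("k") + ", first byte via view: " + (char) view.get(0));
    }
}
```

The view costs one small wrapper object per call, however large the value is, whereas the copying call allocates and fills a full on-heap copy.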
Oak Concurrency: Separation of Concerns
● Data-structure level (internal): lock-free
● User level (external): pluggable, for example lock-based
[Diagram: index, chunks A, B, and C, and OakRBuffer/OakWBuffer objects over off-heap arenas; each value carries a header.]
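As one possible reading of the lock-based example at the user level, the sketch below guards a single off-heap value with a read-write lock so that in-place updates are atomic with respect to readers. It is an assumption, not Oak's mechanism: GuardedValueSketch and its methods are hypothetical, and the lock lives in an on-heap object here rather than in a value header.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical value-level guard; the surrounding map is out of scope here.
class GuardedValueSketch {
    private final ByteBuffer data = ByteBuffer.allocateDirect(Long.BYTES);
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Runs the reader under the read lock, handing it a read-only view.
    <T> T read(Function<ByteBuffer, T> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(data.asReadOnlyBuffer());
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs an in-place update under the write lock.
    void compute(Consumer<ByteBuffer> updater) {
        lock.writeLock().lock();
        try {
            updater.accept(data.duplicate());
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Demo: several threads increment the same off-heap counter in place.
    static long incrementInParallel(int threads, int perThread) {
        GuardedValueSketch value = new GuardedValueSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    value.compute(b -> b.putLong(0, b.getLong(0) + 1));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value.read(b -> b.getLong(0));
    }

    public static void main(String[] args) {
        System.out.println(incrementInParallel(4, 1000));
    }
}
```

Because the guard is confined to the value, a different policy could be swapped in without touching how the map itself locates entries.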
Evaluation
Machine
  ○ AWS instance m5d.16xlarge
  ○ utilizing 32 cores (with hyper-threading disabled)
  ○ on two NUMA nodes
Experiment parameters
  ○ Key size: 100 B
  ○ Value size: 1 KB
  ○ Memory limit: 32 GB (inserting 12 GB of raw data)
Scaling with Parallelism (11M KV-pairs)
[Figure panels: Put; Get.]
Scaling with Parallelism (11M KV-pairs)
[Figure panels: ascending scan, 10K pairs/scan; descending scan, 10K pairs/scan.]
Oak in Apache Druid
Apache Druid is a popular open-source real-time analytics database.
● Re-implement Druid's centerpiece Incremental Index (I2) component around Oak (OakIncrementalIndex)
● Lower memory consumption
● Faster ingestion
Druid Ingestion - Scaling with RAM
● Ingesting 7M tuples
● Tuple size: 1.25 KB
● RAM scaled from 25 GB to 32 GB
Druid Ingestion - RAM overhead
Oak: a concurrent ordered KV-map with...
1. First off-heap managed-memory data structure
   ○ off-heap data vs. on-heap metadata
   ○ managed programming experience
2. Novel Zero-Copy API
   ○ minimizes deserialization
3. Novel concurrent algorithm
   ○ conditional and unconditional update-in-place
   ○ fast 2-way scans
4. Fast and lean compared to CSLM
   ○ 2.5% metadata
   ○ up to 2x faster than CSLM
5. Real-world application
   ○ Druid
6. Open-source library: https://github.com/yahoo/Oak
Questions? anastas@verizonmedia.com
Thank you!