MICRO 51
Persistence Parallelism Optimization: A Holistic Approach from Memory Bus to RDMA Network
Xing Hu∗, Matheus Ogleari†, Jishen Zhao‡, Shuangchen Li∗, Abanti Basak∗, Yuan Xie∗
University of California, Santa Barbara∗; University of California, Santa Cruz†; University of California, San Diego‡
Presenter: Seunghyo Kang
Persistent memory • Data remains after power-off • How to keep data persistent and survive a crash? • Hardware ordering control + multi-versioning • !!! Inefficient along the persist data paths !!! • Solve with parallelism a|b
Three Datapaths • RDMA Overhead (90% of time stalled by the network) • Bank Conflict Overhead (36% of requests are stalled by bank conflicts)
Motivation – In Local Memory Bus Epoch Barrier
Motivation – In Local Memory Bus Bank-Level Parallelism Barrier
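The interaction between epoch barriers and bank-level parallelism can be illustrated with a toy latency model. The sketch below is not taken from the paper: the bank count, the address-to-bank mapping, and the unit write latency are assumptions chosen for illustration. Writes inside one epoch overlap when they hit different banks and serialize when they hit the same bank, while a barrier forces each epoch to finish before the next one starts.

```python
from collections import Counter

NUM_BANKS = 8    # assumed bank count (hypothetical)
LINE_BYTES = 64  # assumed cache-line size used for bank interleaving


def bank_of(addr, num_banks=NUM_BANKS):
    """Assumed mapping: consecutive cache lines interleave across banks."""
    return (addr // LINE_BYTES) % num_banks


def epoch_latency(epoch, num_banks=NUM_BANKS):
    """Latency of one epoch, in units of a single bank write.

    Writes to different banks overlap; writes to the same bank serialize,
    so the busiest bank determines when the epoch completes.
    """
    if not epoch:
        return 0
    per_bank = Counter(bank_of(addr, num_banks) for addr in epoch)
    return max(per_bank.values())


def total_latency(epochs, num_banks=NUM_BANKS):
    """Barriers serialize epochs, so their latencies add up."""
    return sum(epoch_latency(epoch, num_banks) for epoch in epochs)


if __name__ == "__main__":
    conflict_free = [[0, 64, 128], [192, 256]]   # every write hits its own bank
    conflicting = [[0, 512, 1024], [192, 256]]   # first epoch hits bank 0 three times
    print(total_latency(conflict_free))
    print(total_latency(conflicting))
```

In this model the second workload takes twice as long purely because of bank conflicts inside the first epoch, even though both issue the same number of writes.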
Motivation – In Remote Memory
Architectural Design ① St X = x ② St X = y Dependency! Barrier Region Of Interest (BROI)
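The dependency between the two stores to X can be sketched as a simple address check. This is a minimal illustration, assuming that a dependency exists only when two stores target exactly the same address; the `Store` type and both helper functions are hypothetical, and the BROI hardware in the paper may track dependencies at a different granularity.

```python
from typing import NamedTuple


class Store(NamedTuple):
    thread: int
    addr: int
    value: str


def must_order(first, second):
    """In this sketch, two stores need ordering only if they hit the same address."""
    return first.addr == second.addr


def split_by_dependency(pending, incoming):
    """Partition incoming stores into those that depend on a pending store
    and those that are free to persist in parallel with the pending ones."""
    dependent, independent = [], []
    for store in incoming:
        if any(must_order(p, store) for p in pending):
            dependent.append(store)
        else:
            independent.append(store)
    return dependent, independent


if __name__ == "__main__":
    pending = [Store(thread=0, addr=0x100, value="x")]       # (1) St X = x
    incoming = [Store(thread=1, addr=0x100, value="y"),      # (2) St X = y
                Store(thread=1, addr=0x140, value="z")]      # unrelated store
    dependent, independent = split_by_dependency(pending, incoming)
    print(dependent)
    print(independent)
```

Only the second store to X has to wait behind the first; the unrelated store can be scheduled alongside it.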
Architectural Design Epoch
Architectural Design
Architectural Design – Overhead (≈ 400 B)
System Design • RDMA_pwrite • RDMA_write with ordering control • Advanced RDMA NIC • Enable ACK
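A back-of-the-envelope model suggests why moving ordering control into the RDMA path could reduce network stalls. The sketch below is an assumption-laden illustration, not the paper's protocol: it assumes the baseline sender waits for a remote acknowledgement after every epoch, while the ordered variant streams all epochs and receives a single acknowledgement once the remote side has persisted them in order. The function names and the latency numbers are hypothetical.

```python
def blocking_cost(num_epochs, rtt_us, persist_us):
    """Assumed baseline: the sender waits for an ACK after each epoch,
    paying one network round trip plus one persist per epoch."""
    return num_epochs * (rtt_us + persist_us)


def pipelined_cost(num_epochs, rtt_us, persist_us):
    """Assumed ordered variant: epochs are streamed back to back, the remote
    side enforces their order, and a single ACK returns at the end."""
    if num_epochs == 0:
        return 0
    return rtt_us + num_epochs * persist_us


if __name__ == "__main__":
    # Made-up numbers: 10 epochs, 5 us round trip, 1 us remote persist.
    print(blocking_cost(10, 5, 1))
    print(pipelined_cost(10, 5, 1))
```

Under these made-up numbers the blocking sender spends most of its time waiting on round trips, which matches the qualitative point that the network dominates remote persistence latency.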
Local Application Performance • Local: 16% • Hybrid (Local + RDMA): 18%
Remote Application Performance • 2× improvement
Conclusion • Ordering control with intra- and inter-thread parallelism in local and remote nodes • Throughput improved in both local and remote persistent memory environments