Scalable and Energy-efficient Architecture Lab (SEAL)
A Unified Memory Network Architecture for In-Memory Computing in Commodity Servers
Jia Zhan (1, 2), Itir Akgun (2), Jishen Zhao (3), Al Davis (4), Paolo Faraboschi (4), Yuangang Wang (5), Yuan Xie (2)
1 Uber, 2 University of California Santa Barbara, 3 University of California Santa Cruz, 4 HP Labs, 5 Huawei
http://seal.ece.ucsb.edu/ SEAL@UCSB
Trends
• Application: in-memory computing (e.g., in-memory databases)
• Architecture: network of memory cubes
• Devices/interfaces: 3D-stacked memory (e.g., Hybrid Memory Cube, HMC)
11/2/2020 The 49th Annual IEEE/ACM International Symposium on Microarchitecture 2
Application: In-memory computing
• Keeps an application's working dataset in memory for faster access, potentially obviating the need to page data to/from disk; this requires a large memory capacity
• Widely adopted in real-time analytics applications
• Databases
  - Strive to fit the entire database in memory
  - e.g., SAP HANA, VoltDB, IBM DB2 BLU
• Computing/storage frameworks
  - Persist intermediate data in memory
  - e.g., Spark, Alluxio (formerly Tachyon)
Application: In-memory computing
• Evaluation of word-count jobs on a 10 GB Wikipedia dataset, using the MapReduce and Spark frameworks
  [Figure: execution time and memory accesses, MapReduce vs. Spark; annotated 5x and 9x]
• Spark executes 5x faster, but with 9x more memory accesses
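Combining the two ratios shows how much harder Spark drives the memory system per unit of time. A minimal sketch, assuming "9x more memory accesses" means nine times as many accesses and "5x faster" means execution time shrinks to one fifth; `access_rate_ratio` is a hypothetical helper, not part of the original work:

```python
def access_rate_ratio(access_ratio: float, speedup: float) -> float:
    """Ratio of memory accesses per second, Spark relative to MapReduce.

    Assumes access_ratio = Spark accesses / MapReduce accesses and
    speedup = MapReduce time / Spark time (hypothetical reading of the slide).
    """
    # rate = accesses / time, so the access ratio and the speedup multiply
    return access_ratio * speedup


print(access_rate_ratio(9.0, 5.0))  # 45.0
```

Under these assumptions, Spark issues roughly 45x as many memory accesses per second as MapReduce.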
Architecture: Processor-centric network
• Remote memory accesses are frequent and have high latency
• Local access latency : remote access latency = 1 : 1.97
• A processor-centric architecture is therefore a poor fit for the needs of in-memory computing applications
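The 1 : 1.97 ratio translates into an average latency once the share of remote accesses is known. A minimal sketch, assuming data is spread evenly across sockets so that a fraction (n - 1)/n of accesses is remote; that distribution and both helper names are illustrative assumptions. With 4 sockets, about three quarters of accesses are remote and the mean latency is roughly 1.73x the local latency:

```python
LOCAL_LATENCY = 1.0    # normalized local access latency (from the slide)
REMOTE_LATENCY = 1.97  # remote latency relative to local (from the slide)


def mean_latency(remote_fraction: float) -> float:
    """Average normalized latency for a given fraction of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in [0, 1]")
    return (1.0 - remote_fraction) * LOCAL_LATENCY + remote_fraction * REMOTE_LATENCY


def uniform_remote_fraction(sockets: int) -> float:
    """Remote fraction if data is spread evenly across sockets (assumption)."""
    return (sockets - 1) / sockets


for n in (1, 2, 4):
    f = uniform_remote_fraction(n)
    print(f"{n} socket(s): remote fraction {f:.2f}, mean latency {mean_latency(f):.3f}")
```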
Architecture: Scalable memory-centric network
• A memory pool shared by all sockets is (1) scalable and (2) able to provide on-demand memory allocation
• How should the memory nodes be connected to achieve low latency, high bandwidth, and low power consumption?
Memory scaling: 3D integration with HMC
[Figure: Hybrid Memory Cube; intra-memory network in the logic base]
Two types of networks
• Intra-memory network
  - Connects the memory controllers (MCs) inside a memory node (HMC)
• Inter-memory network
  - Connects different memory nodes
• Question: how should the two networks be coordinated?
Unifying the memory networks
[Figure: decoupled network (annotated "Bottleneck!") vs. unified network]
Overview of network architecture
• Intra-memory optimization
  - Topology, smart I/O placement
• Inter-memory optimization
  - Distance-aware compression, power-gating
Intra-memory network optimization: Topology
• Mesh: replace the crossbar with an interconnection network
• Reduction/dispersion tree: there is no MC-to-MC communication, so a reduction/dispersion tree suffices
[Figure: mesh vs. reduction/dispersion tree]
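Because requests only flow from the I/O port down to an MC and responses only flow back up, each tree router needs just a one-bit steering decision on the way down and a merge on the way up. A minimal sketch, assuming a binary tree with the I/O port at the root and a hypothetical count of 16 MCs at the leaves; the helper names are illustrative, not from the original design:

```python
def dispersion_path(mc_id: int, levels: int) -> list[str]:
    """Turns ("L"/"R") a request takes from the tree root (I/O port) to an MC.

    Hypothetical binary dispersion tree with 2**levels memory controllers at
    the leaves; each tree router inspects one bit of the MC index.
    """
    if not 0 <= mc_id < 2 ** levels:
        raise ValueError("mc_id out of range")
    turns = []
    for level in range(levels - 1, -1, -1):
        # The most significant bit is consumed at the root, the least at the last router
        turns.append("R" if (mc_id >> level) & 1 else "L")
    return turns


def reduction_path(mc_id: int, levels: int) -> list[str]:
    """Responses climb the same tree edges, so the turns appear in reverse order."""
    return list(reversed(dispersion_path(mc_id, levels)))


# 16 MCs -> 4 levels; every MC is exactly 4 tree hops from the I/O port
print(dispersion_path(5, 4))  # ['L', 'R', 'L', 'R']
```

No routing tables are needed: the MC index alone steers the request.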
Intra-memory network optimization: Smart I/O placement
• Align the I/O ports on the edge of the reduction/dispersion tree network
• Local traffic (L) is routed to the internal memory
• Through traffic (T) bypasses the memory, using the routers on the edge
[Figure: reduction/dispersion tree with edge I/O ports; labels T, L, W_in, E_out]
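The local/through split amounts to a single comparison at the edge router. A minimal sketch, assuming (hypothetically) that memory nodes form a chain indexed 0..N-1 with west ports facing lower indices; `Packet`, `route_at_edge`, and the `W_out` port name are illustrative assumptions, while `E_out` echoes the figure label:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest_node: int  # memory node (cube) the packet is addressed to
    payload: str


def route_at_edge(pkt: Packet, this_node: int) -> str:
    """Decide where an edge router sends an arriving packet.

    Hypothetical chain of memory nodes: W_* ports face lower indices,
    E_* ports face higher indices.
    """
    if pkt.dest_node == this_node:
        # Local traffic (L): hand off to the internal reduction/dispersion tree
        return "tree"
    # Through traffic (T): stay on the edge routers and leave the cube
    return "E_out" if pkt.dest_node > this_node else "W_out"


print(route_at_edge(Packet(dest_node=5, payload="read"), this_node=3))  # E_out
```

Through traffic never enters the tree, so it does not compete with local requests for the internal memory.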
Inter-memory network optimization: Distance-aware compression
• To deal with intensive memory accesses:
  1. Distance-aware data compression
  2. Decoupled encoder/decoder placement
[Figure legend: compressed packet; uncompressed (original) packet]
Inter-memory network optimization: Distance-aware compression (continued)
[Figure only; slide content not recoverable]
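One way to see why distance matters: the encode/decode cost is paid once per packet, while the flits saved by compression are saved again on every link the packet crosses. A minimal sketch of such a policy; the energy model, the 16 B flit size, and the pJ values are illustrative assumptions, not the authors' parameters:

```python
import math


def flits(size_bytes: int, flit_bytes: int) -> int:
    """Number of flits needed to carry size_bytes."""
    return math.ceil(size_bytes / flit_bytes)


def should_compress(hops: int, raw_bytes: int, compressed_bytes: int,
                    flit_bytes: int = 16,
                    link_pj_per_flit_hop: float = 10.0,
                    codec_pj: float = 50.0) -> bool:
    """Hypothetical distance-aware policy: compress only when the link energy
    saved over the whole path exceeds the one-time encode + decode energy.

    All default parameter values are illustrative assumptions, not measured.
    """
    saved_flits = flits(raw_bytes, flit_bytes) - flits(compressed_bytes, flit_bytes)
    # Each saved flit is saved again on every link the packet traverses
    saved_pj = saved_flits * hops * link_pj_per_flit_hop
    return saved_pj > codec_pj


# 64 B block compressed to 32 B: 4 flits -> 2 flits
for hops in (1, 2, 3, 4):
    print(hops, should_compress(hops, raw_bytes=64, compressed_bytes=32))
```

With these made-up numbers, compression pays off only for destinations three or more hops away.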
Inter-memory network optimization: Power-gating
• Memory system power breakdown:
  - Off-chip links become the primary power consumer
  - Memory network power therefore needs to be addressed
Inter-memory network optimization: Power-gating
• Power-management requirements:
  - The network must remain a connected graph
  - Routing must remain deadlock-free
  - Low latency/energy overhead
• Motivation: there is no direct memory-to-memory communication
• Buffer utilization: [figure not recoverable]
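The connected-graph requirement can be enforced with a reachability check before each link is turned off. A minimal sketch, assuming a hypothetical greedy policy that gates the least-utilized links first; it checks connectivity only and does not address the deadlock-freedom or overhead requirements:

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: True if every node is reachable over the given links."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def gate_links(nodes, links, utilization, threshold):
    """Greedily power-gate low-utilization links while keeping the graph connected.

    Hypothetical policy: visit links from least to most utilized and turn one off
    only if the remaining active links still form a connected graph.
    Deadlock freedom is NOT checked here.
    """
    active = set(links)
    for link in sorted(links, key=lambda lk: utilization[lk]):
        if utilization[link] >= threshold:
            break  # remaining links are busier; keep them on
        candidate = active - {link}
        if is_connected(nodes, candidate):
            active = candidate
    return active


# Ring of four memory nodes, all links idle: one link can be gated, three must stay on
nodes = [0, 1, 2, 3]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
idle = {link: 0.0 for link in ring}
print(sorted(gate_links(nodes, ring, idle, threshold=0.5)))
```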
Experimental setup
• Simulator:
  - In-house event-driven network simulator (C++, ~10,000 lines)
  - CPU and memory nodes abstracted as active and passive terminals
  - Trace-driven (traces collected with Pin)
• 4 CPU sockets, 16 memory nodes, 4 GB per node
• Memory channel: 256 lanes, 10 Gbps per lane
• Synthetic traffic:
  - Uniform-random, hotspot, local-remote-ratio
• Real workloads:
  - Spark (word-count, grep, sort), Redis, memcached, PageRank
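The configuration numbers imply the pool size and the raw channel bandwidth. A quick calculation, assuming the per-lane rate is the raw signaling rate with no allowance for line coding, packet headers, or CRC:

```python
MEMORY_NODES = 16        # from the slide
GB_PER_NODE = 4          # from the slide
LANES_PER_CHANNEL = 256  # from the slide
GBPS_PER_LANE = 10       # from the slide

total_capacity_gb = MEMORY_NODES * GB_PER_NODE
# Raw signaling rate only: ignores line coding, packet headers, and CRC (assumption)
raw_channel_gbps = LANES_PER_CHANNEL * GBPS_PER_LANE
raw_channel_gbytes_per_s = raw_channel_gbps / 8

print(total_capacity_gb, raw_channel_gbps, raw_channel_gbytes_per_s)  # 64 2560 320.0
```

The shared pool is therefore 64 GB, and each channel offers at most 2,560 Gbps (320 GB/s) before protocol overhead.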
Experimental results: Synthetic traffic
[Figure: results under synthetic traffic]
• PCN: processor-centric design
• MCN: simple memory-centric design
• dMCN: memory-centric design with more direct CPU-to-memory links
• uMemnet: our unified network design
Experimental results: In-memory computing workloads
• Mean memory access latency (lower is better)
• Memory intensity is controlled by sweeping the CPI value from 60 (least intensive) to 30 (most intensive)
• On average, 75.1% memory access latency reduction
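Two quick conversions help read this slide. A minimal sketch, assuming (hypothetically) that the gap between a trace's memory requests scales linearly with CPI, so halving CPI doubles the request rate; the second helper simply restates a percentage reduction as a latency ratio:

```python
BASELINE_CPI = 60  # least memory-intensive point of the sweep (from the slide)


def relative_request_rate(cpi: float) -> float:
    """Request rate relative to CPI = 60, assuming the gap between a trace's
    memory requests scales linearly with CPI (hypothetical model)."""
    return BASELINE_CPI / cpi


def latency_ratio(reduction: float) -> float:
    """Old latency / new latency for a fractional reduction (0.751 = 75.1%)."""
    return 1.0 / (1.0 - reduction)


print(relative_request_rate(30))       # 2.0
print(round(latency_ratio(0.751), 2))  # 4.02
```

A 75.1% reduction means the remaining latency is 24.9% of the original, i.e. about 4x lower.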
Experimental results: Energy evaluation
• Total memory system energy consumption, normalized to dMCN (lower is better)
• On average, 22.1% memory energy reduction
Summary
• In-memory computing applications stress the main memory system
• A memory pool can be built from 3D-stacked memories such as HMC
• We propose a unified memory network (uMemnet):
  - Intra-memory network optimization (topology, I/O placement)
  - Inter-memory network optimization (distance-aware compression, power-gating)
• Overall: 75.1% memory access latency reduction and 22.1% memory energy reduction
THANK YOU!