Scalable and Energy-efficient Architecture Lab (SEAL) Scalable Memory Fabric for Silicon Interposer-Based Multi-Core Systems Itir Akgun*, Jia Zhan*, Yuangang Wang†, and Yuan Xie* *University of California, Santa Barbara †Huawei
One-Page Summary
- Aim: Design a scalable hybrid memory fabric for silicon interposer-based 2.5D designs that enables memory-intensive applications, such as in-memory computing, by providing low-latency, high-bandwidth processor-memory communication
- Design: 1) topology, 2) technology, and 3) routing algorithm design for the hybrid memory network on the silicon interposer
- Conclusion: The proposed memory network in silicon interposer (MemNiSI) design can provide a scalable fabric
2/21/2021 ~ The 34th IEEE International Conference on Computer Design ~
Agenda
- Background and Motivation
- Architecture
- Network Design
- Evaluations
- Conclusion
Background: Silicon interposer in 2.5D
- Interposer: Silicon die with metal layers that allows face-down integration of chips via micro-bumps
- Passive vs. active interposers
[Image: https://upload.wikimedia.org/wikipedia/commons/e/e7/AMD_Fiji_GPU_package_with_GPU,_HBM_memory_and_interposer.jpg]
Background: In-memory computing
- Keeps an application's working dataset in memory for faster access, so it needs large memory capacity
- Target applications:
  • Big data processing
  • Real-time analytics
  • In-memory databases
Motivation
- Requirements for memory-intensive applications
  • Memory capacity
  • Bandwidth
  • Performance
- Scalable solutions for the memory requirements
  • 3D stacking
    (+) memory capacity, bandwidth, power consumption, performance
    (-) thermals, scalability
  • 2.5D stacking
    (+) all the benefits of 3D stacking, thermals, scalability
Architecture
[Figure: architecture diagram with labels 3D DRAM, CMP, and Interposer]
Network Design: Topology
[Figure: baseline topologies with "pillar" nodes]
- Point-to-point
  (+) Fewer hops to memory
  (-) Poor bandwidth
  (-) Poor scalability
- Daisy chain
  (-) More hops to memory
  (+) Improved bandwidth
  (-) Poor scalability
Network Design: Topology
- MemNiSI: Memory Network in Silicon Interposer
  (+) Fewer hops to memory
  (+) Improved bandwidth
  (+) Improved scalability
[Figure: logical layout and physical layout]
Network Design: Technology
- Active vs. passive interposers
- Process technology
- A combination of these factors may result in frequency discrepancies between the on-chip network (NoC) and the network in the silicon interposer (NiSI)
Network Design: Routing Algorithm
- Baseline topologies
  Pillar router first (PRF): modified XY routing
[Figure: request path and response path]
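The slide names PRF as a modified XY routing but does not spell out the modification. The sketch below shows plain dimension-order (XY) routing on a mesh, plus one assumed reading of "pillar router first" in which a packet is first routed to a pillar router and then onward to its destination. The function names, the coordinate convention, and the single shared grid are illustrative assumptions, not the authors' implementation.

```python
def xy_route(src, dst):
    """Return the (x, y) hops from src to dst, moving along X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def pillar_first_route(src, pillar, dst):
    """Hypothetical PRF reading: XY-route to the pillar, then XY-route to dst.

    Assumption: both segments are drawn on one coordinate grid for simplicity.
    """
    to_pillar = xy_route(src, pillar)
    from_pillar = xy_route(pillar, dst)
    # Drop the duplicated pillar node where the two segments join.
    return to_pillar + from_pillar[1:]
```

For example, `pillar_first_route((0, 0), (1, 1), (3, 1))` passes through the pillar at `(1, 1)` before continuing along X to `(3, 1)`.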
Network Design: Routing Algorithm
- Memory Network in Silicon Interposer (MemNiSI)
  1) Network in Silicon Interposer Heavy (NiSIH)
[Figure: request and response path]
Network Design: Routing Algorithm
- Memory Network in Silicon Interposer (MemNiSI)
  2) Network-on-Chip Heavy (NoCH)
[Figure: request and response path]
Network Design: Routing Algorithm
- Memory Network in Silicon Interposer (MemNiSI)
  3) Choose-Faster-Path (CFP)
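The slide gives only the name of the CFP algorithm. As a rough sketch of what "choose the faster path" could mean, the code below compares an estimated latency for a NoC-heavy path and a NiSI-heavy path and picks the lower one. The latency model (hops times router pipeline cycles, divided by each network's clock), the tie-breaking rule, and all names are assumptions made for illustration.

```python
def path_latency_ns(noc_hops, nisi_hops, noc_ghz, nisi_ghz, cycles_per_hop=4):
    """Estimated latency in ns: each hop costs cycles_per_hop cycles of its network's clock."""
    return cycles_per_hop * (noc_hops / noc_ghz + nisi_hops / nisi_ghz)


def choose_faster_path(noch_path, nisih_path, noc_ghz, nisi_ghz):
    """Return 'NoCH' or 'NiSIH', whichever has the lower estimated latency.

    Each path is a (noc_hops, nisi_hops) pair. Ties go to NiSIH (an arbitrary choice).
    """
    noch = path_latency_ns(*noch_path, noc_ghz, nisi_ghz)
    nisih = path_latency_ns(*nisih_path, noc_ghz, nisi_ghz)
    return "NoCH" if noch < nisih else "NiSIH"
```

Under this model, a path that spends most of its hops in the faster network wins, so the choice shifts with the NoC:NiSI frequency ratio.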
Simulator Setup
- Event-driven simulator in C++ to model the hybrid network
- CPU & Memory
  • 16 cores
  • 2 GHz
  • 64 B cache line size
  • 16 memory nodes
  • 4 GB/node
  • 100 cycles memory latency
- On-chip Network and Interposer Network (the same router parameters are listed for both)
  • 4 x 4 mesh topology
  • 4-stage router pipeline
  • 4 VCs/port
  • 4 buffers/VC
  • 16 B flit size
  • 4 flits maximum packet size
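A few derived quantities follow from the listed parameters. The sketch below collects the slide's values in a small configuration object and computes them. The class and method names are hypothetical, and two points are assumptions: that the 100-cycle memory latency is counted at the 2 GHz clock, and that a cache line is carried as payload only, with no separate header flit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    # Values taken from the Simulator Setup slide.
    cores: int = 16
    core_ghz: float = 2.0
    cache_line_bytes: int = 64
    memory_nodes: int = 16
    gb_per_node: int = 4
    memory_latency_cycles: int = 100
    flit_bytes: int = 16
    max_packet_flits: int = 4

    def total_memory_gb(self):
        return self.memory_nodes * self.gb_per_node

    def flits_per_cache_line(self):
        # Ceiling division: flits needed to carry one cache line of payload.
        return -(-self.cache_line_bytes // self.flit_bytes)

    def memory_latency_ns(self):
        # Assumption: the memory latency cycles are counted at the core clock.
        return self.memory_latency_cycles / self.core_ghz
```

With the defaults, `SimConfig().flits_per_cache_line()` equals the 4-flit maximum packet size, so under these assumptions one cache line fits in a single maximum-size packet.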
Workloads Setup
- Synthetic traffic:
  • Uniform-random
  • Hotspot
- In-memory computing workloads:
  • CloudSuite: Pagerank, Tunkrank, Spark-grep, Memcached
  • BigDataBench: Spark-sort
  • Redis
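The two synthetic patterns can be sketched as destination generators. This is an illustration only: the slide does not define the hotspot pattern, so the code assumes that a fixed fraction of packets targets a single hotspot node and the rest are spread uniformly. The function names and parameters are hypothetical.

```python
import random


def uniform_random_dest(num_nodes, rng):
    """Pick a destination memory node uniformly at random."""
    return rng.randrange(num_nodes)


def hotspot_dest(num_nodes, hotspot_node, hotspot_fraction, rng):
    """Send hotspot_fraction of packets to hotspot_node, the rest uniformly.

    Assumption: one hotspot node; the uniform share may also land on it.
    """
    if rng.random() < hotspot_fraction:
        return hotspot_node
    return rng.randrange(num_nodes)


rng = random.Random(0)  # seeded so that runs are repeatable
sample = [hotspot_dest(16, 3, 0.2, rng) for _ in range(10)]
```

Passing a seeded `random.Random` instance keeps the traffic repeatable across simulation runs.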
Topology Study
- Average packet latency comparison of the network topologies under uniform-random traffic
- MemNiSI is more scalable under heavy memory traffic
[Figure: Mean Packet Latency (ns) vs. Injection Rate for Point-to-point, Daisy chain, and MemNiSI]
Topology Study
- Average packet latency comparison of the network topologies under hotspot traffic
- On average, MemNiSI performs 8.9% faster than the point-to-point topology and 15.3% faster than the daisy chain topology under hotspot traffic
[Figure: Mean Packet Latency (ns) for Point-to-point, Daisy chain, and MemNiSI across Hotspot_20, Hotspot_40, Hotspot_60, Hotspot_80, and Average]
Sensitivity Analysis
- Network frequency sensitivity analysis for the CFP algorithm
- Ratio notation: 4:1 means a faster NoC; 1:4 means a faster NiSI
[Figure: Algorithm Choice (%) vs. Network Frequency Ratio (NoC:NiSI) from 100:1 to 1:100, showing NoCH % and NiSIH %]
Routing Algorithm Study
- Routing algorithm comparison under synthetic traffic, normalized to the NiSIH algorithm
- CFP can outperform the other algorithms by up to 6.85%
[Figure: Normalized Mean Packet Latency for NiSIH, NoCH, and CFP under Uniform-random, Hotspot, and Average, at Network Frequency Ratios (NoC:NiSI) of 1:1, 1:4, and 4:1]
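The normalization used in these charts can be sketched as follows: each algorithm's mean packet latency is divided by the NiSIH latency, so NiSIH sits at 1.0 and values below 1.0 indicate lower latency. The function names are hypothetical, and the latency numbers in the example are made up for illustration; they are not results from the slides.

```python
def normalize_to_baseline(latencies, baseline="NiSIH"):
    """Divide every latency by the baseline algorithm's latency."""
    base = latencies[baseline]
    return {name: value / base for name, value in latencies.items()}


def improvement_percent(normalized_value):
    """Latency reduction relative to the baseline, in percent."""
    return (1.0 - normalized_value) * 100.0


# Illustrative numbers only, not measured data.
example = {"NiSIH": 100.0, "NoCH": 105.0, "CFP": 95.0}
normalized = normalize_to_baseline(example)
```

In this made-up example CFP normalizes to 0.95, which `improvement_percent` reports as roughly a 5% latency reduction relative to NiSIH.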
Routing Algorithm Study
- Routing algorithm comparison under in-memory computing workloads, normalized to the NiSIH algorithm
- On average, CFP outperforms NiSIH by up to 3.4% and NoCH by up to 10.0%. At the 1:1 ratio, it outperforms NiSIH by 1.65% and NoCH by 7.99%.
[Figure: Normalized Mean Packet Latency for NiSIH, NoCH, and CFP at ratios 1:1, 1:4, and 4:1 across pagerank, tunkrank, spark-grep, spark-sort, memcached, redis, and Average]
Conclusion
- Verdict: A memory network approach can provide a scalable fabric for low-latency, high-bandwidth communication in interposer-based 2.5D designs
- Future work: Evaluating the power, area, cost, and reliability of the memory network in silicon interposer
THANK YOU!