Circuit-Switched Coherence: Natalie Enright Jerger, Li-Shiuan Peh, Mikko Lipasti
- Slides: 31
Circuit-Switched Coherence
Natalie Enright Jerger*, Li-Shiuan Peh+, Mikko Lipasti*
*University of Wisconsin - Madison
+Princeton University
2nd IEEE International Symposium on Networks-on-Chip

Motivation
- Network on Chip for general purpose multi-core
  - Replacing dedicated global wires
  - Efficient/scalable communication on-chip
- Router latency overhead can be significant
  - Exploit application characteristics to lower latency
  - Co-design coherence protocol to match network functionality
6/11/2021 Natalie Enright Jerger - University of Wisconsin 2

Executive Summary
- Hybrid Network
  - Interleaves circuit-switched and packet-switched flits
  - Optimize setup latency
  - Improve throughput over traditional circuit-switching
  - Reduce interconnect delay by up to 22%
- Co-design cache coherence protocol
  - Improves performance by up to 17%

Switching Techniques
- Packet Switching
  - Efficient bandwidth utilization
  - Router latency overhead
- Circuit Switching
  - Poor bandwidth utilization
    - Stalled requests due to unavailable resources
  - Low latency
    - Avoids router overhead after circuit is established
- Best of both worlds?
  - Efficient bandwidth utilization + low latency

Circuit-Switched Coherence
- Two key observations
  - Commercial workloads are very sensitive to communication latency
  - Significant pair-wise sharing
- Construct fast pair-wise circuits?
[Figure: Normalized Runtime vs. Per Hop Delay; series: Commercial, Scientific]
Commercial Workloads: SPECjbb, SPECweb, TPC-H, TPC-W
Scientific Workloads: Barnes-Hut, Ocean, Radiosity, Raytrace

Traditional Circuit Switching
[Figure: Normalized Cycle Counts per workload (SPECjbb, SPECweb, TPC-H, TPC-W, Barnes, Ocean, Radiosity, Raytrace)]
- Traditional circuit-switching hurts performance by up to ~7%
*Data collected for 16 in-order core chip multiprocessor

Circuit Switching Redesigned
- Latency is critical
- Utilize Circuit Switching for lower latency
  - A circuit connects resources across multiple hops to avoid router overhead
- Traditional circuit-switching performs poorly
- My contributions
  - Novel setup mechanism
  - Bandwidth stealing

Outline
- Motivation
- Router Design
  - Setup Mechanism
  - Bandwidth Stealing
- Coherence Protocol Co-design
  - Pair-wise sharing
  - 3-hop optimization
  - Region prediction
- Results
- Conclusions

Traditional Circuit Switching Path Setup (with Acknowledgement)
[Figure: path setup between node 0 and node 5; labels: Configuration Probe, Acknowledgement, Data Circuit]
- Significant latency overhead prior to data transfer
- Other requests forced to wait for resources

Novel Circuit Setup Policy
[Figure: path setup between node 0 and node 5; labels: Configuration Packet, Data Circuit, A]
- Overlap circuit setup with 1st data transfer
- Reconfigure existing circuits if no unused links available
  - Allows piggy-backed request to always achieve low latency
  - Multiple circuit planes prevent frequent reconfiguration

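The reconfiguration rule on this slide can be sketched for a single router output port. This is a minimal illustration, not the authors' implementation: the class name `PortPlanes`, the string circuit identifiers, and the choice to tear down the oldest circuit first are all assumptions, since the slides do not say which existing circuit is reconfigured.

```python
from collections import OrderedDict

class PortPlanes:
    """Circuit planes on one output port (hypothetical model)."""

    def __init__(self, num_planes=4):
        self.num_planes = num_planes
        self.circuits = OrderedDict()  # plane id -> circuit id, oldest first

    def setup(self, circuit_id):
        """Return (plane, evicted_circuit); evicted_circuit is None if a plane was free."""
        for plane in range(self.num_planes):
            if plane not in self.circuits:
                self.circuits[plane] = circuit_id
                return plane, None
        # No unused plane: reconfigure an existing circuit.
        # Evicting the oldest circuit is an assumed policy.
        plane, evicted = self.circuits.popitem(last=False)
        self.circuits[plane] = circuit_id
        return plane, evicted
```

With more planes per port, the eviction branch is reached less often, which is the sense in which multiple circuit planes reduce reconfiguration.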
Setup Network
- Light-weight setup network
  - Narrow
    - Circuit plane identifier (2 bits) + Destination (4 bits)
  - Low Load
    - No virtual channels, small area footprint
- Stores circuit configuration information
  - Multiple narrow circuit planes prevent frequent reconfiguration
- Reconfiguration
  - Buffered, traverses packet-switched pipeline

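The 2-bit plane identifier and 4-bit destination fit in a 6-bit setup word. The sketch below shows one way to pack and unpack them; the field order (plane in the high bits) and the function names are assumptions for illustration only.

```python
from typing import Tuple

PLANE_BITS = 2  # circuit plane identifier width, from the slide
DEST_BITS = 4   # destination width, from the slide (16 nodes)

def encode_setup(plane: int, dest: int) -> int:
    """Pack (plane, dest) into a 6-bit word; the bit layout is assumed."""
    if not 0 <= plane < (1 << PLANE_BITS):
        raise ValueError("plane out of range")
    if not 0 <= dest < (1 << DEST_BITS):
        raise ValueError("dest out of range")
    return (plane << DEST_BITS) | dest

def decode_setup(word: int) -> Tuple[int, int]:
    """Inverse of encode_setup: return (plane, dest)."""
    return word >> DEST_BITS, word & ((1 << DEST_BITS) - 1)
```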
Packet-Switched Bandwidth Stealing
- Remember: problem with traditional Circuit-Switching is poor bandwidth
  - Need to overcome this limitation
- Hybrid Circuit-Switched Solution: Packet-switched messages snoop incoming links
  - When there are no circuit-switched messages on the link
  - A waiting packet-switched message can steal idle bandwidth

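Bandwidth stealing amounts to a per-cycle choice on each link. The sketch below models one cycle on one output link under the assumption that a circuit-switched flit, when present, always wins; the function name and the returned labels are hypothetical.

```python
from collections import deque

def arbitrate_link(circuit_flit, ps_queue: deque):
    """Return (flit_sent, kind) for one cycle on one output link."""
    if circuit_flit is not None:
        return circuit_flit, "circuit"       # circuit traffic uses its slot
    if ps_queue:
        return ps_queue.popleft(), "stolen"  # idle slot taken by a waiting packet flit
    return None, "idle"
```

A packet-switched flit only moves when the circuit slot would otherwise go unused, so circuit latency is unaffected in this model.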
Hybrid Circuit-Switched Router Design
[Figure: router block diagram; labels: Allocators, Crossbar, ports Inj, Ej, N, S, E, W, and per-port blocks marked T]

HCS Pipeline
- Circuit-switched messages: 1 stage
  - Router: Switch Traversal
  - Link: Link Traversal
- Packet-switched messages: 3 stages
  - Aggressive Speculation reduces stages
  - Router: Buffer Write, Virtual Channel/Switch Allocation, Switch Traversal
  - Link: Link Traversal

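The stage counts above give a simple zero-load latency estimate. The sketch assumes one cycle per router stage and one cycle per link traversal, and it ignores contention and circuit setup; those assumptions and the function name are not from the slides.

```python
LINK_CYCLES = 1       # assumed: one cycle per link traversal
CS_ROUTER_STAGES = 1  # from the slide: switch traversal only
PS_ROUTER_STAGES = 3  # from the slide: buffer write, VC/switch allocation, switch traversal

def zero_load_latency(hops: int, router_stages: int) -> int:
    """Cycles to cross `hops` routers and links with no contention."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return hops * (router_stages + LINK_CYCLES)
```

For example, `zero_load_latency(4, CS_ROUTER_STAGES)` is 8 cycles against 16 for `PS_ROUTER_STAGES` under these assumptions.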
Outline
- Motivation
- Router Design
  - Setup Mechanism
  - Bandwidth Stealing
- Coherence Protocol Co-design
  - Pair-wise sharing
  - 3-hop optimization
  - Region prediction
- Results
- Conclusions

Sharing Characterization
- Temporal sharing relationship: 67-76% of misses are serviced by the 2 most recently shared-with cores
Commercial Workloads: SPECjbb, SPECweb, TPC-H, TPC-W
Scientific Workloads: Barnes-Hut, Ocean, Radiosity, Raytrace

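A statistic of this kind can be computed from a per-core trace of which core supplied each miss. The sketch below is one plausible way to measure it, not the authors' methodology; the function name and the trace format (a list of supplier core ids for one requesting core) are assumptions.

```python
def mru2_coverage(suppliers):
    """Fraction of misses supplied by one of the 2 most recent distinct suppliers."""
    recent = []  # most recent first, at most 2 distinct cores
    hits = 0
    for core in suppliers:
        if core in recent:
            hits += 1
            recent.remove(core)
        recent.insert(0, core)  # move or add this supplier to the front
        del recent[2:]          # keep only the 2 most recent
    return hits / len(suppliers) if suppliers else 0.0
```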
Directory Coherence
[Figure: three-hop read miss between nodes 1, 2, and the directory: (1) Read A, (2) Forward Read A, (3) Data Response A]
Directory:
Address | State | Sharers
A | Exclusive -> Shared | 2 -> 1, 2
B | Shared | 1, 2

Coherence Protocol Co-Design
- Goal: Better exploit circuits through coherence protocol
- Modifications:
  - Allow a cache to send a request directly to another cache
  - Notify the directory in parallel
  - Prediction mechanism for pair-wise sharers
  - Directory is sole ordering point

Circuit-Switched Coherence Optimization
[Figure: optimized read miss between nodes 1, 2, and the directory; messages: Read A, Update A, Data Response A, Ack A]
Directory:
Address | State | Sharers
A | Exclusive -> Shared | 2 -> 1, 2
B | Shared | 1, 2

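The benefit of sending the request directly to another cache can be estimated by counting mesh hops on the critical path. The sketch assumes a 4x4 mesh with row-major node numbering, minimal (Manhattan) routing, and a correct prediction; the function names are hypothetical, and the directory update is treated as off the critical path.

```python
def manhattan(a: int, b: int, width: int = 4) -> int:
    """Hop distance between nodes a and b in a row-major mesh (assumed numbering)."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

def baseline_hops(requester: int, directory: int, owner: int) -> int:
    """Requester -> directory -> owner -> requester."""
    return (manhattan(requester, directory)
            + manhattan(directory, owner)
            + manhattan(owner, requester))

def direct_hops(requester: int, owner: int) -> int:
    """Requester -> predicted owner -> requester."""
    return 2 * manhattan(requester, owner)
```

The gap is largest when the directory is far from a pair of nearby sharers, for example `baseline_hops(0, 15, 1)` against `direct_hops(0, 1)`.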
Region Prediction
[Figure: Region Table (A -> 2, B -> 3) and messages Miss A[0], Forward Read A[0], Data Response A[0], Region A Update, Read A[1]]
Directory:
Address | State | Sharers
A[0] | Shared | 1, 2
A[1] | Shared | 2
- Each memory region spans 1 KB
  - Takes advantage of spatial and temporal sharing

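A region table of this kind can be sketched as a map from 1 KB region number to the core that last supplied data from that region. The class and method names are hypothetical, and the unbounded dictionary is a simplification of what would be a finite hardware table.

```python
REGION_BYTES = 1024  # from the slide: each region spans 1 KB

class RegionPredictor:
    """Maps a memory region to the last core that supplied data from it."""

    def __init__(self):
        self._table = {}  # region number -> supplier core id

    def predict(self, addr: int):
        """Return the predicted supplier for addr, or None if the region is unknown."""
        return self._table.get(addr // REGION_BYTES)

    def update(self, addr: int, supplier: int) -> None:
        """Record which core answered a miss to addr."""
        self._table[addr // REGION_BYTES] = supplier
```

After a miss to one line is answered by some core, a later miss to a different line in the same region is predicted to go to that core, which is how the table captures spatial as well as temporal sharing.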
Simulation Methodology
- PHARMSim
  - Full-system multi-core simulator
- Detailed network level model
  - Cycle accurate router model
  - Flit-level contention modeled
- More results in paper

Simulation Workloads
Commercial
- SPECjbb: Java server workload, 24 warehouses, 200 requests
- SPECweb: Web server, 300 requests
- TPC-W: Web e-commerce, 40 transactions
- TPC-H: Decision support system
Scientific
- Barnes-Hut: 8k particles, full run
- Ocean: 514 x 514, parallel phase
- Radiosity: Parallel phase
- Raytrace: Car input, parallel phase
Synthetic
- Uniform Random: Destination selected with uniform random distribution
- Permutation Traffic: Each node communicates with one other node (pair-wise)

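The two synthetic patterns can be generated as follows. This is a sketch under stated assumptions: the uniform random pattern excludes the source itself, the pair-wise permutation is a random perfect matching (so the node count must be even), and the function names are hypothetical.

```python
import random

def uniform_random_dest(src: int, num_nodes: int, rng: random.Random) -> int:
    """Pick a destination uniformly among all nodes except src (assumed exclusion)."""
    dest = rng.randrange(num_nodes - 1)
    return dest if dest < src else dest + 1  # skip over the source

def pairwise_permutation(num_nodes: int, rng: random.Random) -> dict:
    """Pair every node with exactly one partner; num_nodes must be even."""
    if num_nodes % 2:
        raise ValueError("num_nodes must be even")
    nodes = list(range(num_nodes))
    rng.shuffle(nodes)
    partner = {}
    for a, b in zip(nodes[0::2], nodes[1::2]):
        partner[a], partner[b] = b, a
    return partner
```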
Simulation Configuration
Processors
- Cores: 16 in-order, general purpose
Memory System
- L1 I/D caches: 32 KB, 2-way set associative, 1 cycle
- Private L2 caches: 512 KB, 4-way set associative, 6 cycles, 64-byte lines
- Shared L3 cache: 16 MB (1 MB bank/tile), 4-way set associative, 12 cycles
- Main memory latency: 100 cycles
Interconnect: 4 x 4 2-D mesh
- Packet-switched baseline: optimized 1-3 router stages, 4 virtual channels with 4 buffers each
- Hybrid circuit switching: 1 router stage, 2 or 4 circuit planes

Network Results
[Figure: Normalized Delay per workload (Barnes, Ocean, Radiosity, Raytrace, SPECjbb, SPECweb, TPC-H, TPC-W); series: HCS, 2 Circuits; HCS, 4 Circuits]
- Communication latency is key: shave off precious cycles in network latency

Flit breakdown
[Figure: Cycles per workload (Barnes, Ocean, Radiosity, Raytrace, SPECjbb, SPECweb, TPC-H, TPC-W); series: CS, Partial. Data labels: 65.6%/34.4%, 71.3%/28.7%, 71.9%/28.1%, 63.8%/36.2%, 56.0%/44.0%, 66.8%/33.2%, 82.3%/17.7%, 71.2%/28.8%]
- Reduce interconnect latency for a significant fraction of messages

HCS + Protocol Optimization
[Figure: Performance Improvement per workload (Barnes, Ocean, Radiosity, Raytrace, SPECjbb, SPECweb, TPC-H, TPC-W); PS and HCS bars, each split into Interconnect and Protocol Optimization]
- Improvement of HCS + Protocol Optimization is greater than the sum of HCS and Protocol Optimization alone
  - Protocol Optimization drives up circuit reuse, better utilizing HCS

Uniform Random Traffic
[Figure: Interconnect Latency vs. Load (% Link Capacity), 0% to 50%; series: HCS, PS]
- HCS successfully overcomes bandwidth limitations associated with Circuit Switching

Related Work
- Router optimizations
  - Express Virtual Channels [Kumar, ISCA 2007]
  - Single-cycle router [Mullins, ISCA 2004]
  - Many more…
- Hybrid Circuit-Switching
  - Wave-switching [Duato, ICPP 1996]
  - SoCBUS [Wiklund, IPDPS 2003]
- Coherence Protocols
  - Significant research in removing overhead of indirection

Circuit-Switched Coherence Summary
- Replace packet-switched mesh with hybrid circuit-switched mesh
  - Interleave circuit- and packet-switched flits
- Reconfigurable circuits
  - Dedicated bandwidth for frequent pair-wise sharers
  - Low latency and low power
    - Avoid switching/routing
- Devise novel coherence mechanisms to take advantage of benefits of circuit switching

Thank you
www.ece.wisc.edu/~pharm
enrightn@cae.wisc.edu

Circuit Setup
- Novel Setup Policy
  - Overlap circuit setup with first data transfer
    - Store circuit information at each router
  - Reconfigure existing circuits if no unused links available
    - Allows piggy-backed request to always achieve low latency
    - Multiple narrow circuit planes prevent frequent reconfiguration
- Reconfiguration
  - Buffered, traverses packet-switched pipeline