A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips
Mohammad Fattah¹, Antti Airola¹, Rachata Ausavarungnirun², Nima Mirzaei³, Pasi Liljeberg¹, Juha Plosila¹, Siamak Mohammadi³, Tapio Pahikkala¹, Onur Mutlu² and Hannu Tenhunen¹
What is This Talk About?
- Over time, routers and links can become faulty, so alternative paths must be found dynamically.
- Previous works have at least one of the following limitations:
  - Cover only a few faults (Maze-Routing: any number of faults)
  - Use a central controller (Maze-Routing: no central component)
  - High area overhead (Maze-Routing: no routing table)
  - High reconfiguration overhead upon new faults (Maze-Routing: no reconfiguration phase)
- Maze-Routing overcomes all the above limitations:
  - Full coverage: formally proven
  - Fully distributed: autonomous, standalone routers
  - Low area overhead: an algorithmic approach (16x less area than routing tables)
  - Low reconfiguration overhead: on-the-fly path exploration (instantaneous operation on new failures)
  - Better performance: 50% higher saturation throughput and 28% lower latency on SPEC benchmarks compared to the state of the art
  - Detects network partitioning
- Image source: condenaststore.com
Aggressive Transistor Scaling
- Key benefit: integrating many IPs
  - Processors, cache slices, memory controllers, specialized HW, etc.
- A major curse: reduced reliability
  - Fabrication time: defects, process variation
  - Run-time: negative bias temperature instability (NBTI), hot carrier injection (HCI), gate oxide breakdown, electromigration
- Our designs must be fault-tolerant by construction!
IP vs. Network Faults
- IP faults
  - Degrade the performance
  - The rest of the system can continue
- Network element faults
  - Cripple the performance
  - Single point of failure
- It is crucial to tolerate many faults in links and routers!
Maze-Routing: Fault-Tolerant by Construction
- It is not a router architecture with fault tolerance patched onto it.
- Rather, it is essentially a routing algorithm, which is inherently fault-tolerant.
- Four critical goals:
  - Full coverage (guaranteed delivery)
  - Fully-distributed operation
  - Low area footprint
  - No reconfiguration component/phase
- Maze-Routing is the first to provide all four!
- Figure: router block diagram with North, East, South, West and Local ports, XY and Maze routing units, and a priority arbiter.
Our 4 Goals
- Maze-Routing: full coverage, full distribution, low area cost, fast adaptation
- Finding the path
- Detecting disconnected nodes
- Results: area, throughput, reconfiguration overhead
Goal 1: Full (Fault) Coverage
- Literature
  - Limited number of faults
  - Limited fault patterns
  - Limited when nodes become disconnected
- Maze-Routing
  - No restriction on fault count or fault pattern
  - Detects disconnected nodes at router level
Goal 2: Fully Distributed Operation
- Literature
  - Centralized methods: single point of failure; TMR is expensive
  - Distributed methods: synchronization points; a fault in the reconfiguration unit
- Maze-Routing
  - No central component
  - No reconfiguration unit
  - Each router makes individual decisions
  - Faults in the algorithm only disable the associated links
- Figure: a centralized SW/HW controller and per-router reconfiguration units (literature) vs. per-port Maze units feeding a priority arbiter (Maze-Routing).
Goal 3: Low Area Overhead
- Literature: routing tables
  - 5 read ports
  - Implementation cost
  - Power dissipation
  - Vulnerability to run-time faults: one failed bit affects the whole router (area ~ fault probability of the router)
  - High area overhead
- Maze-Routing
  - An algorithmic approach
  - No routing table
Goal 4: Low Reconfiguration Overhead
- Literature
  - New failure detected?
    1) Pause the network
    2) Reconfigure to an alternative solution
    3) Resume normal operation
  - Issues? Severe performance degradation under aggressive online testing
  - Few works have a fast reconfiguration phase
- Maze-Routing
  - No reconfiguration phase
  - The path to the destination is calculated dynamically per packet
  - Called on-the-fly reconfiguration
Maze-Routing: The First to Provide All
Comparison of fault-tolerant routing methods by coverage, reconfiguration control, O(Area) and O(Reconf.):
- Few faults covered: Zhang et al. [43]
- Moderate coverage: LBDR [35], d2-LBDR [7], OSR-Lite [38], TOSR [5], BLINC [25]
- High coverage: uLBDR [36], Wachter et al. [39], Fick et al. [19], Face routing [11], FTDR-H [18]
- Full coverage: uDIREC [32], ARIADNE [3], Maze-routing
- Maze-routing combines full coverage with fully distributed operation, low area cost and on-the-fly reconfiguration.
Preliminaries
- Face: a region bounded by links and routers (the example figure has 4 inner faces and 1 outer face)
- Right/left-hand rule: exit from the first output on the right/left side
  - Right-hand rule (RHR): clockwise around inner faces
  - Left-hand rule (LHR): counterclockwise around inner faces
  - Both go in the opposite direction around outer faces
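A minimal sketch of the hand rule, assuming a 2D mesh where every router has North/East/South/West outputs and "right/left side" is judged relative to the direction the packet arrived in. The names `CW`, `VEC`, `hand_rule` and `walk_face` are hypothetical and are not taken from the paper or NOCulator.

```python
# Clockwise order of the mesh outputs (x grows to the East, y to the North).
CW = ["N", "E", "S", "W"]
VEC = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def hand_rule(heading, available, rule):
    """Return the first available output on the right (RHR) or left (LHR) side.

    `heading` is the direction the packet was travelling when it arrived.
    Candidates are scanned from the preferred side, then straight ahead, then
    the other side; the U-turn back over the incoming link is the last resort.
    """
    i = CW.index(heading)
    right, left, back = CW[(i + 1) % 4], CW[(i - 1) % 4], CW[(i + 2) % 4]
    if rule == "RHR":
        order = [right, heading, left, back]
    else:
        order = [left, heading, right, back]
    return next((d for d in order if d in available), None)


def walk_face(width, height, start, heading, rule, steps):
    """Follow the hand rule for `steps` hops in a fault-free width x height mesh."""
    cur, path = start, [start]
    for _ in range(steps):
        available = [d for d in CW
                     if 0 <= cur[0] + VEC[d][0] < width
                     and 0 <= cur[1] + VEC[d][1] < height]
        heading = hand_rule(heading, available, rule)
        cur = (cur[0] + VEC[heading][0], cur[1] + VEC[heading][1])
        path.append(cur)
    return path


# In a 2 x 2 mesh, RHR circles the single inner face clockwise.
print(walk_face(2, 2, start=(0, 0), heading="W", rule="RHR", steps=4))
# -> [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
```

Starting instead with `heading="S"` and `rule="LHR"` visits (1, 0), (1, 1) and (0, 1), i.e. the same face counterclockwise.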
Preliminaries (II)
- A few additional fields in the header:
  1. MDbest: the closest Manhattan distance (MD) to dst that the packet has reached so far
     - Initial: MD(src, dst)
     - Only decrements
  2. Mode: the routing mode used for the packet
     - Values: normal, traversal (right- or left-hand rule), unreachable
     - Initial: normal
- 2 more fields to detect disconnected nodes
Maze-Routing
- Normal mode:
  - Is there any productive output? Take it and decrement MDbest.
  - No? Enter traversal mode:
    - Draw line(cur, dst) between the current node and dst.
    - Right-hand rule (RHR)? Take the first output to the left of line(cur, dst).
    - Left-hand rule (LHR)? Take the first output to the right of line(cur, dst).
    - Set the mode (either RHR or LHR) accordingly.
- Traversal mode:
  - If MD(cur, dst) = MDbest and there is a productive output, return to (and act as in) normal mode.
  - Otherwise, follow the hand rule.
- Maze-Routing definitely reaches dst if a path to dst exists; we provide the formal proof in the paper.
- Figure: example route from src to dst, each router labeled with its MD to dst and the packet's mode.
Detecting Disconnected Nodes
- Traversal mode:
  - If MD(cur, dst) = MDbest and there is a productive output, return to normal mode.
  - No? Follow the hand rule.
- The destination is unreachable if, in traversal mode:
  - we meet the same node as the one where we entered traversal mode, and
  - the hand rule picks the same output as when we entered traversal mode.
- More implementation details are available in the paper.
- Figure: example with an unreachable dst, each router labeled with its MD to dst and the packet's mode.
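The normal-mode, traversal-mode and unreachable rules from the two algorithm slides can be combined into a hop-by-hop sketch like the one below. It is not the authors' NOCulator implementation. It assumes a 2D mesh whose faulty links are given as a set of router pairs, ignores contention and deflection, picks the first productive output in N/E/S/W order, and always enters traversal with one fixed hand rule (`entry_rule`, a hypothetical parameter). The two disconnection-detection fields are modeled as hypothetical `trav_node`/`trav_out`, and a `max_hops` guard is added in case the sketch misses a corner case the paper handles.

```python
import math
from dataclasses import dataclass
from typing import Optional

CW = ["N", "E", "S", "W"]  # clockwise order; x grows East, y grows North
VEC = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def md(a, b):
    """Manhattan distance (MD) between two routers."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Header:
    md_best: int                       # MDbest: closest MD to dst reached so far
    mode: str = "normal"               # "normal", "RHR" or "LHR"
    trav_node: Optional[tuple] = None  # hypothetical: router where traversal began
    trav_out: Optional[str] = None     # hypothetical: output taken at that router


def healthy_outputs(cur, width, height, faulty):
    """Outputs of `cur` leading to an existing router over a non-faulty link."""
    outs = []
    for d in CW:
        nxt = (cur[0] + VEC[d][0], cur[1] + VEC[d][1])
        if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                and frozenset((cur, nxt)) not in faulty):
            outs.append(d)
    return outs


def productive(cur, dst, outs):
    """Healthy outputs that reduce the MD to dst."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    good = {"E": dx > 0, "W": dx < 0, "N": dy > 0, "S": dy < 0}
    return [d for d in outs if good[d]]


def ccw_angle(line, d):
    """Counterclockwise angle in [0, 2*pi) from vector `line` to output `d`."""
    ox, oy = VEC[d]
    a = math.atan2(line[0] * oy - line[1] * ox, line[0] * ox + line[1] * oy)
    return a + 2 * math.pi if a < 0 else a


def hand_rule(heading, outs, rule):
    """First healthy output on the right (RHR) or left (LHR) of `heading`."""
    i = CW.index(heading)
    right, left, back = CW[(i + 1) % 4], CW[(i - 1) % 4], CW[(i + 2) % 4]
    order = [right, heading, left, back] if rule == "RHR" else [left, heading, right, back]
    return next((d for d in order if d in outs), None)


def route(width, height, faulty, src, dst, entry_rule="RHR", max_hops=1000):
    """Hop-by-hop Maze-Routing sketch; returns (path, outcome)."""
    h, cur, heading, path = Header(md_best=md(src, dst)), src, None, [src]
    while cur != dst and len(path) <= max_hops:
        outs = healthy_outputs(cur, width, height, faulty)
        prod = productive(cur, dst, outs)
        if h.mode in ("RHR", "LHR") and md(cur, dst) == h.md_best and prod:
            h.mode = "normal"  # return to (and act as in) normal mode
        if h.mode == "normal":
            if prod:                      # productive output: take it
                out = prod[0]
                h.md_best -= 1
            elif outs:                    # blocked: enter traversal mode
                line = (dst[0] - cur[0], dst[1] - cur[1])
                if entry_rule == "RHR":   # first output left of line(cur, dst)
                    out = min(outs, key=lambda d: ccw_angle(line, d))
                else:                     # first output right of line(cur, dst)
                    out = min(outs, key=lambda d: (2 * math.pi - ccw_angle(line, d)) % (2 * math.pi))
                h.mode, h.trav_node, h.trav_out = entry_rule, cur, out
            else:
                return path, "unreachable"  # router has no healthy output at all
        else:
            out = hand_rule(heading, outs, h.mode)
            # Back at the entry router and about to repeat the entry output:
            # the whole face was walked without progress, so dst is unreachable.
            if out is None or (cur == h.trav_node and out == h.trav_out):
                return path, "unreachable"
        heading = out
        cur = (cur[0] + VEC[out][0], cur[1] + VEC[out][1])
        path.append(cur)
    return path, ("delivered" if cur == dst else "hop limit")


# 3 x 3 mesh where the link (1,1)-(2,1) is faulty.
print(route(3, 3, {frozenset({(1, 1), (2, 1)})}, src=(0, 1), dst=(2, 1)))
# -> ([(0, 1), (1, 1), (1, 2), (2, 2), (2, 1)], 'delivered')
```

In this example the packet is blocked at (1, 1), walks clockwise around the merged face under RHR, and returns to normal mode at (2, 2), where MD(cur, dst) = MDbest = 1 and the South output is productive. In a 2 x 2 mesh whose dst (1, 1) has lost both links, the walk comes back to the entry router (0, 1), picks South again, and reports "unreachable".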
Simulation Methodology
- NOCulator [1]
  - 8 x 8 mesh for performance analysis
  - Synthetic traffic for performance evaluation
  - SPEC CPU 2006 benchmarks are also evaluated
- Maze-Routing [2] implemented in MinBD [3] routers
  - Deflection-based: deadlock freedom
  - Golden and silver flits: router-level livelock freedom
  - Retransmit-once: protocol-level deadlock freedom
[1] NOCulator: https://github.com/CMU-SAFARI/NOCulator
[2] Maze-Routing: https://github.com/CMU-SAFARI/NOCulator/tree/Maze-routing
[3] C. Fallin et al., "MinBD: Minimally-buffered deflection routing for energy-efficient interconnect," NOCS 2012.
Configurations
- Maze-Routing: 16 buffer spaces per (MinBD) router
- Baseline router: wormhole buffered router, 1 VC per port, 40 buffer spaces per router
- Faults: links disabled randomly, from 1 to 5 link failures
Workloads
- Synthetic: uniform random traffic with varying injection rates
- SPEC CPU 2006 benchmarks
  - Grouped by L1 misses per kilo-instruction (MPKI)
  - 3 groups: High (>50), Low (<5) and Medium (the rest) intensity
  - 4 mixes: L (all Low), ML (Medium/Low), M (all Medium) and H (all High)
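The grouping rule reads as a simple threshold classifier. A minimal sketch follows; the function names, the MPKI values in the example, and the reading of ML as "contains both Medium and Low benchmarks" are assumptions for illustration.

```python
def intensity(mpki):
    """Classify a benchmark by its L1 misses per kilo-instruction (MPKI)."""
    if mpki > 50:
        return "High"
    if mpki < 5:
        return "Low"
    return "Medium"  # 5 <= MPKI <= 50


def mix_label(groups):
    """Name a workload mix from the intensity groups of its benchmarks (hypothetical)."""
    labels = {
        frozenset({"Low"}): "L",
        frozenset({"Medium", "Low"}): "ML",
        frozenset({"Medium"}): "M",
        frozenset({"High"}): "H",
    }
    return labels.get(frozenset(groups))  # None for mixes not used on the slide


mpkis = [1.2, 12.0, 3.4, 30.5]  # hypothetical per-core MPKI values
groups = [intensity(m) for m in mpkis]
print(groups, mix_label(groups))  # -> ['Low', 'Medium', 'Low', 'Medium'] ML
```

The boundaries follow the slide literally: an MPKI of exactly 5 or exactly 50 falls into the Medium group.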
Area Overhead
- Maze-routing: 5 copies of the algorithm, 1 per port
- ARIADNE: smallest table; reconfiguration logic is not implemented; 5 read ports
- LBDRe: logic-based method; central approach; limited coverage
- 8 x 8 and 16 x 16 meshes; STMicro 60 nm technology node
- Figure: router area (hundreds of µm²) of Maze-routing, ARIADNE and LBDRe; annotated ratios 2.1x, 3.8x, 15.9x and 27%.
Throughput: Uniform Random Traffic
- Figure: average flit latency (cycles) vs. injection rate (flits/node/cycle) for Maze-routing and up*/down*, in two panels: 1 disabled link and 5 disabled links. Plot annotations: 50%, sub-optimal paths, provided path divergence.
Throughput: SPEC CPU
Average packet latency per workload mix:

Workload mix | Up*/Down*, 5 failures | Up*/Down*, no failure | Maze-routing, 5 failures | Maze-routing, no failure
L            | 16.7 | 16.4 | 17.8 | 16.4
ML           | 18.8 | 18.2 | 18.9 | 17.2
M            | 27.7 | 25.7 | 21.6 | 19.2
H            | 54.4 | 50.5 | 25.8 | 23.1
AVG          | 29.4 | 27.7 | 21   | 19

- 30% latency reduction in the average case
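The average-case reduction can be checked from the per-mix values in the SPEC CPU table. This sketch assumes the column order Up*/Down* (5 failures, no failure), then Maze-routing (5 failures, no failure). That reading also reproduces the 28% SPEC latency reduction quoted on the overview slide.

```python
# Average packet latency per workload mix, from the SPEC CPU table.
# Assumed column order: Up*/Down* 5 failures, Up*/Down* no failure,
#                       Maze-routing 5 failures, Maze-routing no failure.
LATENCY = {
    "L":  (16.7, 16.4, 17.8, 16.4),
    "ML": (18.8, 18.2, 18.9, 17.2),
    "M":  (27.7, 25.7, 21.6, 19.2),
    "H":  (54.4, 50.5, 25.8, 23.1),
}


def column_mean(col):
    """Mean latency of one column over the four workload mixes (the AVG row)."""
    return sum(row[col] for row in LATENCY.values()) / len(LATENCY)


updown_5, updown_0, maze_5, maze_0 = (column_mean(c) for c in range(4))
reduction_5 = 1 - maze_5 / updown_5  # with 5 link failures
reduction_0 = 1 - maze_0 / updown_0  # without failures
print(f"5 failures: {reduction_5:.1%}, no failure: {reduction_0:.1%}")
```

This prints about 28.5% with 5 failures and 31.5% without failures, which averages to the roughly 30% reduction on the slide.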
Reconfiguration Overhead
- Figure: average latency (cycles) over time (× 10^4 cycles) at 0.2 flits/node/cycle for ARIADNE and Maze-routing; ARIADNE annotations: 66 K cycles and 40 K cycles.
- Maze-Routing has no reconfiguration phase.
Summary
- A practical fault-tolerant routing algorithm must:
  - provide full coverage with guaranteed delivery
  - operate in a fully-distributed manner
  - impose low area overhead
  - have low reconfiguration overhead
- Maze-Routing is the first work to meet all the above goals.
- NOCulator and Maze-Routing are available on GitHub: https://github.com/CMU-SAFARI/NOCulator/tree/Maze-routing
Backup slides
Area Overhead
- The header fields can be coded in 14/17 bits in 8 x 8/16 x 16 meshes.
- Assuming a baseline router with a 144-bit channel width, the channel must be widened by 10%/12%.
- This results in an almost 20%/25% increase in router area.
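The channel-widening figures are simply the extra header bits divided by the 144-bit baseline channel width. A quick check, where `channel_widening` is a hypothetical helper:

```python
def channel_widening(header_bits, channel_bits=144):
    """Fraction by which the channel must grow to carry the extra header bits."""
    return header_bits / channel_bits


for mesh, bits in (("8 x 8", 14), ("16 x 16", 17)):
    print(f"{mesh}: {bits} extra bits -> {channel_widening(bits):.1%} wider channel")
```

This prints 9.7% and 11.8%, which round to the 10%/12% quoted on the slide. The router-area increase cannot be derived this way, because it depends on the synthesized design.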
Deflection Implications
- When a packet is deflected, its header values are no longer valid.
- The header values must be reset:
  - Mode: normal
  - MDbest: MD(next router, dst)
- Figure: deflection example, each router labeled with its MD to dst.
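A minimal sketch of the header initialization from the Preliminaries slide and the reset on deflection. The field and function names are hypothetical, and only the two fields the slides reset (Mode and MDbest) are modeled. The disconnection-detection fields are left out.

```python
from dataclasses import dataclass


def md(a, b):
    """Manhattan distance between routers a and b, given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Header:
    md_best: int          # MDbest: closest MD to dst reached so far
    mode: str = "normal"  # normal, RHR/LHR traversal, or unreachable


def new_header(src, dst):
    """Initial header values: MDbest = MD(src, dst), mode = normal."""
    return Header(md_best=md(src, dst))


def reset_on_deflection(header, next_router, dst):
    """Deflection invalidates the header; restart from the router we land on."""
    header.mode = "normal"
    header.md_best = md(next_router, dst)
    return header


h = new_header(src=(0, 0), dst=(3, 2))  # MDbest = 5, mode = normal
h.mode = "RHR"                          # the packet was in traversal mode ...
reset_on_deflection(h, next_router=(1, 0), dst=(3, 2))
print(h)                                # ... and is deflected to (1, 0)
# -> Header(md_best=4, mode='normal')
```

Resetting to the MD of the router the packet actually lands on keeps the invariant from the Preliminaries slide: MDbest never exceeds the packet's current distance to dst.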
Delivery Proof
- Property: given that there is a path between src and dst, a packet that starts from src and traverses the face underlying line(src, dst) will definitely intersect the line at some point p other than src.
- MD(p, dst) is definitely smaller than MD(src, dst).
- In traversal mode, if MD(cur, dst) = MDbest and there is a productive output, the packet returns to (and acts as in) normal mode, so it definitely exits to normal mode.
- Figure: example traversal, each router labeled with its MD to dst.
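The step from "p lies on line(src, dst)" to "MD(p, dst) < MD(src, dst)" can be made explicit. The derivation below assumes that p lies on the segment from src to dst, that p ≠ src, and that src ≠ dst:

```latex
p = \mathrm{src} + t\,(\mathrm{dst}-\mathrm{src}),\quad 0 < t \le 1
\;\Rightarrow\; p - \mathrm{dst} = (1-t)\,(\mathrm{src}-\mathrm{dst})
\;\Rightarrow\; \mathrm{MD}(p,\mathrm{dst}) = (1-t)\,\mathrm{MD}(\mathrm{src},\mathrm{dst}) < \mathrm{MD}(\mathrm{src},\mathrm{dst}).
```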