Indirect Adaptive Routing on Large Scale Interconnection Networks

Nan Jiang, William J. Dally
Computer Systems Laboratory, Stanford University
John Kim
Korea Advanced Institute of Science and Technology

Overview
• Indirect adaptive routing (IAR)
  – Allows adaptive routing decisions to be based on both local and remote congestion information
• Main contributions
  – Three new IAR algorithms for large-scale networks
  – Steady-state and transient performance evaluations
  – Impact of network configurations
  – Cost of implementation

Presentation Outline
• Background
  – The dragonfly network
  – Adaptive routing
• Indirect adaptive routing algorithms
• Performance results
• Implementation considerations

The Dragonfly Network
• High-radix network
  – High-radix routers
  – Small network diameter
• Each router
  – Three types of channels
  – Directly connected to a few other groups
• Each group
  – Organized by a local network
  – Large number of global channels (GC)
• Global network
  – Large network with a global diameter of one
[Figure: groups 0, 1, 2, … connected by the global network; within a group, routers 0, 1, 2, … connected by the local network, with terminals p0, p1, … attached to the routers]

Routing on the Dragonfly
• Minimal routing (MIN)
  1. Source local network
  2. Global network
  3. Destination local network
• Some adversarial traffic congests the global channels
  – Each group i sends all packets to group i+1
• Oblivious solution: Valiant's algorithm (VAL)
  – Poor performance on benign traffic
[Figure: groups 0, 1, 2, … with congestion on the global channel between adjacent groups]
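The effect of the adversarial pattern on MIN and VAL can be sketched at the group level. The following is a minimal illustration, not the authors' simulator: it ignores local hops, assumes one global channel per ordered pair of groups, and the names `min_path`, `val_path`, and `global_channel_load` are hypothetical.

```python
import random
from collections import Counter

NUM_GROUPS = 33  # group count taken from the experimental setup slide


def min_path(src, dst):
    """Group-level minimal path: at most one global hop (global diameter of one)."""
    return [src] if src == dst else [src, dst]


def val_path(src, dst, rng):
    """Valiant's algorithm: detour through a randomly chosen intermediate group."""
    if src == dst:
        return [src]
    candidates = [g for g in range(NUM_GROUPS) if g not in (src, dst)]
    return [src, rng.choice(candidates), dst]


def global_channel_load(paths):
    """Count how many packets cross each (from_group, to_group) global channel."""
    load = Counter()
    for path in paths:
        for hop in zip(path, path[1:]):
            load[hop] += 1
    return load


# Adversarial traffic: every packet from group 0 goes to group 1.
rng = random.Random(0)
packets = [(0, 1)] * 1000
min_load = global_channel_load(min_path(s, d) for s, d in packets)
val_load = global_channel_load(val_path(s, d, rng) for s, d in packets)
```

Under MIN every packet lands on the single channel (0, 1), while VAL spreads the first hop over the other groups at the cost of a second global hop per packet, which is why VAL performs poorly on benign traffic.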

Adaptive Routing
• Choose between the MIN path and a VAL path at the packet source [Singh'05]
  – Decision metric: path delay
  – Delay: product of path distance and path queue depth
• Measuring path queue length is unrealistic
• Use local queue lengths to approximate the path queue length
  – Requires stiff backpressure
[Figure: source router with local queues q0 to q3 feeding the MIN GC and the VAL GC; congestion on the MIN GC]
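The decision metric above can be written out as a short sketch. The function name `choose_route`, the tie-break toward MIN, and the hop counts in the usage lines are illustrative assumptions; only the "distance times queue depth" product comes from the slide.

```python
def choose_route(q_min, hops_min, q_val, hops_val):
    """Pick the path with the smaller estimated delay at the packet source.

    Delay is estimated as path distance times queue depth, with the local
    output queue standing in for the queue depth of the whole path.
    Ties go to MIN (an assumption of this sketch).
    """
    delay_min = q_min * hops_min
    delay_val = q_val * hops_val
    return "MIN" if delay_min <= delay_val else "VAL"


# Illustrative hop counts: 3 for the MIN path, 5 for a VAL path.
light_load = choose_route(q_min=2, hops_min=3, q_val=2, hops_val=5)   # 6 vs 10
congested = choose_route(q_min=20, hops_min=3, q_val=2, hops_val=5)   # 60 vs 10
```

The sketch also shows why stiff backpressure matters: congestion on a remote global channel changes the decision only once it has backed up into the local queue depth `q_min`.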

Adaptive Routing: Worst Case Traffic
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for Valiant's, Minimal, and Adaptive routing]

Indirect Adaptive Routing
• Improve the routing decision through remote congestion information
• Previous method:
  – Credit round trip [Kim et al., ISCA'08]
• Three new methods:
  – Reservation
  – Piggyback
  – Progressive

Credit Round Trip (CRT) [Kim et al., ISCA'08]
• Delay the return of local credits to the congested router
• Creates the illusion of stiffer backpressure
• Drawbacks
  – Remote congestion is still inferred through local queues
  – Information is not up to date
[Figure: source router, MIN GC, and VAL GC; credits from the congested MIN GC router are returned to the source router with a delay]

Reservation (RES)
• Each global channel tracks the number of incoming MIN packets
• Each injected packet creates a reservation flit
• Routing decision is based on the reservation outcome
• Drawbacks
  – Reservation flit flooding
  – Reservation delay
[Figure: the source router sends a RES flit to the MIN GC; the reservation fails because the MIN GC is congested, so the packet uses the VAL GC]
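A minimal sketch of the reservation idea follows. The slide does not give an admission rule, so the fixed `threshold`, the `release` step, and the class and function names are assumptions made for illustration.

```python
class GlobalChannel:
    """Tracks how many MIN packets have reserved this global channel."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed admission limit, not from the slides
        self.reserved = 0

    def try_reserve(self):
        """Handle a reservation flit: succeed only below the threshold."""
        if self.reserved < self.threshold:
            self.reserved += 1
            return True
        return False

    def release(self):
        """Called once a reserved packet has crossed the channel."""
        if self.reserved > 0:
            self.reserved -= 1


def route_with_reservation(channel):
    """Route MIN if the reservation succeeds, otherwise fall back to VAL."""
    return "MIN" if channel.try_reserve() else "VAL"


gc = GlobalChannel(threshold=2)
decisions = [route_with_reservation(gc) for _ in range(3)]
gc.release()
after_release = route_with_reservation(gc)
```

Every packet pays for one reservation flit and waits for its outcome before routing, which corresponds to the two drawbacks listed on the slide.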

Piggyback (PB)
• Local congestion broadcast
  – Piggybacking on each packet
  – Send on idle channels
• Congestion data compression
• Drawbacks
  – Consumes extra bandwidth
  – Congestion information is not up to date (broadcast delay)
[Figure: source router holding a free/busy state for each global channel; the MIN GC is marked busy because of congestion and the VAL GC is marked free]
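One way to picture the compression and lookup steps is a one-bit free/busy state per global channel. This is a sketch under assumptions: the fixed `threshold` and the names `compress_congestion` and `SourceRouter` are hypothetical, and the slides do not specify how the busy bit is derived.

```python
def compress_congestion(queue_depths, threshold):
    """Reduce each global-channel queue depth to a single busy/free bit."""
    return [depth > threshold for depth in queue_depths]


class SourceRouter:
    """Keeps a lookup table of the last broadcast state of every GC in the group."""

    def __init__(self, num_global_channels):
        self.gc_busy = [False] * num_global_channels

    def receive_broadcast(self, first_gc, busy_bits):
        """Store bits piggybacked by a neighbour that owns GCs starting at first_gc."""
        for offset, busy in enumerate(busy_bits):
            self.gc_busy[first_gc + offset] = busy

    def choose_route(self, min_gc):
        """Route VAL when the global channel on the MIN path is marked busy."""
        return "VAL" if self.gc_busy[min_gc] else "MIN"


router = SourceRouter(num_global_channels=8)
bits = compress_congestion([0, 12, 3, 40], threshold=8)
router.receive_broadcast(first_gc=4, busy_bits=bits)
```

The table reflects the state at the time of the last broadcast, so a decision made from it can lag behind the actual congestion, as noted in the drawbacks.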

Progressive (PAR)
• MIN routing decisions at the source are not final
• VAL decisions are final
• Switch to VAL when encountering congestion
• Drawbacks
  – Needs an additional virtual channel to avoid deadlock
  – Adds extra hops
[Figure: a packet leaves the source router on the MIN path, encounters congestion at the MIN GC, and is diverted to the VAL GC]
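The "MIN is tentative, VAL is final" rule can be sketched as a per-hop re-evaluation. The function name and the boolean congestion inputs are illustrative assumptions; how each router detects congestion is left out.

```python
def progressive_route(congested_at_hop):
    """Re-evaluate a tentative MIN decision at each router before the global hop.

    congested_at_hop holds one boolean per router visited in the source group,
    True if that router sees congestion on the MIN path. The first True makes
    the packet switch to VAL, and that decision is never revisited.
    Returns (decision, index of the hop where the switch happened or None).
    """
    for hop, congested in enumerate(congested_at_hop):
        if congested:
            return "VAL", hop
    return "MIN", None


stays_minimal = progressive_route([False, False])
switches_late = progressive_route([False, True])
```

A packet that switches at hop 1 has already taken one hop toward the MIN global channel, which is the source of the extra hops listed as a drawback.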

Experimental Setup
• Fully connected local and global networks
  – 33 groups
  – 1,056 nodes
• 10-cycle local channel latency
• 100-cycle global channel latency
• 10-flit packets
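The group and node counts can be checked with a few lines of arithmetic. The slide does not give the per-router parameters, so p = 4 terminals per router, a = 8 routers per group, and h = 4 global channels per router are an assumption: they are one configuration consistent with 33 groups and 1,056 nodes when every pair of groups shares exactly one global channel.

```python
def dragonfly_size(p, a, h):
    """Return (groups, nodes) for a dragonfly with one global channel per group pair.

    p: terminals per router, a: routers per group, h: global channels per router.
    """
    groups = a * h + 1       # each group reaches a*h other groups directly
    nodes = p * a * groups   # terminals per router * routers per group * groups
    return groups, nodes


# Assumed parameters; only the resulting totals appear on the slide.
groups, nodes = dragonfly_size(p=4, a=8, h=4)
```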

Steady State Traffic: Uniform Random
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Minimal]

Steady State Traffic: Worst Case
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Valiant's]

Transient Traffic: Uniform Random to Worst Case
[Figure, top panel: average packet latency per cycle, UR to WC, versus cycles after transition, for Progressive and Piggyback]
[Figure, bottom panel: % of packets routing non-minimally per cycle, UR to WC, versus cycles after transition, for Progressive and Piggyback]

Network Configuration Considerations
• Packet size
  – RES requires long packets to amortize the reservation flit cost
  – Routing decision is made on a per-packet basis
• Channel latency
  – Affects information delay (CRT, PB)
  – Affects packet delay (PAR, RES)
• Network size
  – Affects information bandwidth overhead (RES, PB)
• Global diameter greater than one
  – Need to exchange congestion information on the global network
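The packet-size point for RES can be made concrete with a rough overhead estimate. This sketch assumes a single reservation flit per packet and counts only injected flits, not the distance each flit travels; the function name is hypothetical.

```python
def reservation_overhead(packet_flits, reservation_flits=1):
    """Extra reservation flits injected per data flit (assumes one per packet)."""
    return reservation_flits / packet_flits


# Overhead for several packet sizes, including the 10-flit packets of the setup.
overheads = {size: reservation_overhead(size) for size in (1, 2, 4, 8, 10)}
```

Under this assumption a single-flit packet doubles the number of injected flits, while a 10-flit packet adds one reservation flit for every ten data flits.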

Cost Considerations
• Credit round trip
  – Credit delay tracker for every local channel
• Reservation
  – Reservation counter for every global channel
  – Additional buffering at the injection port to store packets waiting for reservation
• Piggyback
  – Global channel lookup table for every router
  – Increase in packet size
• Progressive
  – Extra virtual channel for deadlock avoidance

Conclusion
• Three new indirect adaptive routing algorithms for large-scale networks
• Performance and design evaluation of the algorithms
• Best algorithm?
  – Piggyback performed best under steady-state traffic
  – Progressive responded fastest to transient changes
  – Network configuration affects the performance of some algorithms
  – Cost of implementation

Thank You!
• Questions?

Adaptive Routing: Uniform Traffic
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for VAL, MIN, and Adaptive routing]

Transient Traffic: Worst Case to Uniform Random

Transient Traffic: Worst Case 1 to Worst Case 10

1000 Random Permutation Traffic
[Figure: histograms of packet latency over 1,000 random permutation traffic patterns, one panel each for CRT, PB, PAR, RES, and VAL; x-axis: packet latency, y-axis: % of 1K permutations]

Effect of Packet Size on RES: Worst Case Traffic
[Figure: latency (simulation cycles) versus throughput (flit injection rate) for packet sizes of 1, 2, 4, and 8 flits]

Large Local Network: Uniform Random
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for PB, CRT, MIN, PAR, and RES]

Large Local Network: Worst Case
[Figure: packet latency (simulation cycles) versus throughput (flit injection rate) for PB, CRT, PAR, RES, and VAL]