Indirect Adaptive Routing on Large Scale Interconnection Networks

Nan Jiang, William J. Dally
Computer System Laboratory, Stanford University
John Kim
Korean Advanced Institute of Science and Technology

Overview
• Indirect adaptive routing (IAR)
  – Allows adaptive routing decisions to be based on local and remote congestion information
• Main contributions
  – Three new IAR algorithms for large scale networks
  – Steady state and transient performance evaluations
  – Impact of network configurations
  – Cost of implementation

Presentation Outline
• Background
  – The dragonfly network
  – Adaptive routing
• Indirect adaptive routing algorithms
• Performance results
• Implementation considerations

The Dragonfly Network
• High radix network
  – High radix routers
  – Small network diameter
• Each router
  – Three types of channels
  – Directly connected to a few other groups
• Each group
  – Organized by a local network
  – Large number of global channels (GC)
• Large network with a global diameter of one
[Figure: global network connecting Group 0, Group 1, Group 2, …; inside a group, a local network connecting Router 0, Router 1, Router 2, … with terminals p0, p1, …]

Routing on the Dragonfly
• Minimal Routing (MIN)
  1. Source local network
  2. Global network
  3. Destination local network
• Some adversarial traffic congests the global channels
  – Each group i sends all packets to group i+1
• Oblivious solution: Valiant's Algorithm (VAL)
  – Poor performance on benign traffic
[Figure: Group 0, Group 1, Group 2, … with Router 0, Router 1, Router 2, … and terminals p0, p1, …; congestion on the global channel between adjacent groups]
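The difference between MIN and VAL can be sketched at the group level. This is only an illustration, not the authors' simulator: the function names are hypothetical, a global diameter of one is assumed (any group reaches any other in one global hop), and the wrap-around from the last group back to group 0 in the worst-case pattern is an assumption.

```python
import random

def min_group_path(src_group, dst_group):
    """Groups visited by a minimal route: source group, then destination group."""
    if src_group == dst_group:
        return [src_group]
    return [src_group, dst_group]

def val_group_path(src_group, dst_group, num_groups, rng=random):
    """Groups visited by a Valiant route: detour through a random intermediate group."""
    if src_group == dst_group:
        return [src_group]
    candidates = [g for g in range(num_groups) if g not in (src_group, dst_group)]
    intermediate = rng.choice(candidates)
    return [src_group, intermediate, dst_group]

def worst_case_destination(src_group, num_groups):
    """Adversarial pattern from the slide: group i sends to group i+1.
    The modulo wrap-around is an assumption made for this sketch."""
    return (src_group + 1) % num_groups
```

Under the worst-case pattern every MIN route from a group uses the same source-to-destination group pair, which is why the connecting global channels congest; VAL spreads the same traffic over the intermediate groups at the cost of an extra global hop.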

Adaptive Routing
• Choose between the MIN path and a VAL path at the packet source [Singh'05]
  – Decision metric: path delay
  – Delay: product of path distance and path queue depth
• Measuring path queue length is unrealistic
• Use local queue lengths to approximate the path queue length
  – Requires stiff backpressure
[Figure: source router with local queues q0, q1, q2, q3 feeding the MIN GC and the VAL GC; congestion on the MIN GC]
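The decision rule on this slide can be written out as a short sketch. The function name, the hop counts used in the example, and the tie-break toward MIN are assumptions for illustration; the slide only states that delay is the product of path distance and queue depth.

```python
def choose_route(q_min, hops_min, q_val, hops_val):
    """Pick the path with the smaller estimated delay.

    q_min, q_val: local queue depths used to approximate each path's queueing.
    hops_min, hops_val: path distances in hops.
    Ties go to MIN (an assumption; the slide does not specify a tie-break).
    """
    delay_min = q_min * hops_min
    delay_val = q_val * hops_val
    return "MIN" if delay_min <= delay_val else "VAL"
```

For example, with illustrative distances of 3 hops for MIN and 5 hops for VAL, `choose_route(2, 3, 1, 5)` compares 6 against 5 and selects VAL, while `choose_route(1, 3, 1, 5)` compares 3 against 5 and stays on MIN.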

Adaptive Routing: Worst Case Traffic
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for Valiant's, Minimal, and Adaptive routing]

Indirect Adaptive Routing
• Improve routing decisions through remote congestion information
• Previous method:
  – Credit round trip [Kim et al., ISCA'08]
• Three new methods:
  – Reservation
  – Piggyback
  – Progressive

Credit Round Trip (CRT) [Kim et al., ISCA'08]
• Delay the return of local credits to the congested router
• Creates the illusion of stiffer backpressure
• Drawbacks
  – Remote congestion is still inferred through local queues
  – Information not up to date
[Figure: source router, MIN GC and VAL GC, congestion on the MIN GC; credits and delayed credits returning to the source router]
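One plausible way to decide how long to hold credits back is sketched below. The slide does not give the rule, so this is an assumption: each output's credits are delayed by the difference between its measured credit round-trip time and the fastest round trip among the outputs. The function name and the port labels are hypothetical.

```python
def credit_delays(credit_round_trip):
    """Extra credit delay per output port.

    credit_round_trip: dict mapping a port label to its measured credit
    round-trip time in cycles. A congested output has a longer round trip,
    so credits heading upstream toward it are held back longer.
    """
    fastest = min(credit_round_trip.values())
    return {port: t - fastest for port, t in credit_round_trip.items()}
```

With round trips of 250 cycles on a congested port and 200 cycles on an idle one, the congested port's credits would be delayed by 50 cycles and the idle port's by none, so the upstream queue for the congested port fills sooner.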

Reservation (RES)
• Each global channel tracks the number of incoming MIN packets
• Each injected packet creates a reservation flit
• Routing decision is based on the reservation outcome
• Drawbacks
  – Reservation flit flooding
  – Reservation delay
[Figure: source router sends a RES flit toward the MIN GC; the reservation fails at the congested MIN GC, so the packet takes the VAL GC]
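The reservation handshake can be sketched as a counter per global channel. The class name, the fixed `threshold`, and the release-on-arrival behavior are assumptions for illustration; the slide only says that the channel tracks incoming MIN packets and that the outcome decides the route.

```python
class GlobalChannelReservations:
    """Counts MIN packets that have reserved this global channel."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical limit on outstanding reservations
        self.pending = 0

    def request(self):
        """Handle a reservation flit: grant it only while under the threshold."""
        if self.pending < self.threshold:
            self.pending += 1
            return True
        return False

    def release(self):
        """Called when a reserved packet has crossed the channel (assumed)."""
        if self.pending > 0:
            self.pending -= 1

def route_with_reservation(channel):
    """Route MIN if the reservation succeeds, otherwise fall back to VAL."""
    return "MIN" if channel.request() else "VAL"
```

With a threshold of 2, three back-to-back injections would be routed MIN, MIN, VAL; once one reserved packet is released, the next injection can take MIN again.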

Piggyback (PB)
• Local congestion broadcast
  – Piggybacking on each packet
  – Send on idle channels
• Congestion data compression
• Drawbacks
  – Consumes extra bandwidth
  – Congestion information not up to date (broadcast delay)
[Figure: source router receives "GC free" and "GC busy" state for the MIN GC and the VAL GC; congestion on the MIN GC]
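A minimal sketch of the piggyback idea follows, assuming that congestion is compressed to one busy/free bit per global channel and that "busy" means a queue depth above a threshold. Both the compression scheme and all names are assumptions; the slide does not specify them.

```python
def compress_congestion(queue_depths, threshold):
    """Reduce each global channel's queue depth to a single busy bit."""
    return [depth > threshold for depth in queue_depths]

class GroupCongestionTable:
    """Per-router lookup table of global channel state within the group."""

    def __init__(self):
        self.busy = {}  # (router_id, port) -> last broadcast busy bit

    def update(self, router_id, bits):
        """Record the bits piggybacked on a packet from router_id."""
        for port, bit in enumerate(bits):
            self.busy[(router_id, port)] = bit

    def choose(self, min_gc):
        """Route VAL only if the MIN global channel was last reported busy."""
        return "VAL" if self.busy.get(min_gc, False) else "MIN"
```

Because the table only holds the last broadcast value, a decision can be made on stale state, which is the broadcast-delay drawback listed on the slide.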

Progressive (PAR)
• MIN routing decisions at the source are not final
• VAL decisions are final
• Switch to VAL when encountering congestion
• Drawbacks
  – Needs an additional virtual channel to avoid deadlock
  – Adds extra hops
[Figure: source router, MIN GC and VAL GC; a packet heading for the congested MIN GC is redirected to the VAL GC]
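The "MIN is revisable, VAL is final" rule can be sketched as a small state machine over the hops where the decision is re-evaluated. The function name and the boolean congestion input are hypothetical, and the sketch ignores the extra virtual channel a real router would need when a packet switches.

```python
def progressive_route(congested_at_hop):
    """Return the routing decision in effect after each re-evaluation point.

    congested_at_hop: iterable of booleans, one per hop at which the packet
    re-checks congestion on its MIN path.
    A packet starts on MIN, may switch to VAL once, and never switches back.
    """
    decision = "MIN"
    trace = []
    for congested in congested_at_hop:
        if decision == "MIN" and congested:
            decision = "VAL"  # final from this point on
        trace.append(decision)
    return trace
```

For example, `progressive_route([False, True, False])` yields `["MIN", "VAL", "VAL"]`: the packet leaves the source on MIN, meets congestion at the second check, and stays on VAL even though the third check sees no congestion.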

Experimental Setup
• Fully connected local and global networks
  – 33 groups
  – 1,056 nodes
• 10 cycle local channel latency
• 100 cycle global channel latency
• 10-flit packets
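The group and node counts can be checked with the usual sizing rule for a maximum-size dragonfly with one global channel between every pair of groups. The parameter values below (4 terminals, 8 routers per group, 4 global channels per router) are not stated on the slide; they are one set of values, assumed here, that is consistent with 33 groups and 1,056 nodes.

```python
def dragonfly_size(p, a, h):
    """Return (groups, nodes) for a maximum-size dragonfly.

    p: terminals per router
    a: routers per group
    h: global channels per router
    Each group has a*h global channels, so it can reach a*h other groups.
    """
    groups = a * h + 1
    nodes = a * p * groups
    return groups, nodes

print(dragonfly_size(p=4, a=8, h=4))  # (33, 1056)
```

Under these assumed parameters each group holds 8 × 4 = 32 nodes, and 32 × 33 = 1,056.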

Steady State Traffic: Uniform Random
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Minimal]

Steady State Traffic: Worst Case
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for Piggyback, Credit Round Trip, Progressive, Reservation, and Valiant's]

Transient Traffic: Uniform Random to Worst Case
[Plot: average packet latency per cycle, UR to WC; packet latency vs. cycles after transition (0 to 100) for Progressive and Piggyback]
[Plot: % of packets routing non-minimally per cycle, UR to WC; vs. cycles after transition (0 to 100) for Progressive and Piggyback]

Network Configuration Considerations
• Packet size
  – RES requires long packets to amortize the reservation flit cost
  – Routing decisions are made on a per-packet basis
• Channel latency
  – Affects information delay (CRT, PB)
  – Affects packet delay (PAR, RES)
• Network size
  – Affects information bandwidth overhead (RES, PB)
• Global diameter greater than one
  – Need to exchange congestion information on the global network

Cost Considerations
• Credit round trip
  – Credit delay tracker for every local channel
• Reservation
  – Reservation counter for every global channel
  – Additional buffering at the injection port to store packets waiting for reservation
• Piggyback
  – Global channel lookup table for every router
  – Increase in packet size
• Progressive
  – Extra virtual channel for deadlock avoidance

Conclusion
• Three new indirect adaptive routing algorithms for large scale networks
• Performance and design evaluation of the algorithms
• Best algorithm?
  – Piggyback performed the best under steady state traffic
  – Progressive responded fastest to transient changes
  – Network configuration affects the performance of some algorithms
  – Cost of implementation

Thank You!
• Questions?

Adaptive Routing: Uniform Traffic
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for VAL, MIN, and Adaptive]

Transient Traffic: Worst Case to Uniform Random

Transient Traffic: Worst Case 1 to Worst Case 10

1000 Random Permutation Traffic
[Plots: histograms of % of 1K permutations vs. packet latency, one panel each for CRT, PB, PAR, RES, and VAL]

Effect of Packet Size on RES: Worst Case Traffic
[Plot: latency (simulation cycles) vs. throughput (flit injection rate) for packet sizes of 1, 2, 4, and 8 flits]

Large Local Network: Uniform Random
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for PB, CRT, MIN, PAR, and RES]

Large Local Network: Worst Case
[Plot: packet latency (simulation cycles) vs. throughput (flit injection rate) for PB, CRT, PAR, RES, and VAL]