Router Designs for Elastic Buffer On-Chip Networks
George Michelogiannakis, William J. Dally
Stanford University
Introduction
§ Elastic buffer (EB) flow-control was recently proposed.
  • Uses the channels as distributed FIFOs.
§ EB routers are bufferless packet-switched routers.
  • They have the benefits of circuit-switched routers, without the overhead of setting up and tearing down circuits.
§ This work explores the EB router design space.
  • By evaluating three representative designs.
SC 09: Routers for EB NoCs
The EB Flow-Control Idea
[Figure labels: pipelined channel; channel as FIFO; elastic buffer; master-slave FF.]
How Elastic Buffer Channels Work
§ Ready/valid handshake between elastic buffers:
  • Ready: at least one free storage slot.
  • Valid: non-empty (driving valid data).
[Figure: channel operation shown cycle by cycle, cycles 1 to 6.]
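The ready/valid handshake can be illustrated with a small cycle-level model. This is a minimal sketch, not the authors' implementation: the two-slot capacity, the three-stage channel, and the names `ElasticBuffer`, `step`, and `run` are all assumptions made for illustration.

```python
from collections import deque


class ElasticBuffer:
    """Small FIFO exposing ready/valid (two slots is an assumption)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.capacity

    @property
    def valid(self):
        # Valid: non-empty, i.e. driving valid data.
        return len(self.slots) > 0


def step(channel, source, sink_ready):
    """Advance the channel one clock cycle; return the delivered flit or None."""
    # Sample every handshake signal first, as a clock edge would.
    valid = [eb.valid for eb in channel]
    ready = [eb.ready for eb in channel]
    delivered = None
    if valid[-1] and sink_ready:
        delivered = channel[-1].slots.popleft()
    # Move flits downstream-first so FIFO order is preserved.
    for i in range(len(channel) - 2, -1, -1):
        if valid[i] and ready[i + 1]:
            channel[i + 1].slots.append(channel[i].slots.popleft())
    if source and ready[0]:
        channel[0].slots.append(source.popleft())
    return delivered


def run(num_flits, cycles, stages=3, sink_ready=lambda cycle: True):
    """Push num_flits through a channel of EBs for a number of cycles."""
    channel = [ElasticBuffer() for _ in range(stages)]
    source = deque(range(num_flits))
    delivered = []
    for cycle in range(cycles):
        flit = step(channel, source, sink_ready(cycle))
        if flit is not None:
            delivered.append(flit)
    return delivered, channel, source
```

With the sink always ready, flits stream through at one per cycle after the pipeline fills. With the sink stalled, the channel absorbs flits until every slot is occupied and then back-pressures the source, which is the "channel as distributed FIFO" behavior.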
Use EB Flow-Control Through the Router
[Figure: VC input-buffered router compared with the EB router.]
§ Input buffer replaced by input EB.
§ VC & SW allocators removed. Per-output arbiters instead.
§ Three-slot output EB to cover for arbitration done one cycle in advance.
§ LA routing also applicable to EB networks.
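A per-output arbiter can be sketched as follows. The slides do not state the arbitration policy, so round-robin is an assumption here, as is the class name `RoundRobinArbiter`. The sketch also grants flit by flit and ignores holding a grant for the duration of a packet.

```python
class RoundRobinArbiter:
    """One arbiter per output port; round-robin policy is an assumption."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        # Start so that input 0 has the highest priority on the first grant.
        self.last = num_inputs - 1

    def grant(self, requests):
        """requests[i] is True if input i wants this output. Returns an index or None."""
        for offset in range(1, self.num_inputs + 1):
            i = (self.last + offset) % self.num_inputs
            if requests[i]:
                self.last = i  # the winner gets lowest priority next time
                return i
        return None
```

Each output port would own one such arbiter, and an input requests only the arbiter of the output its packet is routed to.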
Baseline Router - Issues
§ Issues constraining the clock cycle time:
  • Three-slot EB FSM too complicated: output EB implemented as FIFO.
  • Routing is performed serially with switch arbitration.
Enhanced Two-Stage Router
§ Look-ahead routing to shorten the critical path.
§ Use two-slot EBs at output and for pipelining.
  • Flits are stored in the intermediate EB and wait for a grant.
  • Decision to traverse switch made in the same cycle.
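Look-ahead routing computes, at the current router, the output port a flit will need at the next router, which takes route computation off the next router's critical path. The sketch below assumes XY dimension-order routing on a 2D mesh. The routing function, the port names, and the coordinate convention are illustrative assumptions, not taken from the slides.

```python
# Coordinate change for each output port (assumed convention: x grows east, y grows north).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def xy_output_port(cur, dest):
    """XY dimension-order routing: resolve the X offset first, then Y."""
    (x, y), (dx, dy) = cur, dest
    if dx > x:
        return "E"
    if dx < x:
        return "W"
    if dy > y:
        return "N"
    if dy < y:
        return "S"
    return "EJECT"  # already at the destination


def lookahead_route(cur, out_port, dest):
    """Given the port a flit leaves `cur` through, return its port at the next router."""
    sx, sy = STEP[out_port]
    next_router = (cur[0] + sx, cur[1] + sy)
    return xy_output_port(next_router, dest)
```

For a flit at (0, 0) heading to (2, 1), the current router already knows the flit leaves through "E", so it computes the port for router (1, 0) and sends that along with the flit.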
Enhanced Two-Stage Router – Sync Module
§ Synchronization module maintains alignment between flits and grants.
§ Contains an output port EB.
  • Stores the chosen output port of the current packet and of any other packets in router stage 1 and the intermediate EB.
Enhanced Two-Stage Router – Sync Module
§ When the current packet’s tail flit is departing:
  • Sync. module propagates the next output to the arbiters.
  • From the appropriate location.
§ Sync. module propagates an update to all outputs.
  • An output receiving an update from the input it is granting clocks the arbiter output regs at the next edge.
Single-Stage Router
§ Merges the two router stages to:
  • Reduce router latency.
  • Avoid pipelining overhead.
Evaluation Methodology
§ 45 nm worst-case low-power commercial library.
§ Synopsys DC and Cadence Encounter.
  • 64-bit router datapath. 70% initial area utilization ratio.
§ Used a cycle-accurate network simulator.
§ We assume each router operates at its maximum post-P&R frequency, or all at the same frequency.
§ 8x8 2D mesh. 2 mm-long wires. 1 cycle latency.
  • Constant packet size of 512 bits.
§ Averaged over a set of six traffic patterns.
§ Swept datapath width from 28 to 171 bits.
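Because the packet size is fixed at 512 bits, the datapath width determines how many flits each packet needs. A rough sketch of that arithmetic follows. It assumes every datapath bit carries packet bits with no per-flit header overhead, which the slides do not specify.

```python
PACKET_BITS = 512  # constant packet size from the methodology


def flits_per_packet(datapath_bits, packet_bits=PACKET_BITS):
    """Ceiling division: a partially filled last flit still occupies a cycle."""
    return (packet_bits + datapath_bits - 1) // datapath_bits


# Endpoints of the sweep and the 64-bit synthesis datapath.
for width in (28, 64, 171):
    print(width, flits_per_packet(width))
```

Under this assumption the sweep spans packets of 19 flits (28-bit datapath) down to 3 flits (171-bit datapath), with 8 flits at 64 bits.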
Placement and Routing – Cycle Time
§ Enhanced two-stage has a 26% shorter cycle time than the single-stage, and 42% shorter than the baseline two-stage.
Placement and Routing – Energy per Bit
§ Baseline two-stage requires 9% less energy per bit than the single-stage, and 35% less than the enhanced two-stage.
Placement and Routing – Area
§ Single-stage occupies 30% less area than the enhanced two-stage and 44% less than the baseline two-stage.
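The placement and routing percentages for cycle time, energy per bit, and area are each quoted relative to a different router, so it can help to normalize all three metrics to the baseline two-stage router. The sketch below does only that arithmetic. It assumes "X% less than Y" means a value of (1 - X) times Y, and the derived figures are not results reported on the slides.

```python
# Cycle time: enhanced is 42% below baseline and 26% below single-stage.
cycle = {"baseline": 1.0, "enhanced": 1 - 0.42}
cycle["single"] = cycle["enhanced"] / (1 - 0.26)

# Energy per bit: baseline is 9% below single-stage and 35% below enhanced.
energy = {"baseline": 1.0}
energy["single"] = 1 / (1 - 0.09)
energy["enhanced"] = 1 / (1 - 0.35)

# Area: single-stage is 44% below baseline and 30% below enhanced.
area = {"baseline": 1.0, "single": 1 - 0.44}
area["enhanced"] = area["single"] / (1 - 0.30)

for name, metric in (("cycle time", cycle), ("energy/bit", energy), ("area", area)):
    print(name, {router: round(value, 3) for router, value in metric.items()})
```

Under that reading, the single-stage router's cycle time comes out at roughly 0.78 of the baseline, the enhanced two-stage router's area at 0.80 of the baseline, and its energy per bit at roughly 1.54 times the baseline.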
Latency-Throughput, Max Frequencies
[Figure: latency versus throughput.]
§ Latency increase:
  • Enhanced: +1%
  • Baseline: +46%
Latency-Throughput, Equal Frequencies
[Figure: latency versus throughput.]
§ Latency increase:
  • Enhanced: +34%
  • Baseline: +32%
Which Router is the Optimal Choice?
Priority and router choice:
§ Operate at maximum frequencies:
  • Area: Enhanced two-stage
  • Energy: Baseline two-stage (closely followed by single-stage)
  • Latency: Single-stage (depends on effect on channels)
§ Operate at the same frequency:
  • Area: Single-stage
  • Energy: Baseline two-stage (closely followed by single-stage)
  • Latency: Single-stage
Conclusion
§ Improved EB router designs can widen the gap compared to VC networks.
  • Makes EB look even more attractive.
§ EB routers are simple designs. Simple designs have numerous advantages.
  • A lot of the complexity of VC networks is ignored by some area and power models.
§ Overall, compared to VC: 43% reduction in power per unit throughput, 67% reduction in cycle time, and 22% improvement in throughput per unit area.
Questions?