ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control
Prof. Natalie Enright Jerger

Announcements
• Project Progress Reports
  – Due March 9, submit by e-mail
  – Worth 15% of project grade
  – 1 page
    • Discuss current status of project
    • Any difficulties/problems encountered
    • Anticipated changes from original project proposal
Winter 2011 ECE 1749 H: Interconnection Networks (Enright Jerger) 2

Announcements (2)
• 2 presentations next week
  – Elastic-Buffer Flow Control for On-Chip Networks
    • Presenter: Islam
  – Express Virtual Channels: Toward the ideal interconnection fabric
    • Presenter: Yu
• 1 critique due

Switching/Flow Control Overview
• Topology: determines connectivity of network
• Routing: determines paths through network
• Flow control: determines allocation of resources to messages as they traverse the network
  – Buffers and links
  – Significant impact on throughput and latency of network

Flow Control
• Control state records
  – allocation of channels and buffers to packets
  – current state of packet traversing node
• Channel bandwidth advances flits from this node to the next
• Buffers hold flits waiting for channel bandwidth

Packets
• Messages: composed of one or more packets
  – If message size <= maximum packet size, only one packet is created
• Packets: composed of one or more flits
• Flit: flow control digit
• Phit: physical digit
  – Subdivides flit into chunks equal to link width
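As a rough sketch, the message → packet → flit → phit hierarchy is just nested slicing; the sizes used below (64-byte message, 16-byte flits) are illustrative assumptions, not values fixed by the slides:

```python
def segment(message, max_packet, flit_size, phit_size):
    """Split a message into packets, packets into flits, flits into phits.
    All sizes are in bytes (illustrative)."""
    packets = [message[i:i + max_packet]
               for i in range(0, len(message), max_packet)]
    flits = [[p[i:i + flit_size] for i in range(0, len(p), flit_size)]
             for p in packets]
    phits = [[[f[i:i + phit_size] for i in range(0, len(f), phit_size)]
              for f in pf] for pf in flits]
    return packets, flits, phits

# A 64-byte message with 64-byte max packets and 16-byte flits:
# one packet of four flits; on-chip, phit size == flit size,
# so each flit is exactly one phit.
packets, flits, phits = segment(bytes(64), 64, 16, 16)
```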

Packets (2)
[Figure: a message (header + payload) is segmented into packets; a packet consists of a head flit carrying the route and sequence number, body flits, and a tail flit; each flit carries a type (head, body, tail, or head & tail) and a VCID, and is subdivided into phits]
• Off-chip: channel width limited by pins
  – Requires phits
• On-chip: abundant wiring means phit size == flit size

Packets (3)
[Figure: a cache-line packet: head flit (RC, Type, VCID, Addr, Bytes 0-15), body flits (Bytes 16-31, Bytes 32-47), tail flit (Bytes 48-63); a coherence command fits in a single head & tail flit (RC, Type, VCID, Addr, Cmd)]
• Packet contains destination/route information
  – Flits may not, so all flits of a packet must take the same route

Switching
• Different flow control techniques based on granularity
  – Message-based: allocation made at message granularity (circuit switching)
  – Packet-based: allocation made to whole packets
  – Flit-based: allocation made on a flit-by-flit basis

Message-Based Flow Control
• Coarsest granularity
• Circuit switching
  – Pre-allocates resources across multiple hops
    • Source to destination
    • Resources = links
    • Buffers are not necessary
  – Probe sent into network to reserve resources

Circuit Switching
• Once probe sets up circuit
  – Message does not need to perform any routing or allocation at each network hop
  – Good for transferring large amounts of data
    • Can amortize circuit setup cost by sending data with very low per-hop overheads
• No other message can use those resources until transfer is complete
  – Throughput can suffer due to setup and hold time for circuits
  – Links are idle until setup is complete
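A back-of-the-envelope model of the amortization argument; the delay parameters are assumptions for illustration, not values from the lecture:

```python
def circuit_latency(hops, setup_per_hop, data_flits):
    """Circuit switching: pay the probe + acknowledgement setup once,
    then stream data with no per-hop routing/allocation overhead."""
    setup = 2 * hops * setup_per_hop   # probe out + acknowledgement back
    transfer = hops + data_flits       # pipeline fill + one flit per cycle
    return setup + transfer

def routed_latency(hops, router_delay, data_flits):
    """Per-hop routed alternative: every hop pays router overhead."""
    return hops * router_delay + data_flits

# For a long transfer, the one-time setup cost amortizes away and the
# circuit's cheap per-hop traversal wins.
```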

Circuit Switching Example
[Figure: a configuration probe travels from node 0 to node 5, an acknowledgement returns to set up the circuit, and the data then follows the reserved path]
• Significant latency overhead prior to data transfer
  – Data transfer does not pay per-hop overhead for routing and allocation

Circuit Switching Example (2)
[Figure: node 0's circuit to node 5 occupies links that a message from node 1 to node 2 needs]
• When there is contention
  – Significant wait time
  – Message from 1 to 2 must wait

Time-Space Diagram: Circuit-Switching
[Figure: time-space diagram of a circuit from node 0 to node 8: the setup flit S08 advances hop by hop, the acknowledgement A08 returns, data flits D08 stream across the circuit, and T08 tears it down; a setup S28 from node 2 to node 8 is blocked until the first circuit is released]

Packet-based Flow Control
• Break messages into packets
• Interleave packets on links
  – Better utilization
• Requires per-node buffering to store in-flight packets
• Two types of packet-based techniques

Store and Forward
• Links and buffers are allocated to entire packet
• Head flit waits at router until entire packet is received before being forwarded to the next hop
• Not suitable for on-chip
  – Requires buffering at each router to hold entire packet
    • Packet cannot traverse link until buffering allocated to entire packet
  – Incurs high latencies (pays serialization latency at each hop)

Store and Forward Example
[Figure: a 4-flit packet travels from node 0 to node 5, fully buffered at each intermediate router]
Total delay = 4 cycles per hop x 3 hops = 12 cycles
• High per-hop latency
  – Serialization delay paid at each hop
• Larger buffering required
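The 12-cycle figure follows from the store-and-forward latency model, in which the whole packet is serialized at every hop (router pipeline delay omitted here for simplicity):

```python
def saf_latency(hops, packet_flits, router_delay=0):
    """Store-and-forward: each router waits for the entire packet before
    forwarding, so the full serialization delay is paid at every hop."""
    return hops * (packet_flits + router_delay)

print(saf_latency(hops=3, packet_flits=4))   # 4 cycles/hop x 3 hops = 12
```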

Time-Space Diagram: Store and Forward
[Figure: time-space diagram of a packet (head, body, and tail flits) over hops 0→1→2→5→8; at each router the head flit departs only after the tail flit has arrived, so the full serialization delay accumulates hop by hop]

Packet-based: Virtual Cut Through
• Links and buffers allocated to entire packets
• Flits can proceed to next hop before tail flit has been received by current router
  – But only if next router has enough buffer space for entire packet
• Reduces latency significantly compared to SAF
• But still requires large buffers
  – Unsuitable for on-chip

Virtual Cut Through Example
[Figure: 4 flit-sized buffers are allocated at the next router before the head flit proceeds from node 0 toward node 5]
Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles
• Lower per-hop latency
• Large buffering required
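The 6-cycle figure follows from the cut-through latency model: only the head flit pays the per-hop delay, and the remaining flits pipeline behind it, paying serialization just once:

```python
def vct_latency(hops, packet_flits, per_hop_delay=1):
    """Virtual cut-through: the head pays per_hop_delay at each hop;
    the body and tail flits add serialization only once at the end."""
    return hops * per_hop_delay + (packet_flits - 1)

print(vct_latency(hops=3, packet_flits=4))   # 1 cycle/hop x 3 + 3 = 6
```

Half the 12 cycles of the store-and-forward example, since serialization is no longer repeated at every hop.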

Time-Space Diagram: VCT
[Figure: time-space diagram of a packet (head, body, and tail flits) over hops 0→1→2→5→8 with no contention; the head advances one hop per cycle and the body and tail flits pipeline directly behind it]

Virtual Cut Through
[Figure: the packet cannot proceed because only 2 flit buffers are available at the next router, fewer than needed for the entire packet]
• Throughput suffers from inefficient buffer allocation

Time-Space Diagram: VCT (2)
[Figure: time-space diagram with contention: the packet stalls at node 2 because of insufficient buffers downstream, then proceeds to nodes 5 and 8 once buffer space for the entire packet is available]

Flit-Level Flow Control
• Helps routers meet tight area/power constraints
• Flit can proceed to next router when there is buffer space available for that flit
  – Improves over SAF and VCT by allocating buffers on a flit-by-flit basis

Wormhole Flow Control
• Pros
  – More efficient buffer utilization (good for on-chip)
  – Low latency
• Cons
  – Poor link utilization: if head flit becomes blocked, all links spanning length of packet are idle
    • Cannot be re-allocated to a different packet
    • Suffers from head-of-line (HOL) blocking
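A minimal sketch of the per-flit buffer check that distinguishes wormhole from VCT; the class and method names are illustrative, and a single VC is assumed:

```python
class WormholeLink:
    """One link with a single downstream flit queue (one VC assumed).
    A flit advances only when the downstream router has space for that
    one flit -- not for the whole packet, as VCT would require."""
    def __init__(self, downstream_capacity):
        self.capacity = downstream_capacity
        self.downstream = []              # flits buffered at the next router

    def try_send(self, flit):
        if len(self.downstream) < self.capacity:
            self.downstream.append(flit)  # buffer allocated flit by flit
            return True
        return False                      # blocked: packet stalls in place

    def drain(self):
        """Downstream forwards one flit, freeing one buffer entry."""
        return self.downstream.pop(0) if self.downstream else None
```

When `try_send` fails, the trailing flits stall where they are and keep their buffers and links allocated, which is exactly the head-of-line blocking described above.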

Wormhole Example
[Figure: red holds a channel that remains idle until red proceeds; another channel is idle but the red packet is blocked behind blue; blue cannot proceed because the downstream buffer is full, blocked by other packets]
• 6 flit buffers/input port

Time-Space Diagram: Wormhole
[Figure: time-space diagram of a packet over hops 0→1→2→5→8; contention at node 2 stalls the head flit, and the body and tail flits stall in place behind it, holding their buffers and links until the head advances]

Virtual Channels
• First proposed for deadlock avoidance
  – We’ll come back to this
• Can be applied to any flow control technique
  – First proposed with wormhole

Virtual Channel Flow Control
• Virtual channels used to combat HOL blocking in wormhole
• Virtual channels: multiple flit queues per input port
  – Share same physical link (channel)
• Link utilization improved
  – Flits on different VCs can pass a blocked packet

Virtual Channel Flow Control (2)
[Figure: two packets, A and B, enter on separate inputs and share the same physical output channel through separate virtual-channel buffers]

Virtual Channel Flow Control (3)
[Figure: flits of packet A (AH, A1–A5, AT) on input 1 and packet B (BH, B1–B5, BT) on input 2 are interleaved flit by flit on the shared output; on the downstream links, each packet's flits appear every other cycle]

Virtual Channel Flow Control (3)
• Packets compete for a VC on a flit-by-flit basis
• In example: on downstream links, flits of each packet are available every other cycle
• Upstream links throttle because of limited buffers
• Does not mean links are idle
  – May be used by packets allocated to other VCs
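The every-other-cycle pattern falls out of flit-by-flit round-robin arbitration between VCs sharing one link; a simplified sketch (no credits or blocking modeled):

```python
from collections import deque

def interleave(vc_queues):
    """Round-robin, flit-by-flit arbitration across virtual channels
    sharing one physical link: each cycle, every non-empty VC sends
    one flit in turn."""
    queues = [deque(q) for q in vc_queues]
    link = []
    while any(queues):
        for q in queues:
            if q:
                link.append(q.popleft())  # this VC wins the link this cycle
    return link

# Packets A and B share the link: each packet's flits
# appear on the downstream link every other cycle.
print(interleave([["AH", "A1", "AT"], ["BH", "B1", "BT"]]))
# ['AH', 'BH', 'A1', 'B1', 'AT', 'BT']
```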

Virtual Channel Example
[Figure: blue cannot proceed because its downstream buffer is full and it is blocked by other packets, but flits on the other virtual channel continue to use the link]
• 6 flit buffers/input port
• 3 flit buffers/VC

Summary of techniques

                      Links     Buffers           Comments
Circuit Switching     Messages  N/A (buffer-less) Setup & Ack
Store and Forward     Packet    Packet            Head flit waits for tail
Virtual Cut Through   Packet    Packet            Head can proceed
Wormhole              Packet    Flit              HOL blocking
Virtual Channel       Flit      Flit              Interleave flits of different packets

Deadlock
• Using flow control to guarantee deadlock freedom gives more flexible routing
  – Recall: routing restrictions needed for deadlock freedom
• If routing algorithm is not deadlock free
  – VCs can break resource cycle
• Each VC is time-multiplexed onto physical link
  – Holding a VC implies holding its associated buffer queue
  – Not tying up physical link resource
• Enforce order on VCs

Deadlock: Enforce Order
[Figure: a 4-node ring (A, B, C, D) with VC 0 and VC 1 buffers at each node and a dateline on one link]
• All messages sent through VC 0 until they cross the dateline
• After the dateline, assigned to VC 1
  – Cannot be allocated to VC 0 again
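A sketch of dateline VC assignment on a unidirectional ring; the node numbering and placing the dateline on the wrap-around link are assumptions for illustration:

```python
def next_vc(current, ring_size, crossed_dateline):
    """Return (vc, crossed) for the hop out of node `current`.
    Packets travel on VC 0 until they cross the dateline (here, the
    wrap-around link back to node 0), then stay on VC 1 forever,
    which breaks the cyclic buffer dependency around the ring."""
    nxt = (current + 1) % ring_size
    crossed = crossed_dateline or nxt < current   # wrap-around = dateline
    return (1 if crossed else 0), crossed

# A packet from node 2 to node 1 on a 4-node ring:
vc, crossed = next_vc(2, 4, False)   # hop 2 -> 3: still VC 0
```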

Deadlock: Escape VCs
• Enforcing order lowers VC utilization
  – Previous example: VC 1 underutilized
• Escape virtual channels
  – Have 1 VC that is deadlock free
  – Example: VC 0 uses DOR, other VCs use arbitrary routing function
  – Access to VCs arbitrated fairly: packet always has a chance of landing on the escape VC
• Assign different message classes to different VCs to prevent protocol-level deadlock
  – Prevent req-ack message cycles

Buffer Backpressure
• Need mechanism to prevent buffer overflow
  – Avoid dropping packets
  – Upstream nodes need to know buffer availability at downstream routers
• Significant impact on throughput achieved by flow control
• Two common mechanisms
  – Credits
  – On-off

Credit-Based Flow Control
• Upstream router stores credit counts for each downstream VC
• Upstream router
  – When flit forwarded
    • Decrement credit count
  – Count == 0: buffer full, stop sending
• Downstream router
  – When flit forwarded and buffer freed
    • Send credit to upstream router
    • Upstream increments credit count
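The counter protocol above can be sketched directly; a single VC is shown, and the class and method names are illustrative:

```python
class CreditCounter:
    """Upstream view of credit-based flow control for one downstream VC.
    The count starts at the downstream buffer depth; a count of zero
    means every downstream buffer entry is occupied."""
    def __init__(self, buffer_depth):
        self.credits = buffer_depth

    def can_send(self):
        return self.credits > 0

    def send_flit(self):
        assert self.can_send(), "buffer full: must stop sending"
        self.credits -= 1          # flit forwarded: one fewer free buffer

    def credit_received(self):
        self.credits += 1          # downstream freed a buffer entry
```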

Credit Timeline
[Figure: a flit departs node 1 and is processed at node 2, which returns a credit; node 1 processes the credit and sends the next flit, which node 2 processes at t5; this whole span is the credit round-trip delay]
• Round-trip credit delay:
  – Time between when buffer empties and when next flit can be processed from that buffer entry
  – With only a single-entry buffer, this would result in significant throughput degradation
  – Important to size buffers to tolerate credit turnaround

On-Off Flow Control
• Credit: requires upstream signaling for every flit
• On-off: decreases upstream signaling
• Off signal
  – Sent when number of free buffers falls below threshold Foff
• On signal
  – Sent when number of free buffers rises above threshold Fon
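A sketch of the threshold logic at the downstream router (buffer depth and threshold values are illustrative). Signals fire only on threshold crossings, which is why on-off needs far less upstream signaling than per-flit credits:

```python
class OnOffBuffer:
    """Downstream buffer with on-off backpressure: 'off' is signaled when
    free entries fall below f_off, 'on' when they rise above f_on."""
    def __init__(self, depth, f_off, f_on):
        self.free, self.f_off, self.f_on = depth, f_off, f_on
        self.sender_on = True        # upstream currently allowed to send

    def flit_arrived(self):
        self.free -= 1
        if self.sender_on and self.free < self.f_off:
            self.sender_on = False   # crossing: send 'off' signal upstream

    def flit_drained(self):
        self.free += 1
        if not self.sender_on and self.free > self.f_on:
            self.sender_on = True    # crossing: send 'on' signal upstream
```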

On-Off Timeline
[Figure: node 2 sends an off signal when the Foff threshold is reached (Foff set to prevent flits arriving before t4 from overflowing); flits already in flight keep arriving until node 1 processes the off signal; node 2 later sends an on signal when the Fon threshold is reached (Fon set so that node 2 does not run out of flits between t5 and t8)]
• Less signaling but more buffering
  – On-chip, buffers are more expensive than wires

Buffer Utilization
[Figure: cycle-by-cycle table of the downstream credit count and pipeline stages (BW, VA/SA, ST, LT) for the head, body, and tail flits, including the cycles where credits are sent back (C, C-LT) and processed upstream (CUp); the credit count falls from 2 to 0 as flits occupy buffer entries and recovers as credits return]

Buffer Sizing
• Prevent backpressure from limiting throughput
  – Buffers must hold at least as many flits as the turnaround time
• Assume:
  – 1 cycle propagation delay for data and credits
  – 1 cycle credit processing delay
  – 3 cycle router pipeline
• At least 6 flit buffers
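The 6-flit number is just the sum of the turnaround components listed above:

```python
def min_buffers(credit_propagation=1, credit_processing=1,
                router_pipeline=3, flit_propagation=1):
    """Credit turnaround: a freed buffer entry cannot be refilled until the
    credit travels upstream, is processed, and the new flit passes through
    the router pipeline and back across the link. Per-VC buffering must
    cover this round trip to sustain full link throughput."""
    return (credit_propagation + credit_processing
            + router_pipeline + flit_propagation)

print(min_buffers())   # 6, under the slide's assumed delays
```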

Actual Buffer Usage & Turnaround Delay
[Figure: turnaround timeline for one buffer entry: a flit arrives at node 1 and uses the buffer; the flit leaves node 1 and a credit is sent to node 0 (1 cycle credit propagation delay); node 0 receives the credit (1 cycle credit pipeline delay); node 0 processes the credit and the freed buffer is reallocated to a new flit, which leaves node 0 for node 1 (3 cycle flit pipeline delay); the new flit crosses the link (1 cycle flit propagation delay) and arrives at node 1, reusing the buffer]

Flow Control and MPSoCs
• Wormhole flow control
• Real-time performance requirements
  – Quality of service
  – Guaranteed bandwidth allocated to each node
    • Time division multiplexing
• Irregularity
  – Different buffer sizes

Flow Control Summary
• On-chip networks require techniques with lower buffering requirements
  – Wormhole or virtual channel flow control
• Avoid dropping packets in on-chip environment
  – Requires buffer backpressure mechanism
• Complexity of flow control impacts router microarchitecture (in 2 weeks)