Dynamic Interconnection Networks CEG 4131 Computer Architecture III

Dynamic Interconnection Networks
CEG 4131 Computer Architecture III
Miodrag Bolic

Quiz 1
• NIOS II processor – basics
• FPGA – basics
• Caches
  – Performance
  – Size, number of bits
  – Block placement
  – Block identification
  – Block replacement
  – Write strategy

Quiz 1 (cont.)
Key terms:
• Flynn’s taxonomy
• Shared memory architectures
  – Cache coherence
  – NUMA, COMA
  – Symmetric multiprocessors
• Distributed memory systems
• Classification based on communication
• Classification based on type of parallelism
• Chapter 1 from the textbook

Quiz 1 (cont.)
• Amdahl's law
• Speedup, efficiency
• Parallelism profile, average parallelism, MIPS
• Scalability
• Understanding of performance of the program for parallel addition
• Chapters 3.1, 3.2.2, 3.4, 3.5

Overview
• Network properties
• Switches
• Single and multistage interconnection networks
• Crossbar

Network properties
• Node degree d: the number of edges incident on a node.
  – In degree
  – Out degree
• Diameter D of a network: the maximum, over all pairs of nodes, of the shortest path length between them.
• The network is symmetric if it looks the same from any node.
• The network is scalable if it is expandable, with performance that scales as the machine resources are increased.
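The two numeric properties above can be checked on a small example. The following is a minimal sketch, assuming an undirected, connected network stored as adjacency lists; the `ring` helper and the 8-node ring are illustrative choices, not taken from the slides.

```python
from collections import deque

def node_degrees(adj):
    """Return {node: degree} for an undirected graph given as adjacency lists."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

def diameter(adj):
    """Longest shortest-path length (in hops) over all node pairs.

    Runs a breadth-first search from every node; assumes the graph is connected.
    """
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest

def ring(n):
    """Hypothetical example topology: n nodes connected in a cycle."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

ring8 = ring(8)
print(node_degrees(ring8))  # every node has degree 2
print(diameter(ring8))      # 4
```

In the ring every node looks the same, which is the symmetry property from the slide; the diameter grows with the node count, which is one reason a ring scales poorly.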

Bisection width
• Bisection width is the minimum number of wires that must be cut to divide the network into two equal halves.
  – A small bisection width -> low bandwidth
  – A large bisection width -> a lot of extra wires
• A cut of a network C(N1, N2) is a set of channels that partitions the set of all nodes into two disjoint sets N1 and N2. Each element of C(N1, N2) is a channel with a source in N1 and a destination in N2, or vice versa.
• A bisection of a network is a cut that partitions the entire network nearly in half, such that |N2| ≤ |N1| ≤ |N2| + 1. Here |N2| means the number of nodes that belong to the partition N2.
• The channel bisection of a network is the minimum channel count over all bisections of the network, that is, the minimum of |C(N1, N2)| taken over all bisections.
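The bisection definition can be checked directly on small networks. This is a brute-force sketch that tries every near-equal split, assuming channels are given as undirected node pairs; the function name and example topologies are illustrative. The search grows exponentially with the node count, so it only suits small examples.

```python
from itertools import combinations

def bisection_width(nodes, channels):
    """Minimum number of channels crossing any split of `nodes` into two
    parts whose sizes differ by at most one (brute force)."""
    nodes = list(nodes)
    half = len(nodes) // 2               # |N2|; |N1| = len(nodes) - half
    best = None
    for subset in combinations(nodes, half):
        n2 = set(subset)
        # a channel belongs to the cut when its endpoints lie in different parts
        crossing = sum(1 for a, b in channels if (a in n2) != (b in n2))
        if best is None or crossing < best:
            best = crossing
    return best

ring8 = [(i, (i + 1) % 8) for i in range(8)]
print(bisection_width(range(8), ring8))   # 2

k4 = list(combinations(range(4), 2))      # fully connected, 4 nodes
print(bisection_width(range(4), k4))      # 4
```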

Factors Affecting Performance
• Functionality – how the network supports data routing, interrupt handling, synchronization, request/message combining, and coherence
• Network latency – worst-case time for a unit message to be transferred
• Bandwidth – maximum data rate
• Hardware complexity – implementation costs for wire, logic, switches, connectors, etc.

2 × 2 Switches
*From Advanced Computer Architecture, K. Hwang, 1993.

Switches
Module size   Legitimate states   Permutation connections
2 × 2         4                   2
4 × 4         256                 24
8 × 8         16,777,216          40,320
N × N         N^N                 N!
• Permutation function: each input can be connected to only a single output.
• Legitimate state: each input can be connected to multiple outputs, but each output can be connected to only a single input.
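The table entries follow from two counting arguments: in a legitimate state each of the N outputs independently selects one of the N inputs (N^N choices), while a permutation pairs inputs and outputs one-to-one (N! choices). A short sketch, with illustrative function names, reproduces the rows:

```python
from math import factorial

def legitimate_states(n):
    # each of the n outputs independently picks one of the n inputs
    return n ** n

def permutation_connections(n):
    # one-to-one pairings of n inputs with n outputs
    return factorial(n)

for n in (2, 4, 8):
    print(n, legitimate_states(n), permutation_connections(n))
# 2 4 2
# 4 256 24
# 8 16777216 40320
```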

Single-stage networks
• Single-stage Shuffle-Exchange IN (left)
• Perfect shuffle mapping function (right)
• Perfect shuffle operation: cyclic shift 1 place left, e.g. 101 --> 011
• Exchange operation: invert least significant bit, e.g. 101 --> 100
*From Ben Macey at http://www.ee.uwa.edu.au/~maceyb/aca319-2003
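Both operations are simple bit manipulations on the node address. A minimal sketch, assuming addresses are integers of a fixed bit width (the function names are illustrative):

```python
def perfect_shuffle(addr, n_bits):
    """Cyclic left shift of an n_bits-wide address by one place."""
    mask = (1 << n_bits) - 1
    msb = (addr >> (n_bits - 1)) & 1       # bit that wraps around
    return ((addr << 1) & mask) | msb

def exchange(addr):
    """Invert the least significant bit."""
    return addr ^ 1

print(format(perfect_shuffle(0b101, 3), "03b"))  # 011
print(format(exchange(0b101), "03b"))            # 100
```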

Multistage Interconnection Networks
• The capability of a single-stage network is limited, but if we cascade enough of them together they form a completely connected MIN (Multistage Interconnection Network).
• Switches can perform their own routing or can be controlled by a central router.
• This type of network can be classified into the following categories:
  – Nonblocking: a network is called strictly nonblocking if it can connect any idle input to any idle output regardless of what other connections are currently in progress.
  – Rearrangeable nonblocking: the network can establish all possible connections between inputs and outputs by rearranging its existing connections.
  – Blocking: a network is said to be blocking if it can perform many, but not all, possible connections between terminals. Example: the Omega network.

Omega networks
• A multistage IN using 2 × 2 switch boxes and a perfect shuffle interconnect pattern between the stages
• In the Omega MIN there is one unique path from each input to each output.
• No redundant paths → no fault tolerance and the possibility of blocking
Example:
• Connect input 101 to output 001
• Use the bits of the destination address, 001, for dynamically selecting a path
• Routing:
  – 0 means use upper output
  – 1 means use lower output
*From Ben Macey at http://www.ee.uwa.edu.au/~maceyb/aca319-2003
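The destination-tag routing in the example can be traced in a few lines. This sketch assumes the common Omega layout in which a perfect shuffle precedes every switch stage and the destination bits are consumed most significant bit first; the function name and the returned path format are illustrative.

```python
def omega_route(source, dest, n_bits):
    """Return the line position after each stage when routing source -> dest.

    Assumed layout: perfect shuffle, then a column of 2x2 switches, repeated
    n_bits times. Destination bit 0 selects the upper switch output (even
    line), bit 1 the lower output (odd line).
    """
    mask = (1 << n_bits) - 1
    pos = source
    path = []
    for stage in range(n_bits):
        # perfect shuffle: cyclic left shift of the current line number
        pos = ((pos << 1) & mask) | (pos >> (n_bits - 1))
        # the switch replaces the low bit with the next destination bit
        bit = (dest >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path

path = omega_route(0b101, 0b001, 3)
print([format(p, "03b") for p in path])  # ['010', '100', '001']
```

After log2 N stages every source bit has been shifted out and replaced by a destination bit, so the message always ends on the requested output. This is also why the path is unique.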

Omega networks
• log2(N) stages of 2 × 2 switches
• N/2 switches per stage
• S = (N/2) log2(N) switches
• Number of permutations in an Omega network: 2^S
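The 2^S figure can be compared with the N! permutations a nonblocking network would support. The sketch below counts, by brute force, the permutations whose paths never share a switch output. It assumes two-state switches (straight or exchange) and a shuffle before every stage; all names are illustrative.

```python
from itertools import permutations
from math import factorial

def switch_count(n):
    """S = (N/2) * log2(N) for N a power of two."""
    stages = n.bit_length() - 1
    return (n // 2) * stages

def is_realizable(perm, n_bits):
    """True if the N paths of `perm` never collide on a switch output."""
    mask = (1 << n_bits) - 1
    positions = list(range(len(perm)))
    for stage in range(n_bits):
        nxt = []
        for src, pos in enumerate(positions):
            pos = ((pos << 1) & mask) | (pos >> (n_bits - 1))   # shuffle
            bit = (perm[src] >> (n_bits - 1 - stage)) & 1        # routing bit
            nxt.append((pos & ~1) | bit)
        if len(set(nxt)) < len(nxt):     # two messages want the same line
            return False
        positions = nxt
    return True

def count_realizable(n):
    n_bits = n.bit_length() - 1
    return sum(is_realizable(p, n_bits) for p in permutations(range(n)))

n = 4
print(switch_count(n), 2 ** switch_count(n), factorial(n))  # 4 16 24
print(count_realizable(n))                                   # 16
```

For N = 4 only 16 of the 24 permutations pass in one go, which matches 2^S and illustrates why the Omega network is classed as blocking.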

Baseline networks
• The network can be generated recursively.
• The first stage contains one N × N block; the second stage contains two (N/2) × (N/2) sub-blocks, and so on.
• Two networks are topologically equivalent if one can be obtained from the other simply by rearranging the nodes at each stage.
*From Advanced Computer Architecture, K. Hwang, 1993.

Crossbar Network
• Each junction is a switching component, connecting the row to the column.
• There can be only one connection in each column.
*From Advanced Computer Architecture, K. Hwang, 1993.

Crossbar Network
• The major advantage of the crossbar switch is its potential for speed.
• In one clock cycle, a connection can be made between source and destination.
• The diameter of the crossbar is one.
• A request blocks only if the destination is already in use.
• Because of its complexity, the cost of the crossbar switch can become the dominant factor for a large multiprocessor system.
• Crossbars can be used to implement the a × b switches used in MINs. In this case each crossbar is small, so costs are kept down.
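The blocking rule (one connection per destination column) can be modelled with a single array. This is a behavioural sketch only, not a hardware description; the class and method names are hypothetical.

```python
class Crossbar:
    """Toy model of an N x N crossbar: at most one source per destination."""

    def __init__(self, n):
        self.n = n
        # owner[j] is the source currently connected to destination j, or None
        self.owner = [None] * n

    def connect(self, src, dst):
        """Close crosspoint (src, dst); return False if dst is already in use."""
        if self.owner[dst] is not None:
            return False
        self.owner[dst] = src
        return True

    def release(self, dst):
        """Open whatever crosspoint is driving destination dst."""
        self.owner[dst] = None

xbar = Crossbar(4)
print(xbar.connect(0, 2))  # True
print(xbar.connect(1, 3))  # True  (different destination, no conflict)
print(xbar.connect(3, 2))  # False (destination 2 is busy)
xbar.release(2)
print(xbar.connect(3, 2))  # True
```

Any set of requests to distinct destinations succeeds, which is the sense in which the crossbar is nonblocking; the only conflict is contention for the same destination.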

Problem
A) Use two-input AND and OR gates to construct an N × N crossbar switch network between N processors and N memory modules. Use the signal c_ij as the enable signal for the switch in the ith row and jth column. Let the width of each crosspoint be w bits.
B) Estimate the total number of AND and OR gates needed as a function of N and w.
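One possible way to approach the estimate, offered as a sketch rather than as the solution from the original slides: assume a one-directional datapath (processor to memory only), gate each of the w data bits at crosspoint (i, j) with c_ij using one AND gate, and merge the N gated words arriving at each memory module with a chain of two-input OR gates. Under those assumptions the counts are N²·w AND gates and N·(N−1)·w OR gates; a return path from memory to processor would add a similar amount. All names below are hypothetical.

```python
def gate_counts(n, w):
    """Gate estimate under the one-directional assumption stated above."""
    and_gates = n * n * w          # w AND gates at each of the n*n crosspoints
    or_gates = n * (n - 1) * w     # (n-1) two-input ORs per bit per column
    return and_gates, or_gates

def crossbar_outputs(data, c, w):
    """Word delivered to each memory module.

    data[i] is the w-bit word driven by processor i; c[i][j] is 1 when
    crosspoint (i, j) is enabled.
    """
    n = len(data)
    all_ones = (1 << w) - 1
    out = []
    for j in range(n):
        word = 0
        for i in range(n):
            gated = data[i] & (all_ones if c[i][j] else 0)   # bitwise AND with c_ij
            word |= gated                                     # OR into column j
        out.append(word)
    return out

print(gate_counts(4, 8))                                  # (128, 96)
c = [[0, 1],
     [1, 0]]                                              # P0 -> M1, P1 -> M0
print(crossbar_outputs([0b1010, 0b0110], c, 4))           # [6, 10]
```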

Performance Comparison
Network    Latency          Switching complexity   Wiring complexity   Blocking
Bus        Constant, O(1)   O(N)                   O(w)                yes
MIN        O(log2 N)        O(N log2 N)            O(N w log2 N)       yes
Crossbar   O(1)             O(N^2)                 O(N^2 w)            no

Some Commercial Solutions [3]
• System-on-chip crossbar networks:
  – Nexus from Fulcrum Microsystems
• The core is used in the PMC-Sierra dual MIPS processor RM9000

References
1. Hesham El-Rewini and Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing, John Wiley and Sons, 2005.
2. K. Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill, 1993.
3. A. Lines, “Nexus: an asynchronous crossbar interconnect for synchronous system-on-chip designs”, Proc. of High Performance Interconnects, pp. 2-7, 2003.