Taxonomy 2: MIMD Multiprocessor (shared memory)

- Slides: 55


Taxonomy 2

MIMD Multiprocessor (shared memory)
[Figure: processors P1, P2, ..., Pn connected through an interconnection network (IN) to memory modules M1, M2, ..., Mn (tightly coupled architecture)]

Shared Memory
• Uniform Memory Access (UMA)
  • Tightly coupled system
• Non-Uniform Memory Access (NUMA)
  • Loosely coupled system
  • Cedar from the University of Illinois
  • BBN Butterfly
• Cache Only Memory Access (COMA)
  • Using global distributed caches
  • Kendall Square Research KSR-1

MIMD (cont.)
[Figure: global memory modules GM1, GM2, ..., GMn on a global interconnection network (Global IN); clusters in which processors P1, P2, ..., Pn and memories CM1, CM2, CM3 share a CIN (loosely coupled architecture, Cedar)]

MIMD (cont.)
[Figure: processor-memory pairs (P1, M1), (P2, M2), ..., (Pn, Mn) attached to an interconnection network (IN) (loosely coupled architecture, BBN Butterfly)]

MIMD (cont.)
[Figure: processors P1, P2, ..., Pn, each with C1, C2, ..., Cn and D1, D2, ..., Dn, attached to an interconnection network (IN) (COMA architecture)]

MIMD (cont.)
• Multicomputer (message passing)
[Figure: processors P1, P2, ..., Pn, each with its own memory M1, M2, ..., Mn, attached to an interconnection network (IN)]

MIMD (cont.)
• Data flow machine
  • An instruction is ready for execution when the data for its operands have been made available
  • Purely self-contained
  • No program counter
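
The firing rule above can be illustrated with a small sketch. The function `run_dataflow`, the program encoding, and the operand names are hypothetical, chosen only for this example; the point is that instructions execute when their operands exist, not in the order a program counter would dictate.

```python
import operator


def run_dataflow(instructions, inputs):
    """Fire each instruction once all of its operands are available.

    instructions: dict mapping result name -> (function, operand names).
    inputs: dict of values that are available from the start.
    Returns (all values, order in which instructions fired).
    """
    values = dict(inputs)
    pending = dict(instructions)
    order = []
    while pending:
        # An instruction is ready when every operand already has a value.
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in values for op in operands)]
        if not ready:
            raise ValueError("no instruction can fire: missing operands")
        for name in ready:
            func, operands = pending.pop(name)
            values[name] = func(*(values[op] for op in operands))
            order.append(name)
    return values, order


# (a + b) * (a - b), listed deliberately out of order.
program = {
    "product": (operator.mul, ("total", "diff")),
    "total": (operator.add, ("a", "b")),
    "diff": (operator.sub, ("a", "b")),
}
values, order = run_dataflow(program, {"a": 5, "b": 3})
print(values["product"], order)
```

Although `product` is listed first, it fires last, because its operands only become available after `total` and `diff` have fired.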

SIMD
• Array processor
  • Centralized control unit

MISD
• Pipelined vector processor

MISD (cont.)
• Systolic array

Hybrid Architecture
• Combines features of different architectures to provide better performance for parallel computations.
• Two types of parallelism
  • Control parallelism (MIMD)
  • Data parallelism (SIMD)

Special Purpose Devices
• Artificial Neural Networks (ANN)
• Fuzzy logic

Neural Networks (Definition)
✓ A large number of PEs
✓ Connected in parallel
✓ Capable of learning
✓ Adaptive to change
✓ Able to cope with serious disruptions
Power of connectivity vs. power of processors

Fuzzy logic (Definition)
✓ Approximate reasoning
✓ Formal principles of reasoning

Interconnection Network (IN)
• The measure of an IN is "how quickly it can deliver how much of what's needed to the right place, reliably and at good cost and value".

Performance Criteria for IN
• Latency
  • Transit time for a single message
• Bandwidth
  • How much message traffic the IN can handle, e.g., Mbytes/s
• Connectivity
  • How many immediate neighbors each node has, and how often each neighbor can be reached
• Hardware cost
  • What fraction of the total hardware cost the IN represents
  • E.g., wires, switches, connectors, arbitration logic, …

Performance Criteria for IN (cont.)
• Reliability
  • Redundant paths
• Functionality
  • Additional functions performed by the IN, such as combining of messages and fault tolerance
  • E.g., data routing, interrupt handling, request/message combining, coherence
• Scalability
  • The ability to be expanded

Definitions
• Node degree
  • The number of links (edges) connected to the node.
• Diameter
  • The largest minimum distance between any pair of nodes. The minimum distance between a pair of nodes is the minimum number of communication links (hops) that data from one of the nodes must traverse in order to reach the other node.
• Network size
  • The number of nodes in the IN.
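
These three definitions translate directly into a few lines of code. The sketch below assumes a connected network stored as an adjacency list; the helper names (`node_degrees`, `hop_distances`, `diameter`, `linear_array`) are made up for illustration. The minimum distance is found with a breadth-first search, and the diameter is the largest such distance over all pairs.

```python
from collections import deque


def node_degrees(adj):
    """Node degree: the number of links attached to each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def hop_distances(adj, source):
    """Minimum number of hops from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest minimum distance between any pair of nodes (connected network)."""
    return max(max(hop_distances(adj, s).values()) for s in adj)


def linear_array(n):
    """n nodes in a line; node i is linked to i-1 and i+1 where they exist."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


net = linear_array(5)
print(len(net), node_degrees(net), diameter(net))
```

For the 5-node linear array the network size is 5, the end nodes have degree 1, the inner nodes have degree 2, and the diameter is 4.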

Data Routing
• Functions in data routing
  • Shifting
  • Rotation
  • Permutation (one-to-one)
  • Broadcast (one-to-all)
  • Multicast (many-to-many)
  • Personalized communication (one-to-many)
  • Shuffle / Exchange

Types of IN
• Static networks
• Dynamic networks

Static Networks
• Shared bus
  • Degree = 1
  • Diameter = 1

Static Networks (cont.)
• Linear array
  • Degree = 2
  • Diameter = n-1

Static Networks (cont.)
• Ring
  • Degree = 2
  • Diameter:
    • Unidirectional: n-1
    • Bidirectional: Ceil((n-1)/2)

Static Networks (cont.)
• Binary tree
  • Degree:
    • Leaf = 1
    • Root = 2
    • Others = 3
  • Diameter: 2(h-1)

Static Networks (cont.)
• Fat tree
  • Degree and diameter are the same as for the binary tree.
  • Because traffic is heavier towards the root, the number of links increases gradually towards the root (e.g., CM-5).

Static Networks (cont.)
• Star
  • Degree:
    • Central = n-1
    • Others = 1
  • Diameter = 2

Static Networks (cont.)

Source   Destination (shuffle)
000      000
001      010
010      100
011      110
100      001
101      011
110      101
111      111

$\mathrm{Shuffle}(s_{n-1} s_{n-2} \dots s_0) = s_{n-2} s_{n-3} \dots s_0 s_{n-1}$
$\mathrm{Exchange}(s_{n-1} s_{n-2} \dots s_1 s_0) = s_{n-1} s_{n-2} \dots s_1 \overline{s_0}$
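
Both functions are simple bit operations on the node address: the shuffle rotates the n-bit address left by one position, and the exchange complements the least significant bit. A minimal sketch, with function names chosen only for illustration:

```python
def shuffle(addr, n):
    """Perfect shuffle: rotate the n-bit address left by one position."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def exchange(addr):
    """Exchange: complement the least significant address bit."""
    return addr ^ 1


n = 3
table = {format(s, "03b"): format(shuffle(s, n), "03b") for s in range(1 << n)}
for source, destination in table.items():
    print(source, "->", destination)
```

For n = 3 this reproduces the source/destination pairs of the shuffle, for example 001 -> 010 and 100 -> 001, while `exchange` links each even address to the next odd one.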

Shuffle-Exchange Network
• For N = 8
[Figure: shuffle-exchange network for N = 8]
• Applications:
  • The shuffle-exchange network provides suitable interconnection patterns for implementing certain parallel algorithms, such as polynomial evaluation, Fast Fourier Transform (FFT), sorting, and matrix transposition.

Static Networks (cont.)
• Mesh
  • Degree:
    • Corner = 2
    • Sides = 3
    • Middle = 4
  • Diameter = 2(n-1)

Mesh Routing Algorithm
• A simple routing algorithm routes a packet from source S to destination D in a mesh with $n^2$ nodes.
  1. Compute the row distance R.
  2. Compute the column distance C.
  3. Add the values R and C to the packet header at the source node.
  4. Starting from the source, send the packet for R rows and then for C columns.

Example (Mesh)
• To route a packet from node 6 (i.e., S = 6) to node 12 (i.e., D = 12), the packet goes through two paths, as shown in the figure.
[Figure: mesh routing example from node 6 to node 12]
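
The four routing steps can be sketched as follows. Everything specific here is an assumption made for illustration: nodes are numbered 0 to n²-1 in row-major order, R and C are taken as the signed differences of the row and column indices, and the example uses a 4-by-4 mesh. The function name `mesh_route` is hypothetical.

```python
def mesh_route(source, dest, n):
    """Row-first route in an n-by-n mesh with row-major node numbers 0..n*n-1."""
    rows = dest // n - source // n   # R: signed row distance (assumed formula)
    cols = dest % n - source % n     # C: signed column distance (assumed formula)
    path = [source]
    node = source
    # Travel R rows first; one row corresponds to n node numbers.
    row_step = n if rows > 0 else -n
    for _ in range(abs(rows)):
        node += row_step
        path.append(node)
    # Then travel C columns; one column corresponds to one node number.
    col_step = 1 if cols > 0 else -1
    for _ in range(abs(cols)):
        node += col_step
        path.append(node)
    return rows, cols, path


print(mesh_route(6, 12, 4))  # (2, -2, [6, 10, 14, 13, 12])
```

Under these assumptions, a packet from node 6 to node 12 moves two rows down and then two columns to the left. Sending it along the columns first would give the other shortest route between the same pair.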

Static Networks (cont.)
• Illiac
  • Degree = 4
  • Diameter = n-1
  • Chordal ring

Static Networks (cont.)
• Torus
  • Degree = 4
  • Diameter = 2(Floor(n/2))
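
The effect of the wraparound links on the diameter can be checked by brute force. The sketch below assumes an n-by-n grid addressed by (row, column) coordinates; the function name `grid_diameter` is made up for this example. Without wraparound the distance along one axis is |a - b|, and with wraparound it is the shorter of the two ways around the ring.

```python
from itertools import product


def grid_diameter(n, wraparound):
    """Largest hop count between any two nodes of an n-by-n grid."""
    def axis_distance(a, b):
        d = abs(a - b)
        # With wraparound links, going the other way round may be shorter.
        return min(d, n - d) if wraparound else d

    coords = list(product(range(n), repeat=2))
    return max(axis_distance(r1, r2) + axis_distance(c1, c2)
               for (r1, c1) in coords for (r2, c2) in coords)


for n in (3, 4, 5):
    print(n, grid_diameter(n, False), grid_diameter(n, True))
```

For n = 3, 4, 5 the mesh diameters come out as 4, 6, 8, which matches 2(n-1), and the torus diameters as 2, 4, 4, which is twice the integer part of n/2.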

Static Networks (cont.)
• Hypercube
  • Degree = n
  • Diameter = n
  • Address bits = n
  • Dimensions = n
  • Neighbors = n
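
All of these values follow from the addressing: two nodes of an n-cube are neighbors exactly when their n-bit addresses differ in one bit, so the hop distance between two nodes is the number of differing address bits. A short sketch with illustrative function names:

```python
def hypercube_neighbors(node, n):
    """Neighbors of a node in an n-cube: flip each of the n address bits."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_distance(a, b):
    """Hop distance: the number of address bits in which a and b differ."""
    return bin(a ^ b).count("1")


n = 4
nodes = range(1 << n)
degrees = {len(hypercube_neighbors(v, n)) for v in nodes}
diam = max(hypercube_distance(a, b) for a in nodes for b in nodes)
print(degrees, diam)
```

For the 4-cube every node has degree 4, and the farthest pairs, such as 0000 and 1111, are 4 hops apart.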

Example
• Embedding a 4-by-4 mesh in a 4-cube
[Figure: 4-by-4 mesh embedded in a 4-cube]
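
One common way to build such an embedding, not necessarily the labeling used in the figure, is to number the rows and the columns with a reflected Gray code and concatenate the two codes into a cube address. Consecutive Gray codes differ in one bit, so every mesh link lands on a cube link. The function names below are hypothetical.

```python
def gray(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)


def embed_mesh_in_cube(row_bits, col_bits):
    """Map mesh node (r, c) to a hypercube address of row_bits + col_bits bits."""
    mapping = {}
    for r in range(1 << row_bits):
        for c in range(1 << col_bits):
            mapping[(r, c)] = (gray(r) << col_bits) | gray(c)
    return mapping


def differing_bits(a, b):
    return bin(a ^ b).count("1")


mapping = embed_mesh_in_cube(2, 2)   # 4-by-4 mesh into a 4-cube
mesh_links = [((r, c), (r + dr, c + dc))
              for r in range(4) for c in range(4)
              for dr, dc in ((0, 1), (1, 0))
              if r + dr < 4 and c + dc < 4]
link_ok = all(differing_bits(mapping[u], mapping[v]) == 1 for u, v in mesh_links)
print(link_ok, format(mapping[(1, 2)], "04b"))
```

The check confirms that all 24 mesh links connect cube neighbors and that the 16 mesh nodes receive 16 distinct cube addresses.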

Static Networks (cont.)
• n-Mesh
  • Degree:
    • Corner = n
    • Internal = 2n
    • n < Others < 2n
  • Diameter =

Static Networks (cont.)
• k-ary n-cube
  • Degree:
    • If k = 2 then Degree = n
    • If k > 2 then Degree = 2n
  • Diameter =
[Figure: (a) 4-ary 2-cube network, (b) 3-ary 3-cube network]
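
The two degree cases can be seen by enumerating neighbors. The sketch assumes the usual addressing of a k-ary n-cube, in which a node is a tuple of n digits in the range 0 to k-1 and each dimension is a ring of k nodes; the function names are illustrative. For k = 2 the steps +1 and -1 reach the same node, which is why the degree drops from 2n to n.

```python
from itertools import product


def kary_ncube_neighbors(node, k):
    """Neighbors of a node (tuple of n digits in 0..k-1) with wraparound links."""
    neighbors = set()
    for dim, digit in enumerate(node):
        for step in (-1, 1):
            other = list(node)
            other[dim] = (digit + step) % k
            neighbors.add(tuple(other))   # the set removes the k = 2 duplicate
    return neighbors


def kary_ncube_degree(k, n):
    """Common degree of all nodes in a k-ary n-cube (k >= 2)."""
    degrees = {len(kary_ncube_neighbors(node, k))
               for node in product(range(k), repeat=n)}
    assert len(degrees) == 1   # every node has the same degree
    return degrees.pop()


print(kary_ncube_degree(2, 3), kary_ncube_degree(4, 2), kary_ncube_degree(3, 3))
```

The binary 3-cube has degree 3, the 4-ary 2-cube has degree 4, and the 3-ary 3-cube has degree 6.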

Cache Coherence
• Multiprocessor environment
• Cache dedicated to each processor
• Cache coherence problem
  • How to keep multiple copies of the data consistent during execution?

Cache Coherence Mechanisms
1. Hardware-based schemes
  • Snoopy cache protocols
    • If INs have broadcast features
  • Directory cache protocols
    • No broadcast features in INs
2. Software-based schemes
3. Combination

Cache Coherence Mechanisms (cont.)
• Action taken on
  • Read miss
  • Write hit
  • Write miss

Snoopy Cache Protocol
• A two-processor configuration with copies of data block x
  • Write-through
  • Write-back
[Figure: two processors with cached copies of block x]
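
A minimal sketch of the write-through case for two processors sharing block x follows. It assumes a write-invalidate policy, which the slide does not specify, and the `Bus` and `Cache` classes are hypothetical; a real protocol also tracks line states and handles the write-back case.

```python
class Bus:
    """Shared bus with main memory; every cache snoops the writes placed on it."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, block, value):
        self.memory[block] = value        # write-through: memory is updated at once
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(block)  # the other caches observe the write


class Cache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, block):
        if block not in self.lines:       # read miss: fetch the block from memory
            self.lines[block] = self.bus.memory[block]
        return self.lines[block]

    def write(self, block, value):
        self.lines[block] = value
        self.bus.write(self, block, value)

    def snoop_write(self, block):
        self.lines.pop(block, None)       # invalidate the stale copy (assumed policy)


bus = Bus()
bus.memory["x"] = 1
p1, p2 = Cache(bus), Cache(bus)
p1.read("x")
p2.read("x")                              # both caches now hold a copy of x
p1.write("x", 2)
stale_copy_removed = "x" not in p2.lines
print(stale_copy_removed, p2.read("x"))
```

After P1 writes x, P2's copy is invalidated, so P2's next read misses and fetches the new value from memory.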

Centralized Directory Protocols
• Full-map protocol directory
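
In a full-map directory, each entry holds one presence bit per processor plus a dirty bit, so the directory knows exactly which caches must be invalidated on a write and no broadcast is needed. The sketch below is a simplified illustration with a hypothetical `FullMapDirectory` class; it ignores the data transfer itself and assumes that a dirty copy is written back before another processor reads the block.

```python
class FullMapDirectory:
    """One entry per memory block: a presence bit per processor and a dirty bit."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.presence = {}   # block -> list of presence bits
        self.dirty = {}      # block -> True while one cache holds a modified copy

    def _entry(self, block):
        return self.presence.setdefault(block, [False] * self.num_processors)

    def read_miss(self, block, proc):
        """Record that proc now holds a shared copy of the block."""
        self._entry(block)[proc] = True
        self.dirty[block] = False   # simplification: dirty data already written back

    def write(self, block, proc):
        """Return the processors whose copies must be invalidated."""
        bits = self._entry(block)
        to_invalidate = [p for p, present in enumerate(bits)
                         if present and p != proc]
        for p in to_invalidate:
            bits[p] = False
        bits[proc] = True
        self.dirty[block] = True
        return to_invalidate


directory = FullMapDirectory(4)
directory.read_miss("x", 0)
directory.read_miss("x", 2)
invalidated = directory.write("x", 1)
print(invalidated, directory.presence["x"], directory.dirty["x"])
```

Here processors 0 and 2 hold copies of x when processor 1 writes it, so the directory sends invalidations to exactly those two and marks processor 1 as the only holder.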

Scalable Cache Coherency

Classification of Dynamic Networks

Dynamic Networks (Crossbar)

Dynamic Networks (Single-Stage)
• In a single-stage network, any permutation can be reached in at most $3\log_2 N - 1$ passes.

Multi Stages - Blocking
• Example: multistage cube, Omega
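
Blocking can be demonstrated on an Omega network with destination-tag routing. The sketch assumes the standard construction: 2^n inputs, n stages of 2x2 switches, a perfect shuffle in front of every stage, and each switch choosing its upper or lower output from one destination bit, most significant bit first. The function names are made up for this example. Two messages block each other when they need the same output link at the same stage.

```python
def omega_route(source, dest, n):
    """Destination-tag route through an Omega network with 2**n inputs.

    Returns the output link used after each of the n stages.
    """
    mask = (1 << n) - 1
    position = source
    links = []
    for stage in range(n):
        # Perfect shuffle wiring in front of every stage: rotate left by one bit.
        position = ((position << 1) & mask) | (position >> (n - 1))
        # The 2x2 switch selects its upper (0) or lower (1) output
        # from the next destination bit.
        bit = (dest >> (n - 1 - stage)) & 1
        position = (position & ~1) | bit
        links.append(position)
    return links


def conflicts(route_a, route_b):
    """Stages at which two routes need the same output link."""
    return [stage for stage, (a, b) in enumerate(zip(route_a, route_b)) if a == b]


a = omega_route(0, 0, 3)
b = omega_route(4, 1, 3)
c = omega_route(1, 7, 3)
print(a, b, conflicts(a, b), conflicts(a, c))
```

For N = 8, the messages 0 -> 0 and 4 -> 1 compete for link 0 after the first two stages, so they cannot be routed at the same time, whereas 0 -> 0 and 1 -> 7 use disjoint links.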

Multi Stages - Nonblocking
• Example: three-stage Clos

Dynamic Networks (Clos)

Multi Stages - Rearrangeable
• Example: 8-to-8 (Benes)

Interconnection Design Decisions
• Considerations in selecting the architecture of an interconnection network
  • Operation mode
  • Control strategy
  • Network topology
  • Switching methodology
  • Functional characteristics of the switch

Interconnection Design Decisions
• Operation mode
  • Synchronous
  • Asynchronous
  • Combined
• Control strategy
  • Centralized control
  • Distributed control
• Switching methodology
  • Circuit switching
  • Packet switching
  • Integrated switching
