Taxonomy 2: MIMD Multiprocessor (shared memory)
- Slides: 55
Taxonomy 2
MIMD Multiprocessor (shared memory)
[Figure: processors P1, P2, ..., Pn connected through an Interconnection Network (IN) to memory modules M1, M2, ..., Mn (Tightly Coupled Architecture)]
Shared Memory
• Uniform Memory Access (UMA)
  • Tightly coupled system
• Non-Uniform Memory Access (NUMA)
  • Loosely coupled system
  • Cedar from University of Illinois
  • BBN Butterfly
• Cache Only Memory Access (COMA)
  • Using global distributed caches
  • Kendall Square Research-1 (KSR-1)
MIMD (cont.)
[Figure: Loosely Coupled Architecture (Cedar): global memories GM1, GM2, ..., GMn attached to a Global Interconnection Network (Global IN); below it, clusters whose processors P1, P2, ..., Pn reach cluster memories CM1, CM2, CM3 through a CIN]
MIMD (cont.)
[Figure: Loosely Coupled Architecture (BBN Butterfly): processor-memory pairs (P1, M1), (P2, M2), ..., (Pn, Mn) connected by an Interconnection Network (IN)]
MIMD (cont.)
[Figure: COMA Architecture: an Interconnection Network (IN) connecting nodes, each made up of D1, D2, ..., Dn, C1, C2, ..., Cn and processors P1, P2, ..., Pn]
MIMD (cont.)
• Multicomputer (Message passing)
[Figure: processors P1, P2, ..., Pn, each with its own memory M1, M2, ..., Mn, connected by an IN]
MIMD (cont.)
• Data flow machine
  • An instruction is ready for execution when the data for its operands have been made available
  • Purely self-contained
  • No program counter
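The firing rule above can be illustrated with a minimal sketch. The function `run_dataflow`, the graph encoding, and the example expression are all hypothetical names chosen for illustration, not part of the slides; the point is only that execution order follows operand availability rather than a program counter.

```python
from operator import add, mul, sub

def run_dataflow(nodes, inputs):
    """nodes: name -> (function, [operand names]); inputs: name -> value."""
    values = dict(inputs)      # operand values (tokens) available so far
    pending = dict(nodes)      # instructions that have not fired yet
    fired = []                 # order in which instructions fired
    while pending:
        # Firing rule: an instruction is ready once all its operands exist.
        ready = [name for name, (_, ops) in pending.items()
                 if all(op in values for op in ops)]
        if not ready:
            raise ValueError("some operands never become available")
        for name in ready:
            fn, ops = pending.pop(name)
            values[name] = fn(*(values[op] for op in ops))
            fired.append(name)
    return values, fired

# (a + b) * (a - b), with the instructions listed in an arbitrary order.
graph = {
    "prod": (mul, ["sum", "diff"]),
    "sum": (add, ["a", "b"]),
    "diff": (sub, ["a", "b"]),
}
values, fired = run_dataflow(graph, {"a": 5, "b": 3})
print(values["prod"], fired)
```

Although "prod" is listed first, it fires last, because its operands only become available after "sum" and "diff" have fired.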
SIMD • Array Processor • centralized control unit
MISD • Pipelined vector processor
MISD (cont.) • Systolic array
Hybrid Architecture
• Combine features of different architectures to provide better performance for parallel computations.
• Two types of parallelism
  • Control parallelism (MIMD)
  • Data parallelism (SIMD)
Special Purpose Devices • Artificial Neural Networks (ANN) • Fuzzy logic
Neural Networks (Definition)
✓ A large number of PEs
✓ Connected in parallel
✓ Capable of learning
✓ Adaptive to change
✓ Able to cope with serious disruptions
Power of Connectivity vs Power of Processors
Fuzzy logic (Definition)
✓ Approximate reasoning
✓ Formal principles of reasoning
Interconnection Network (IN) • The measure of an IN is “how quickly it can deliver how much of what’s needed to the right place, reliably and at good cost and value”.
Performance Criteria for IN
• Latency
  • Transit time for a single message
• Bandwidth
  • How much message traffic the IN can handle, e.g., Mbytes/s
• Connectivity
  • How many immediate neighbors each node has, and how often each neighbor can be reached
• Hardware cost
  • What fraction of the total hardware cost the IN represents
  • E.g., wires, switches, connectors, arbitration logic, …
Performance Criteria for IN (cont.)
• Reliability
  • Redundant paths
• Functionality
  • Additional functions performed by the IN, such as combining of messages and fault tolerance
  • E.g., data routing, interrupt handling, request/message combining, coherence
• Scalability
  • The ability to be expanded
Definitions
• Node degree:
  • The number of links (edges) connected to the node.
• Diameter:
  • The largest minimum distance between any pair of nodes. The minimum distance between a pair of nodes is the minimum number of communication links (hops) that data from one of the nodes must traverse in order to reach the other node.
• Network size:
  • The number of nodes in the IN.
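These definitions translate directly into a few lines of code. The sketch below is illustrative only: the adjacency-dictionary representation and the helper names (`degree`, `hops_from`, `diameter`, `linear_array`, `ring`) are assumptions made here, and the network is assumed to be connected.

```python
from collections import deque

def degree(adj, node):
    """Node degree: number of links attached to the node."""
    return len(adj[node])

def hops_from(adj, src):
    """Minimum hop count from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Largest minimum distance over all pairs of nodes (connected network assumed)."""
    return max(max(hops_from(adj, s).values()) for s in adj)

def linear_array(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def ring(n):
    # Bidirectional ring; the set removes the duplicate neighbor when n == 2.
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}

print(diameter(linear_array(8)), diameter(ring(8)), degree(ring(8), 0))
```

The same `diameter` helper can be pointed at any of the static topologies in this deck by building the corresponding adjacency dictionary.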
Data Routing
• Functions in data routing
  • Shifting
  • Rotation
  • Permutation (one-to-one)
  • Broadcast (one-to-all)
  • Multicast (many-to-many)
  • Personalized communication (one-to-many)
  • Shuffle / Exchange
Types of IN • Static Networks • Dynamic Networks
Static Networks
• Shared Bus
  • Degree = 1
  • Diameter = 1
Static Networks (cont.)
• Linear Array
  • Degree = 2
  • Diameter = n-1
Static Networks (cont.)
• Ring
  • Degree = 2
  • Diameter:
    • unidirectional: n-1
    • bidirectional: Ceil((n-1)/2)
Static Networks (cont.)
• Binary tree
  • Degree:
    • Leaf = 1
    • Root = 2
    • Others = 3
  • Diameter: 2(h-1), where h is the number of levels
Static Networks (cont.)
• Fat tree
  • Degree and diameter are the same as for the binary tree.
  • Because traffic is heavier towards the root, the number of links gradually increases towards the root (e.g., CM-5).
Static Networks (cont.)
• Star
  • Degree:
    • Central = n-1
    • Others = 1
  • Diameter = 2
Static Networks (cont.)
Source  Destination
000     000
001     010
010     100
111     111
100     001
101     011
110     101
011     110
Shuffle(s_{n-1} s_{n-2} ... s_0) = s_{n-2} s_{n-3} ... s_0 s_{n-1}
Exchange(s_{n-1} s_{n-2} ... s_1 s_0) = s_{n-1} s_{n-2} ... s_1 s_0', where s_0' is the complement of s_0
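A short sketch of the two functions on integer node addresses, assuming the usual reading: shuffle is a one-bit left rotation of the address, and exchange complements the least significant bit. The function names and the `n_bits` parameter are introduced here for illustration.

```python
def shuffle(addr, n_bits):
    """Perfect shuffle: rotate the n_bits-wide address left by one bit."""
    mask = (1 << n_bits) - 1
    return ((addr << 1) | (addr >> (n_bits - 1))) & mask

def exchange(addr):
    """Exchange: complement the least significant address bit."""
    return addr ^ 1

# Reproduce the source/destination pairs for N = 8 (3 address bits).
for source in range(8):
    print(f"{source:03b} -> {shuffle(source, 3):03b}")
```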
Shuffle-Exchange Network
• For N=8 [Figure: 8-node shuffle-exchange network]
• Applications:
  • The shuffle-exchange network provides suitable interconnection patterns for implementing certain parallel algorithms, such as polynomial evaluation, Fast Fourier Transform (FFT), sorting, and matrix transposition.
Static Networks (cont.)
• Mesh
  • Degree:
    • Corner = 2
    • Sides = 3
    • Middle = 4
  • Diameter = 2(n-1)
Mesh Routing Algorithm
• A simple routing algorithm routes a packet from source S to destination D in a mesh with n^2 nodes (nodes assumed to be numbered 0 to n^2 - 1 in row-major order).
1. Compute the row distance R as R = Floor(D/n) - Floor(S/n).
2. Compute the column distance C as C = (D mod n) - (S mod n).
3. Add the values R and C to the packet header at the source node.
4. Starting from the source, send the packet for R rows and then for C columns.
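The four steps can be sketched as follows. This is an illustration under stated assumptions, not the slide's own code: nodes are assumed to be numbered 0 to n^2 - 1 in row-major order, so that R = Floor(D/n) - Floor(S/n) and C = (D mod n) - (S mod n), with the sign giving the direction of travel. The name `mesh_route` is hypothetical.

```python
def mesh_route(src, dst, n):
    """Route in an n x n mesh: rows first, then columns.

    Assumes row-major node numbering starting at 0.
    Returns (R, C, list of nodes visited including src and dst).
    """
    rows = dst // n - src // n      # step 1: row distance R (signed)
    cols = dst % n - src % n        # step 2: column distance C (signed)
    path = [src]                    # step 3: R and C travel with the packet
    node = src
    row_step = n if rows > 0 else -n
    for _ in range(abs(rows)):      # step 4a: move |R| rows
        node += row_step
        path.append(node)
    col_step = 1 if cols > 0 else -1
    for _ in range(abs(cols)):      # step 4b: move |C| columns
        node += col_step
        path.append(node)
    return rows, cols, path

print(mesh_route(6, 12, 4))
```

For S = 6 and D = 12 in a 4-by-4 mesh this gives R = 2 and C = -2: two hops down a column followed by two hops along a row.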
Example (Mesh)
• To route a packet from node 6 (i.e., S = 6) to node 12 (i.e., D = 12), the packet goes through two paths, as shown in the figure.
Static Networks (cont.)
• Illiac
  • Degree = 4
  • Diameter = n-1
[Figure: Illiac mesh; chordal ring]
Static Networks (cont.)
• Torus
  • Degree = 4
  • Diameter = 2(Floor(n/2))
Static Networks (cont.)
• Hypercube
  • Degree = n
  • Diameter = n
  • Address bits = n
  • Dimensions = n
  • Neighbors = n
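Because each node of an n-cube has an n-bit address and its neighbors differ from it in exactly one bit, these properties are easy to check in code. The sketch below uses hypothetical helper names; the routing function corrects differing address bits from the lowest dimension upward, which is one common choice and is assumed here rather than taken from the slides.

```python
def hypercube_neighbors(node, n):
    """The n neighbors of a node: flip each of the n address bits in turn."""
    return [node ^ (1 << d) for d in range(n)]

def hypercube_distance(a, b):
    """Hop distance equals the number of differing address bits."""
    return bin(a ^ b).count("1")

def hypercube_route(src, dst, n):
    """Visit dimensions 0..n-1 and cross each one in which src and dst differ."""
    path = [src]
    node = src
    for d in range(n):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d
            path.append(node)
    return path

print(hypercube_neighbors(0b0000, 4))
print(hypercube_route(0b0101, 0b0110, 4))
```

The farthest pair of nodes are bitwise complements of each other, which is why the diameter equals n.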
Example: Embedding a 4-by-4 mesh in a 4-cube
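One standard way to build such an embedding is with reflected Gray codes: label mesh node (r, c) with the Gray code of r followed by the Gray code of c, so that mesh neighbors always land on cube neighbors. The figure on the slide may use a different labeling; the function names below are assumptions for this sketch.

```python
def gray(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)

def embed_mesh_in_cube(row_bits, col_bits):
    """Map each (row, col) of a 2^row_bits x 2^col_bits mesh to a cube address."""
    mapping = {}
    for r in range(1 << row_bits):
        for c in range(1 << col_bits):
            mapping[(r, c)] = (gray(r) << col_bits) | gray(c)
    return mapping

def neighbors_stay_adjacent(mapping, rows, cols):
    """Check that every mesh link maps onto a single hypercube link."""
    for (r, c), label in mapping.items():
        for rr, cc in ((r + 1, c), (r, c + 1)):
            if rr < rows and cc < cols:
                if bin(label ^ mapping[(rr, cc)]).count("1") != 1:
                    return False
    return True

mapping = embed_mesh_in_cube(2, 2)   # 4-by-4 mesh into a 4-cube
for r in range(4):
    print(" ".join(f"{mapping[(r, c)]:04b}" for c in range(4)))
```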
Static Networks (cont.)
• n-Mesh
  • Degree:
    • Corner = n
    • Internal = 2n
    • n < Others < 2n
  • Diameter = [formula missing in the source]
Static Networks (cont.)
• k-ary n-cube
  • Degree:
    • If k = 2 then Degree = n
    • If k > 2 then Degree = 2n
  • Diameter = n * Floor(k/2)
[Figure: (a) 4-ary 2-cube network; (b) 3-ary 3-cube network]
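The degree rule can be checked by constructing the network directly. In this sketch a node is a tuple of n digits in base k, and two nodes are linked when they differ by 1 (mod k) in exactly one digit. The helper names are hypothetical, and the diameter is measured by breadth-first search rather than taken from a formula; for reference, the usual closed form for this wraparound topology is n * Floor(k/2).

```python
from collections import deque
from itertools import product

def kary_ncube_neighbors(node, k):
    """Neighbors differ by +1 or -1 (mod k) in exactly one of the n digits."""
    result = set()
    for dim, digit in enumerate(node):
        for delta in (-1, 1):
            neighbor = list(node)
            neighbor[dim] = (digit + delta) % k
            result.add(tuple(neighbor))   # for k = 2 both deltas coincide
    return result

def kary_ncube_metrics(k, n):
    """Return (degree, diameter) of the k-ary n-cube."""
    start = next(product(range(k), repeat=n))
    deg = len(kary_ncube_neighbors(start, k))
    # The network looks the same from every node, so one BFS is enough.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in kary_ncube_neighbors(u, k):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return deg, max(dist.values())

print(kary_ncube_metrics(4, 2), kary_ncube_metrics(3, 3), kary_ncube_metrics(2, 3))
```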
Cache Coherence
• Multiprocessor environment
• Cache dedicated to each processor
• Cache coherence problem
  • How to keep multiple copies of the data consistent during execution?
Cache Coherence Mechanisms
1. Hardware-based schemes
  • Snoopy cache protocols
    • If INs have broadcast features
  • Directory cache protocols
    • No broadcast features in INs
2. Software-based schemes
3. Combination
Cache Coherence Mechanisms (cont.)
• Action taken on
  • Read Miss
  • Write Hit
  • Write Miss
Snoopy Cache Protocol
[Figure: a two-processor configuration with copies of data block x]
• Write-through
• Write-back
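To make the two-processor scenario concrete, here is a minimal sketch of one possible snoopy scheme: write-through caches with write-invalidate. The slides do not say whether invalidation or update is used, so that policy, the class names, and the simplified bus are all assumptions; the write-back case is not modeled.

```python
class SnoopyBus:
    """Shared bus: holds main memory and lets every cache observe writes."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, writer, block):
        # Every other cache snoops the write and drops its copy of the block.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(block, None)

class WriteThroughCache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def read(self, block):
        if block not in self.lines:                  # read miss: fetch from memory
            self.lines[block] = self.bus.memory.get(block, 0)
        return self.lines[block]

    def write(self, block, value):
        self.lines[block] = value                    # update the local copy
        self.bus.memory[block] = value               # write through to memory
        self.bus.broadcast_invalidate(self, block)   # invalidate other copies

bus = SnoopyBus()
bus.memory["x"] = 1
p1, p2 = WriteThroughCache(bus), WriteThroughCache(bus)
p1.read("x")
p2.read("x")              # both caches now hold a copy of block x
p1.write("x", 2)          # p2's copy is invalidated by snooping
stale_copy_gone = "x" not in p2.lines
print(stale_copy_gone, p2.read("x"))
```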
Centralized Directory Protocols • Full-map protocol directory
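A full-map directory is commonly described as keeping, for every memory block, one presence bit per processor plus a dirty bit. The sketch below models only that bookkeeping, under simplifying assumptions labeled in the comments; `FullMapDirectory` and its methods are hypothetical names, not an API from the slides.

```python
class FullMapDirectory:
    """Per-block entry: one presence bit per processor plus a dirty bit."""
    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.entries = {}

    def _entry(self, block):
        return self.entries.setdefault(
            block,
            {"present": [False] * self.num_processors, "dirty": False})

    def read_miss(self, block, proc):
        entry = self._entry(block)
        # Assumption: a dirty owner writes the block back before it is shared.
        entry["dirty"] = False
        entry["present"][proc] = True

    def write(self, block, proc):
        """Return the processors that must be sent an invalidation."""
        entry = self._entry(block)
        invalidated = [p for p, has_copy in enumerate(entry["present"])
                       if has_copy and p != proc]
        entry["present"] = [p == proc for p in range(self.num_processors)]
        entry["dirty"] = True
        return invalidated

directory = FullMapDirectory(4)
directory.read_miss("x", 0)
directory.read_miss("x", 2)
to_invalidate = directory.write("x", 1)
print(to_invalidate, directory.entries["x"])
```

Unlike a snoopy scheme, the invalidations go only to the processors whose presence bits are set, so no broadcast is needed.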
Scalable Cache Coherency
Classification of Dynamic Networks
Dynamic Networks (Crossbar)
Dynamic Networks (Single-Stage)
• In a single-stage network, any permutation can be reached in at most 3(log2 N) - 1 passes.
Multi Stages - Blocking • Example: Multistage Cube, Omega
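Blocking can be seen in a small model of the Omega network. The sketch assumes the usual construction (log2 N stages, each a perfect shuffle followed by 2x2 switches) and destination-tag routing, where stage i uses bit i of the destination address, most significant first, to pick the upper (0) or lower (1) switch output. The function names are introduced here for illustration.

```python
def omega_path(src, dst, n_ports):
    """Output link used after each stage when routing src -> dst."""
    n_bits = n_ports.bit_length() - 1     # n_ports is assumed to be a power of 2
    mask = n_ports - 1
    pos = src
    links = []
    for stage in range(n_bits):
        bit = (dst >> (n_bits - 1 - stage)) & 1
        # Shuffle (rotate left), then the switch replaces the low bit with `bit`.
        pos = ((pos << 1) & mask) | bit
        links.append(pos)
    return links

def conflicts(requests, n_ports):
    """List (stage, link, first request, second request) for every shared link."""
    used = {}
    clashes = []
    for src, dst in requests:
        for stage, link in enumerate(omega_path(src, dst, n_ports)):
            owner = used.setdefault((stage, link), (src, dst))
            if owner != (src, dst):
                clashes.append((stage, link, owner, (src, dst)))
    return clashes

print(omega_path(4, 1, 8))
print(conflicts([(0, 0), (4, 1)], 8))   # both need the same link: blocked
print(conflicts([(0, 0), (1, 1)], 8))   # disjoint paths: no conflict
```

The two requests 0 -> 0 and 4 -> 1 have different sources and destinations, yet they compete for the same switch output in the first stage, which is what makes the network blocking.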
Multi Stages – Nonblocking • Example: Three-stage Clos
Dynamic Networks (Clos)
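The slide itself gives no formulas, so the following is a sketch built on the classical description of a symmetric three-stage Clos network, with notation assumed here: r input switches of size n x m, m middle switches of size r x r, and r output switches of size m x n, for N = r * n ports. Under that description the network is strict-sense nonblocking when m >= 2n - 1 and rearrangeable when m >= n.

```python
def clos_crosspoints(n, m, r):
    """Crosspoints in a symmetric three-stage Clos network.

    Input stage: r switches of n x m; middle: m switches of r x r;
    output stage: r switches of m x n.
    """
    return 2 * r * n * m + m * r * r

def clos_is_strictly_nonblocking(n, m):
    """Classical Clos condition for strict-sense nonblocking operation."""
    return m >= 2 * n - 1

def clos_is_rearrangeable(n, m):
    """Condition for rearrangeably nonblocking operation."""
    return m >= n

n, r = 6, 6                      # N = 36 ports (illustrative values)
m = 2 * n - 1                    # smallest strictly nonblocking middle stage
print(clos_crosspoints(n, m, r), (n * r) ** 2)   # Clos vs single crossbar
```

With these illustrative values the three-stage network needs fewer crosspoints than a single 36-by-36 crossbar, and the saving grows with N.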
Multi Stages - Rearrangeable • Example: 8-to-8 (Benes)
Interconnection Design Decisions
• Considerations in selecting the architecture of an interconnection network
  • Operation mode
  • Control strategy
  • Network topology
  • Switching methodology
  • Functional characteristics of the switch
Interconnection Design Decisions
• Operation mode:
  • Synchronous
  • Asynchronous
  • Combined
• Control strategy:
  • Centralized control
  • Distributed control
• Switching methodology:
  • Circuit switching
  • Packet switching
  • Integrated switching