Chapter 6 Multiprocessor System

Introduction

Each processor in a multiprocessor system can be executing a different instruction at any time.

The major advantages of MIMD systems:
– Reliability
– High performance

The overheads involved with MIMD:
– Communication between processors
– Synchronization of the work
– Wasted processor time if any processor runs out of work to do
– Processor scheduling

Introduction (continued)

– Task: an entity to which a processor is assigned; a program, a function, or a procedure in execution.
– Process: another word for a task.
– Processor (or processing element): the hardware resource on which tasks are executed.

Introduction (continued)

Thread:
– The sequence of tasks performed in succession by a given processor; the path of execution of a processor through a number of tasks.
– Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application.
– Refer to Example 6.1 (degree of parallelism = 3).

R-to-C ratio

A measure of how much overhead is produced per unit of computation:
– R: the run time of the task (computation time)
– C: the communication overhead

This ratio signifies task granularity. A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.

Task granularity

– Coarse-grain parallelism: high R-to-C ratio.
– Fine-grain parallelism: low R-to-C ratio.
– The general tendency in seeking maximum performance is to resort to the finest possible granularity, providing for the highest degree of parallelism.
– Maximum parallelism, however, does not lead to maximum performance, because communication overhead grows with it; a trade-off is required to reach an optimum level. A numeric illustration follows.
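
A hypothetical numeric illustration of this trade-off (the figures are invented, not from the text): take a job needing 1000 time units of computation. Split into 10 tasks, each has R = 100; with C = 10 of communication per task, R/C = 10 and the total overhead is only 100 units against 1000 units of work. Split into 1000 tasks, each has R = 1 with the same C = 10, so R/C = 0.1 and the 10000 units of communication dwarf the computation; the finer partition is slower despite its higher parallelism.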

6.1 MIMD Organization (Figure 6.2)

Two popular MIMD organizations:
– Shared-memory (or tightly coupled) architecture
– Message-passing (or loosely coupled) architecture

Shared-memory architecture:
– UMA (uniform memory access)
– Rapid memory access
– Memory contention

6.1 MIMD Organization (continued)

Message-passing architecture (see the sketch below):
– Distributed-memory MIMD system
– NUMA (nonuniform memory access)
– Heavy communication overhead for remote memory access
– No memory contention problem

Other models:
– Hybrids of the two organizations
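
A minimal sketch of the message-passing style using the standard MPI library (the ranks, tag, and payload here are arbitrary choices for illustration, not from the text):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;    /* data lives in node 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* A remote datum must arrive as an explicit message: this is
               the communication overhead a local memory access avoids. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("node 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }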

6.2 Memory Organization

Two parameters of interest in MIMD memory system design: bandwidth and latency. Memory latency is reduced by increasing the memory bandwidth:
– By building the memory system with multiple independent memory modules (banked and interleaved memory architectures), as sketched below
– By reducing the memory access and cycle times
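
A minimal sketch of low-order interleaving across independent modules (the module count of 4 is an assumption for illustration): consecutive addresses map to consecutive modules, so sequential accesses can proceed in parallel.

    #include <stdio.h>

    #define NUM_MODULES 4   /* assumed power-of-two module count */

    /* Low-order interleaving: the low address bits select the module,
       the high bits select the word within the module. */
    static unsigned module_of(unsigned addr) { return addr % NUM_MODULES; }
    static unsigned offset_of(unsigned addr) { return addr / NUM_MODULES; }

    int main(void) {
        for (unsigned addr = 0; addr < 8; addr++)
            printf("address %u -> module %u, offset %u\n",
                   addr, module_of(addr), offset_of(addr));
        return 0;
    }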

Multi-port memories (Figure 6.3(b))

– Each memory module is a three-port memory device.
– All three ports can be active simultaneously.
– The only restriction is that only one port at a time can write to a given memory location.

Cache incoherence

The problem wherein the value of a data item is not consistent throughout the memory system.
– Write-through: a processor updates the cache and also the corresponding entry in the main memory; other caches are kept consistent with either an updating protocol or an invalidating protocol.
– Write-back: an updated cache block is written back to the main memory just before that block is replaced in the cache.

6.2 Memory Organization (continued)

Cache coherence schemes:
– Do not use private caches (Figure 6.4).
– Use a private-cache architecture, but cache only non-sharable data items.
– Cache flushing: shared data are allowed to be cached only when it is known that only one processor will be accessing the data.

6.2 Memory Organization (continued)

Cache coherence schemes (continued):
– Bus watching (or bus snooping) (Figure 6.5): hardware in each processor's cache controller monitors the shared bus for data LOAD and STORE operations.
– Write-once: the first STORE causes a write-through to the main memory; subsequent writes are handled under an ownership protocol, as sketched below.
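
A rough sketch of the state transitions behind write-once (following Goodman's protocol; this is my illustration of the idea, not the text's exact state machine):

    /* States of one cache block under the write-once protocol. */
    typedef enum { INVALID, VALID, RESERVED, DIRTY } BlockState;

    /* Events: local CPU accesses, plus bus traffic seen by the snooper. */
    typedef enum { CPU_READ, CPU_WRITE, SNOOP_READ, SNOOP_WRITE } Event;

    BlockState next_state(BlockState s, Event e) {
        switch (e) {
        case CPU_READ:
            return (s == INVALID) ? VALID : s;   /* miss fetches a clean copy */
        case CPU_WRITE:
            /* First write: write-through to memory, block becomes RESERVED.
               Later writes stay local; the cache now owns the DIRTY block. */
            return (s == INVALID || s == VALID) ? RESERVED : DIRTY;
        case SNOOP_READ:
            /* Another cache reads the block: the owner supplies the data
               and drops back to a shared, clean state. */
            return (s == RESERVED || s == DIRTY) ? VALID : s;
        case SNOOP_WRITE:
            return INVALID;   /* another processor wrote: local copy is stale */
        }
        return s;
    }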

6.3 Interconnection Network

– Bus (Figure 6.6): bus window (Figure 6.7(a)), fat tree (Figure 6.7(b))
– Loop or ring: token ring standard
– Mesh

6.3 Interconnection Network (continued)

– Hypercube: routing is straightforward (see the sketch below), but the number of nodes must be increased by powers of two.
– Crossbar: offers multiple simultaneous communications, but at a high hardware complexity.
– Multistage switching networks
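
A minimal sketch of why hypercube routing is straightforward: XOR-ing the source and destination addresses marks exactly the dimensions to cross, one per hop (dimension-order routing; the 4-dimensional, 16-node size is an arbitrary choice):

    #include <stdio.h>

    /* Route from src to dst in a dims-dimensional hypercube by correcting
       one differing address bit (crossing one dimension) per hop. */
    void route(unsigned src, unsigned dst, unsigned dims) {
        unsigned node = src;
        printf("%u", node);
        for (unsigned bit = 0; bit < dims; bit++) {
            if ((node ^ dst) & (1u << bit)) {   /* this dimension differs */
                node ^= 1u << bit;              /* cross one link */
                printf(" -> %u", node);
            }
        }
        printf("\n");
    }

    int main(void) {
        route(0u, 13u, 4u);   /* prints 0 -> 1 -> 5 -> 13 (0000 to 1101) */
        return 0;
    }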

6.4 Operating System Considerations

The major functions of the multiprocessor operating system:
– Keeping track of the status of all the resources at all times
– Assigning tasks to processors in a justifiable manner
– Spawning and creating new processes such that they can be executed in parallel or independently of each other
– Collecting their individual results when all the spawned processes are completed and passing them to other processes as required

6.4 Operating System Considerations (continued)

Synchronization mechanisms:
– Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of their operations.
– Processes also compete with each other to gain access to shared data items; an access control mechanism is needed to maintain orderly access.

6.4 Operating System Considerations (continued)

Synchronization mechanisms (continued). The most primitive synchronization techniques (two are sketched below):
– Test & set
– Semaphores
– Barrier synchronization
– Fetch & add

Heavy-weight processes and light-weight processes

Scheduling:
– Static
– Dynamic: load balancing
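
A minimal sketch of test & set and fetch & add using C11 atomics (the function and variable names are mine, for illustration only):

    #include <stdatomic.h>

    /* Test & set: spin until the flag was previously clear; the atomic
       test-and-set makes the read-modify-write indivisible. */
    atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        while (atomic_flag_test_and_set(&lock))
            ;   /* busy-wait: another processor holds the lock */
    }

    void release(void) {
        atomic_flag_clear(&lock);
    }

    /* Fetch & add: atomically returns the old value and increments, so
       many processors can claim unique work indices without a lock. */
    atomic_int next_index = 0;

    int claim_work(void) {
        return atomic_fetch_add(&next_index, 1);
    }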

6.5 Programming (continued)

Four main structures of parallel programming:
– Parbegin/parend
– Fork/join (sketched below)
– Doall
– Parallel declarations: processes, tasks, procedures, and so on can be declared for parallel execution.
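
A minimal sketch of the fork/join structure using POSIX threads (the worker body and the thread count of 4 are illustrative assumptions):

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4   /* assumed degree of parallelism */

    void *worker(void *arg) {
        printf("task %ld running in parallel\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* Fork: spawn one thread per task. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        /* Join: wait for all spawned tasks to complete. */
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }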

6.6 Performance Evaluation and Scalability

Performance evaluation:
– Speedup: S = Ts/Tp
– Total overhead: To = P·Tp - Ts, so Tp = (To + Ts)/P and S = P·Ts/(To + Ts)
– Efficiency: E = S/P = Ts/(Ts + To) = 1/(1 + To/Ts)
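
A hypothetical worked example (the numbers are invented): suppose Ts = 100 time units serially and Tp = 30 units on P = 4 processors. Then To = 4·30 - 100 = 20, S = 100/30 ≈ 3.33, and E = 3.33/4 = 100/120 ≈ 0.83; the lost 17% is exactly the overhead fraction To/(Ts + To) = 20/120.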

Scalability

Scalability: the ability to increase speedup as the number of processors increases. A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
– Time-constrained scaling
– Memory-constrained scaling

Isoefficiency function

E = 1/(1 + To/Ts), so To/Ts = (1 - E)/E and hence Ts = E·To/(1 - E).
For a given value of E, E/(1 - E) is a constant K; then Ts = K·To (the isoefficiency function).
A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when P is increased.
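
For instance (a hypothetical target): to hold E = 0.8, K = 0.8/0.2 = 4, so the useful work Ts must stay at four times the total overhead To; if adding processors doubles To, the problem size must grow enough to double Ts as well.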

6.6 Performance Evaluation and Scalability (continued)

Performance models:
– The basic model: each task is equal and takes R time units to execute on a processor. If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units. (One illustrative instantiation follows.)
– Model with linear communication overhead
– Model with overlapped communication
– Stochastic model
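
One simple instantiation of the basic model (my own illustration, not the text's derivation): if N equal tasks are spread over P processors and each task performs one remote communication, the parallel time is roughly (N/P)(R + C), so S = N·R/((N/P)(R + C)) = P/(1 + C/R). A high R-to-C ratio drives C/R toward zero and S toward the ideal P, consistent with the granularity discussion earlier.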

Examples

Alliant FX series (Figure 6.17):
– Parallelism at the instruction, loop, and task levels