CSE 539 Advanced Computer Architecture Chapter 7 Multiprocessors
- Slides: 37
CSE 539: Advanced Computer Architecture
Chapter 7: Multiprocessors and Multicomputers
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University
sumit.12735@lpu.co.in
In this chapter…
• Multiprocessor System Interconnects
• Cache Coherence and Synchronization Mechanisms
• Three Generations of Multicomputers
• Message Routing Schemes
MULTIPROCESSOR SYSTEM INTERCONNECTS
• Network Characteristics
  o Topology
    • Dynamic networks
  o Timing control protocol
    • Synchronous (with global clock)
    • Asynchronous (with handshake or interlocking mechanism)
  o Switching method
    • Circuit switching
    • Packet switching
  o Control strategy
    • Centralized (a global controller receives requests from all devices and grants network access)
    • Distributed (requests are handled by local devices independently)
• Hierarchical Bus System
  o Local bus (board level)
    • Memory bus, data bus
  o Backplane bus (backplane level)
    • VME bus (IEEE 1014-1987), Multibus II (IEEE 1296-1987), Futurebus+ (IEEE 896.1-1991)
  o I/O bus (I/O level)
  o E.g. Encore Multimax multiprocessor’s Nanobus
    • 20 slots
    • 32-bit address path
    • 64-bit data path
    • Clock rate: 12.5 MHz
    • Total memory bandwidth: 100 megabytes per second
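The Nanobus figures quoted above are mutually consistent: a 64-bit data path clocked at 12.5 MHz yields 100 megabytes per second. A minimal sketch of that arithmetic, assuming one data transfer per bus cycle (the function name is hypothetical and not from the slides):

```python
def peak_bus_bandwidth_mb_per_s(data_path_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in megabytes per second, assuming one transfer per clock cycle."""
    bytes_per_transfer = data_path_bits / 8
    # bytes per transfer * millions of transfers per second = megabytes per second
    return bytes_per_transfer * clock_mhz


nanobus = peak_bus_bandwidth_mb_per_s(data_path_bits=64, clock_mhz=12.5)
print(nanobus)  # 100.0
```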
• Hierarchical Buses and Caches
  o Cache levels
    • First-level caches
    • Second-level caches
  o Buses
    • (Intra-)cluster bus
    • Inter-cluster bus
  o Cache coherence
    • Snoopy cache protocol for coherence among the first-level caches of the same cluster
    • Inter-cluster cache coherence is controlled among the second-level caches, and the results are passed to the first-level caches
  o Use of bridges between multiprocessor clusters
• Crossbar Switch Design
  o Based on number of network stages
    • Single-stage (or recirculating) networks
    • Multistage networks
      o Blocking networks
      o Non-blocking (re-arranging) networks
    • Crossbar networks
      o n × m and n² cross-point switch design
      o Crossbar benefits and limitations
• Multiport Memory Design
  o Multiport memory
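To make the n × m and n² cross-point counts concrete, here is a minimal sketch of a crossbar between processors and memory modules. The fixed-priority rule (lowest processor id wins a conflict) and both function names are assumptions for illustration; the slides do not specify an arbitration policy.

```python
def crosspoint_count(n_processors: int, m_memories: int) -> int:
    """An n x m crossbar needs one cross-point switch per (processor, memory) pair."""
    return n_processors * m_memories


def arbitrate(requests: dict[int, int]) -> dict[int, int]:
    """Grant at most one processor per memory module in a single cycle.

    requests maps processor id -> requested memory module.
    Assumed policy: the lowest processor id wins when several request the same module.
    Returns memory module -> granted processor id.
    """
    grants: dict[int, int] = {}
    for proc in sorted(requests):
        module = requests[proc]
        if module not in grants:
            grants[module] = proc
    return grants


print(crosspoint_count(16, 16))        # 256, i.e. n squared for n = 16
print(arbitrate({0: 2, 1: 2, 2: 0}))   # {2: 0, 0: 2}
```

Processors 0 and 1 both request module 2, so only one is served in this cycle, while processor 2 reaches module 0 in parallel. The quadratic growth in cross-points is the usual cost limitation of a crossbar.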
CACHE COHERENCE MECHANISMS
• Cache Coherence Problem
  o Inconsistent copies of the same memory block in different caches
  o Sources of inconsistency:
    • Sharing of writable data
    • Process migration
    • I/O activity
• Protocol Approaches
  o Snoopy bus protocol
  o Directory-based protocol
• Write Policies
  o (Write-back, Write-through) × (Write-invalidate, Write-update)
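The inconsistency caused by sharing writable data can be shown with a toy model of two private caches and no coherence protocol. This is only a sketch; the variable names and the single data item X are illustrative assumptions.

```python
def shared_write(write_through: bool) -> tuple[int, int, int]:
    """Processor 1 writes X = 1 while processor 2 holds a cached copy of X = 0.

    Returns (value in P1's cache, value in P2's cache, value in main memory).
    No coherence protocol is modelled, so P2's copy is never updated or invalidated.
    """
    memory = {"X": 0}
    p1_cache = {"X": memory["X"]}
    p2_cache = {"X": memory["X"]}

    p1_cache["X"] = 1          # P1 updates its own copy
    if write_through:
        memory["X"] = 1        # write-through propagates the write to memory immediately
    return p1_cache["X"], p2_cache["X"], memory["X"]


print(shared_write(write_through=True))   # (1, 0, 1)
print(shared_write(write_through=False))  # (1, 0, 0)
```

With write-through, only P2's cache is stale; with write-back, main memory is stale as well until the block is written back.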
• Snoopy Bus Protocols
  o Write-through caches
    • Write-invalidate coherence protocol for write-through caches
    • Write-update coherence protocol for write-through caches
    • Data item states:
      o VALID
      o INVALID
    • Possible operations:
      o Read by same processor R(i); read by different processor R(j)
      o Write by same processor W(i); write by different processor W(j)
      o Replace by same processor Z(i); replace by different processor Z(j)
• Snoopy Bus Protocols
  o Write-through caches: write-invalidate scheme

    Current state   Operation   New state
    Valid           R(i)        Valid
    Valid           W(i)        Valid
    Valid           Z(i)        Invalid
    Valid           R(j)        Valid
    Valid           W(j)        Invalid
    Valid           Z(j)        Valid
    Invalid         R(i)        Valid
    Invalid         W(i)        Valid
    Invalid         Z(i)        Invalid
    Invalid         R(j)        Invalid
    Invalid         W(j)        Invalid
    Invalid         Z(j)        Invalid
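The two-state write-invalidate scheme for write-through caches can be encoded as a lookup table. The sketch below assumes the standard transitions for this protocol (a remote write or a local replacement invalidates the copy; a local read or write makes it valid); the names `WT_INVALIDATE` and `trace` are hypothetical.

```python
# (current state, operation) -> new state, seen from the cache of processor i
WT_INVALIDATE = {
    ("Valid", "R(i)"): "Valid",     ("Invalid", "R(i)"): "Valid",
    ("Valid", "W(i)"): "Valid",     ("Invalid", "W(i)"): "Valid",
    ("Valid", "Z(i)"): "Invalid",   ("Invalid", "Z(i)"): "Invalid",
    ("Valid", "R(j)"): "Valid",     ("Invalid", "R(j)"): "Invalid",
    ("Valid", "W(j)"): "Invalid",   ("Invalid", "W(j)"): "Invalid",
    ("Valid", "Z(j)"): "Valid",     ("Invalid", "Z(j)"): "Invalid",
}


def trace(operations: list[str], state: str = "Invalid") -> list[str]:
    """Return the state of the local copy after each operation in sequence."""
    history = []
    for op in operations:
        state = WT_INVALIDATE[(state, op)]
        history.append(state)
    return history


print(trace(["R(i)", "W(j)", "R(j)", "W(i)"]))
# ['Valid', 'Invalid', 'Invalid', 'Valid']
```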
• Snoopy Bus Protocols
  o Write-back caches
    • Ownership protocol: write-invalidate coherence protocol for write-back caches
    • Data item states:
      o RO: Read Only (valid state)
      o RW: Read Write (valid state)
      o INV: Invalid state
    • Possible operations:
      o Read by same processor R(i); read by different processor R(j)
      o Write by same processor W(i); write by different processor W(j)
      o Replace by same processor Z(i); replace by different processor Z(j)
• Snoopy Bus Protocols
  o Write-back caches: write-invalidate (ownership protocol) scheme

    Current state    Operation   New state
    RO (Valid)       R(i)        RO
    RO (Valid)       W(i)        RW
    RO (Valid)       Z(i)        INV
    RO (Valid)       R(j)        RO
    RO (Valid)       W(j)        INV
    RO (Valid)       Z(j)        RO
    RW (Valid)       R(i)        RW
    RW (Valid)       W(i)        RW
    RW (Valid)       Z(i)        INV
    RW (Valid)       R(j)        RO
    RW (Valid)       W(j)        INV
    RW (Valid)       Z(j)        RW
    INV (Invalid)    R(i)        RO
    INV (Invalid)    W(i)        RW
    INV (Invalid)    Z(i)        INV
    INV (Invalid)    R(j)        INV
    INV (Invalid)    W(j)        INV
    INV (Invalid)    Z(j)        INV
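The ownership protocol can be exercised across several caches at once: each bus operation is a local (i) event for the issuing cache and a remote (j) event for every other cache. This is a sketch under the standard three-state transitions; `OWNERSHIP` and `broadcast` are hypothetical names.

```python
# OWNERSHIP[current state][operation] -> new state, seen from one cache
OWNERSHIP = {
    "RW":  {"R(i)": "RW", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RW"},
    "RO":  {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "RO", "W(j)": "INV", "Z(j)": "RO"},
    "INV": {"R(i)": "RO", "W(i)": "RW", "Z(i)": "INV",
            "R(j)": "INV", "W(j)": "INV", "Z(j)": "INV"},
}


def broadcast(states: list[str], proc: int, op: str) -> list[str]:
    """Apply op ('R', 'W' or 'Z') issued by processor proc to every cache's copy."""
    return [
        OWNERSHIP[state][f"{op}({'i' if k == proc else 'j'})"]
        for k, state in enumerate(states)
    ]


states = ["INV", "INV"]
for proc, op in [(0, "R"), (1, "R"), (0, "W"), (1, "R")]:
    states = broadcast(states, proc, op)
    print(proc, op, states)
# 0 R ['RO', 'INV']
# 1 R ['RO', 'RO']
# 0 W ['RW', 'INV']
# 1 R ['RO', 'RO']
```

At no point do two caches hold the block in RW, which is the single-owner property the protocol is meant to preserve.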
• Snoopy Bus Protocols
  o Write-once protocol
    • The first write uses the write-through policy
    • Subsequent writes use the write-back policy
    • In both cases, the data item copy in remote caches is invalidated
    • Data item states:
      o Valid: cache block is consistent with the main memory copy
      o Reserved: data has been written exactly once and is consistent with the main memory copy
      o Dirty: data has been written more than once and is not consistent with the main memory copy
      o Invalid: block is not found in the cache or is inconsistent with the main memory copy
• Snoopy Bus Protocols
  o Write-once protocol
    • Cache events and actions:
      o Read-miss
      o Read-hit
      o Write-miss
      o Write-hit
      o Block replacement
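The cache events listed above can be sketched as state changes for a single block held in several caches. This is a simplified model of the write-once protocol that tracks states only (no data values or memory traffic), and it assumes a write-miss leaves the local copy Dirty; the function name `write_once` is hypothetical.

```python
def write_once(states: list[str], proc: int, op: str) -> list[str]:
    """Return the per-cache states of one block after processor proc issues op.

    op is 'R' (read), 'W' (write) or 'Z' (block replacement).
    """
    new = list(states)
    local = states[proc]
    if op == "R":
        if local == "Invalid":
            # Read-miss: a remote Reserved or Dirty copy drops to Valid on the bus read.
            new = ["Valid" if s in ("Reserved", "Dirty") else s for s in states]
            new[proc] = "Valid"
        # Read-hit: no state change.
    elif op == "W":
        if local == "Valid":
            # Write-hit, first write: write-through and invalidate remote copies.
            new = ["Invalid"] * len(states)
            new[proc] = "Reserved"
        elif local == "Reserved":
            # Write-hit, second write: local only, memory is updated later (write-back).
            new[proc] = "Dirty"
        elif local == "Invalid":
            # Write-miss: fetch the block and invalidate all remote copies.
            new = ["Invalid"] * len(states)
            new[proc] = "Dirty"
        # Write-hit on a Dirty block: stays Dirty.
    elif op == "Z":
        # Block replacement: a Dirty block would be written back first.
        new[proc] = "Invalid"
    else:
        raise ValueError(f"unknown operation {op!r}")
    return new


states = ["Invalid", "Invalid"]
for proc, op in [(0, "R"), (1, "R"), (0, "W"), (0, "W"), (1, "R")]:
    states = write_once(states, proc, op)
    print(proc, op, states)
# 0 R ['Valid', 'Invalid']
# 1 R ['Valid', 'Valid']
# 0 W ['Reserved', 'Invalid']
# 0 W ['Dirty', 'Invalid']
# 1 R ['Valid', 'Valid']
```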
• Multilevel Cache Coherence
• Protocol Performance Issues
  o Snoopy cache protocol performance determinants:
    • Workload patterns
    • Implementation efficiency
  o Goals/motivation behind using the snooping mechanism
    • Reduce bus traffic
    • Reduce effective memory access time
  o Data pollution point
    • The miss ratio decreases as block size increases, up to a data pollution point (that is, as blocks become larger, the probability of finding a desired data item in the cache increases).
    • The miss ratio starts to increase once the block size grows beyond the data pollution point.
  o Ping-pong effect on data shared between multiple caches
    • If two processes update a data item alternately, the data will continually migrate between the two caches, giving a high miss rate.
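The ping-pong effect can be illustrated by counting write misses on one shared block under a write-invalidate policy. This is a toy model (one block, no timing), and the function name is a hypothetical helper.

```python
def count_write_misses(writers: list[int], n_caches: int = 2) -> int:
    """Count misses for a sequence of writes to one shared block under write-invalidate.

    writers lists, in order, which processor performs each write.
    """
    valid = [False] * n_caches
    misses = 0
    for p in writers:
        if not valid[p]:
            misses += 1                              # block must be fetched into p's cache
        valid = [k == p for k in range(n_caches)]    # the write invalidates every other copy
    return misses


print(count_write_misses([0, 1] * 4))  # 8: alternating writers miss on every write
print(count_write_misses([0] * 8))     # 1: a single writer misses only once
```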
THREE GENERATIONS OF MULTICOMPUTERS
• Multicomputer vs. Multiprocessor
• Design Choices for Multicomputers
  o Processors
    • Low-cost commodity (off-the-shelf) processors
  o Memory structure
    • Distributed memory organization
    • Local memory with each processor
  o Interconnection schemes
    • Message passing, point-to-point, direct networks with send/receive semantics, with or without uniform message communication speed
  o Control strategy
    • Asynchronous MIMD, MPMD and SPMD operations
• The Past, Present and Future Development
  o First generation
    • Example systems: Caltech’s Cosmic Cube, Intel iPSC/1, Ametek S/14, nCube/10
  o Second generation
    • Example systems: iPSC/2, i860, Delta, nCube/2, Supernode 1000, Ametek Series 2010
  o Third generation
    • Example systems: Caltech’s Mosaic C, J-Machine, Intel Paragon
  o First- and second-generation multicomputers are regarded as medium-grain systems.
  o Third-generation multicomputers are regarded as fine-grain systems.
  o The fine-grain and shared-memory approach can, in theory, combine the relative merits of multiprocessors and multicomputers in a heterogeneous processing environment.
• Typical attributes of the three generations

                                                     1st Generation   2nd Generation   3rd Generation
    Typical Node Attributes
      MIPS                                           1                10               100
      MFLOPS (scalar)                                0.1              2                40
      MFLOPS (vector)                                10               40               200
      Memory Size (in MB)                            0.5              4                32
    Typical System Attributes
      Number of Nodes (N)                            64               256              1024
      MIPS                                           64               2560             100 K
      MFLOPS (scalar)                                6.4              512              40 K
      MFLOPS (vector)                                640              10 K             200 K
      Memory Size (in MB)                            32               1 K              32 K
    Communication
      Local Neighbour (in microseconds)              2000             5                0.5
MESSAGE PASSING SCHEMES
• Message Routing Schemes
• Message Formats
  o Messages
  o Packets
  o Flits (flow control digits)
    • Data-only flits
    • Sequence number
    • Routing information
• Store-and-forward routing
• Wormhole routing
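The message, packet and flit hierarchy can be sketched as a two-step split: a message is cut into packets, and each packet carries header information (routing information and a sequence number) followed by data-only flits. The field names, sizes and the `packetize` helper below are illustrative assumptions, not a format defined in the slides.

```python
def packetize(message: bytes, dest: int, packet_bytes: int, flit_bytes: int) -> list[dict]:
    """Split a message into packets, and each packet's payload into data-only flits.

    The header stands in for the header flit(s): routing information (dest)
    and the packet's sequence number, used to reassemble the message in order.
    """
    packets = []
    for seq, start in enumerate(range(0, len(message), packet_bytes)):
        payload = message[start:start + packet_bytes]
        packets.append({
            "header": {"dest": dest, "seq": seq},
            "data_flits": [payload[k:k + flit_bytes]
                           for k in range(0, len(payload), flit_bytes)],
        })
    return packets


for packet in packetize(b"ABCDEFGHIJ", dest=5, packet_bytes=4, flit_bytes=2):
    print(packet)
```

In store-and-forward routing each node buffers a whole packet before forwarding it, whereas in wormhole routing the flits of a packet follow the header flit through the network in a pipeline.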
• Asynchronous Pipelining
• Latency Analysis
  o L: packet length (in bits)
  o W: channel bandwidth (in bits per second)
  o D: distance (number of nodes traversed minus 1)
  o F: flit length (in bits)
  o Communication latency in store-and-forward routing
    • $T_{SF} = \frac{L}{W}(D + 1)$
  o Communication latency in wormhole routing
    • $T_{WH} = \frac{L}{W} + \frac{F}{W} D$
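The two latency formulas above can be compared directly. The sketch below uses the symbols L, W, D and F as defined in the slide; the function names and the sample values are illustrative assumptions.

```python
def store_and_forward_latency(L: float, W: float, D: int) -> float:
    """T_SF = (L / W) * (D + 1): the whole packet is retransmitted at every hop."""
    return (L / W) * (D + 1)


def wormhole_latency(L: float, W: float, D: int, F: float) -> float:
    """T_WH = L / W + (F / W) * D: only the flit time is paid per hop."""
    return L / W + (F / W) * D


# Hypothetical values: 1024-bit packet, 8-bit flits, 1024 bits/s channel.
for distance in (1, 3, 7):
    t_sf = store_and_forward_latency(L=1024, W=1024, D=distance)
    t_wh = wormhole_latency(L=1024, W=1024, D=distance, F=8)
    print(distance, t_sf, t_wh)
```

Because F is much smaller than L, the wormhole latency is nearly independent of the distance D, while the store-and-forward latency grows in proportion to D + 1.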