Chapter 13 Multiprocessor
Chungnam National University, Computer Science Major
http://sslab.cnu.ac.kr
이철훈

Characteristics of Multiprocessors

Parallel computer architectures
1. Multiprocessor architecture
   - Shared-memory, or tightly coupled, architecture: processors communicate through shared memory
   - A single copy of the operating system exists
2. Multicomputer architecture
   - Message-passing, or loosely coupled, architecture: computers communicate by exchanging messages
   - Each computer has its own copy of the operating system

Goals of parallel computers
1. High performance: parallel processing raises performance
   - Multiple independent jobs run in parallel
   - A single job is partitioned into multiple parallel tasks that run in parallel
2. High reliability: if a fault occurs in one processor, the other processors keep the system running

Computer System Architecture

Interconnection Structures

Time-shared common bus
- All processors, memory modules, and IOPs are connected to a common bus
- Only one processor or IOP can access the bus at any instant, so bus arbitration is required
- This is the simplest structure and is easy to implement, but bus bandwidth is limited: once the number of processors grows past a certain point, the bus becomes the bottleneck and performance saturates

Fig. 13-1 Time-shared common bus organization

Interconnection Structures

Multiport memory
- Each memory module has multiple ports, and each port is connected to one processor bus
- Multiple paths exist between processors and memory, so the transfer rate rises and performance improves
- Each port has a priority, which is used to resolve memory access conflicts
- The memory control logic is complex, and the number of cables and connectors grows, so cost increases

Fig. 13-3 Multiport memory organization

Interconnection Structures

Crossbar switch
- A crosspoint at each intersection between a processor bus and a memory module path controls the path
- Memory access conflicts are resolved by priority
- It offers the highest bandwidth, but the crossbar switch is expensive to implement

Fig. 13-4 Crossbar modules

Interconnection Structures

Multistage interconnection network
- Each stage uses several a × b switches (e.g., 2 × 2 binary switches)
- Each switch dynamically sets the path between its inputs and outputs
- Depending on its structure, each multistage interconnection network has a fixed interstage connection (ISC) pattern between stages
- Omega network
  - Uses 2 × 2 switches and a perfect-shuffle ISC pattern
  - n × n Omega network: log2(n) stages, n/2 switches in each stage
  - Data routing is controlled by the binary code of the destination
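The destination-controlled routing above can be sketched in a few lines. This is a minimal illustration, not taken from the slides: it assumes the common convention that a perfect shuffle precedes every stage, that destination bits are consumed from most significant to least significant, and that bit 0 selects the upper switch output and bit 1 the lower one. The function names `perfect_shuffle` and `omega_route` are hypothetical.

```python
def perfect_shuffle(pos, bits):
    """Perfect shuffle: rotate the `bits`-bit line number left by one."""
    msb = (pos >> (bits - 1)) & 1
    return ((pos << 1) & ((1 << bits) - 1)) | msb


def omega_route(src, dst, n):
    """Route from input line `src` to output line `dst` in an n x n Omega network.

    Returns (final_line, path), where path lists one
    (stage, switch_index, output_port) tuple per stage.
    Assumed convention: output_port 0 = upper output, 1 = lower output.
    """
    bits = n.bit_length() - 1
    assert 1 << bits == n, "n must be a power of two"
    pos = src
    path = []
    for stage in range(bits):
        pos = perfect_shuffle(pos, bits)          # fixed ISC pattern
        bit = (dst >> (bits - 1 - stage)) & 1     # destination bit for this stage
        pos = (pos & ~1) | bit                    # 2 x 2 switch picks its output
        path.append((stage, pos >> 1, bit))
    return pos, path


final_line, path = omega_route(5, 3, 8)
print(final_line, path)
```

For an 8 × 8 network the loop runs log2(8) = 3 times, and each stage has 8/2 = 4 switches (indices 0 to 3). The sketch ignores blocking: two messages that need the same switch output at the same time would conflict.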

Interconnection Structures

Fig. 13-8 8 × 8 Omega network

Interconnection Structures

Hypercube interconnection
- n-dimensional hypercube: N = 2^n (total number of processors)
- Each node is assigned an n-bit binary node ID, and a communication link exists between two nodes whose IDs differ in exactly one bit

Fig. 13-9 Hypercube structure for n = 1, 2, 3
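The one-bit-difference rule maps directly onto XOR. The sketch below is illustrative only; `neighbors` and `hop_distance` are hypothetical helper names, not part of the slides.

```python
def neighbors(node, n):
    """IDs of the n nodes linked to `node` in an n-dimensional hypercube.

    Flipping exactly one of the n ID bits yields each neighbor.
    """
    return [node ^ (1 << bit) for bit in range(n)]


def hop_distance(a, b):
    """Minimum number of links between nodes a and b.

    This equals the number of bit positions in which the two IDs differ.
    """
    return bin(a ^ b).count("1")


print(neighbors(0b101, 3))        # [4, 7, 1], i.e. 100, 111, 001
print(hop_distance(0b000, 0b111)) # 3
```

In a 3-cube (N = 2^3 = 8 processors), every node has 3 links, and the two most distant nodes are 3 hops apart.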

Interprocessor Arbitration

Arbitration: selecting a single bus master when several processors issue bus requests at the same time
- Bus arbitration is based on fairness or priority

Serial arbitration
- All bus masters share common bus grant and bus request lines
- The bus arbiters are connected in a daisy chain
- Priority depends on position: an arbiter further to the left has higher priority
- The bus grant signal propagates from the left. When the grant signal arrives at an arbiter's PI (priority input) terminal:
  1. If the arbiter has issued a bus request, it does not propagate the grant to the right; it activates the busy signal and takes control of the bus
  2. If the arbiter has not issued a bus request, it propagates the grant to the right through its PO (priority output) terminal

Fig. 13-10 Serial (daisy-chain) arbitration
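The PI/PO rules above can be traced with a small simulation. This is a simplified sketch under stated assumptions: the leftmost arbiter's PI is always asserted, one arbitration round is evaluated at a time, and the busy line is not modeled. The function name `daisy_chain` is hypothetical.

```python
def daisy_chain(requests):
    """Simulate one round of serial (daisy-chain) arbitration.

    requests[i] is True if arbiter i (0 = leftmost, highest priority)
    is requesting the bus. Returns (winner, po_signals), where winner is
    the index of the arbiter that takes the bus (or None) and
    po_signals[i] is the PO output of arbiter i.
    """
    pi = True            # assumption: grant enters the chain at the left
    winner = None
    po_signals = []
    for i, requesting in enumerate(requests):
        if pi and requesting:
            winner = i   # keeps the grant and would activate the busy line
            po = False   # grant is not propagated to the right
        else:
            po = pi      # pass along whatever arrived on PI
        po_signals.append(po)
        pi = po          # this arbiter's PO feeds the next arbiter's PI
    return winner, po_signals


print(daisy_chain([False, True, True, False]))
# (1, [True, False, False, False])
```

Arbiter 2 also requested the bus, but its PI never went high because arbiter 1, to its left, absorbed the grant.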

Interprocessor Arbitration

Parallel arbitration
- Each bus master has its own bus grant and bus request lines
- Arbitration is based on priority or fairness
- It is more flexible and arbitrates faster, but the larger number of lines makes it more expensive

Fig. 13-11 Parallel arbitration
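The two policies named above, priority and fairness, can be contrasted in a short sketch. It assumes that index 0 has the highest fixed priority and uses round-robin as one possible fairness policy; the slides do not prescribe a specific scheme, and the names `fixed_priority` and `round_robin` are hypothetical.

```python
def fixed_priority(requests):
    """Grant the lowest-numbered requester (index 0 = highest priority)."""
    for i, requesting in enumerate(requests):
        if requesting:
            return i
    return None


def round_robin(requests, last):
    """Grant the first requester after `last`, wrapping around.

    `last` is the index of the most recently granted master, so that
    master is considered last in the next round.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        i = (last + offset) % n
        if requests[i]:
            return i
    return None


requests = [True, True, False, True]
print(fixed_priority(requests))        # 0
print(round_robin(requests, last=1))   # 3
```

Because every master has its own request line, the arbiter sees all requests at once and can decide in a single step, whereas a daisy chain must let the grant ripple through each arbiter in turn.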

Interprocessor Communication and Synchronization

A mutual exclusion mechanism is needed for correct access to shared memory and shared variables
- Critical section: a program sequence that, once begun, must complete execution before another processor accesses the same shared resource

Semaphore: a software-controlled flag
- Before entering a critical section, a processor checks the semaphore. A value of 1 means another processor is already inside the critical section, so the processor waits until the value becomes 0. If the value is 0, the processor sets it to 1 and enters the critical section. On leaving the critical section, it sets the value back to 0 so that another processor can enter.
- The semaphore is itself a shared variable, so a hardware test-and-set instruction is used to make the testing and setting atomic

  TSL SEM    R ← M[SEM]    Test semaphore
             M[SEM] ← 1    Set semaphore
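The TSL-based entry and exit protocol can be sketched with threads standing in for processors. Everything here is an illustrative assumption: Python has no test-and-set instruction, so a `threading.Lock` is used only to model the indivisibility of the hardware instruction, and the names `SharedMemory`, `tsl`, `enter_critical`, `leave_critical`, and `run` are hypothetical.

```python
import threading
import time


class SharedMemory:
    """Holds the semaphore word M[SEM]."""

    def __init__(self):
        self.sem = 0
        # Models the hardware guarantee that TSL is indivisible.
        self._atomic = threading.Lock()

    def tsl(self):
        """Test-and-set: return the old value and leave the semaphore at 1."""
        with self._atomic:
            old = self.sem   # R <- M[SEM]   (test semaphore)
            self.sem = 1     # M[SEM] <- 1   (set semaphore)
        return old


def enter_critical(mem):
    # Old value 1 means another processor is inside; keep retrying.
    while mem.tsl() == 1:
        time.sleep(0)        # yield so the holder can make progress


def leave_critical(mem):
    mem.sem = 0              # let another processor enter


def run(num_workers=4, iterations=200):
    mem = SharedMemory()
    counter = [0]            # the shared variable being protected

    def worker():
        for _ in range(iterations):
            enter_critical(mem)
            counter[0] += 1  # critical section
            leave_critical(mem)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run())  # 800
```

If the test and the set were separate steps, two processors could both read 0 before either wrote 1, and both would enter the critical section.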

Cache Coherence

Cache coherence problem: the data held in the processors' local caches is not consistent

Fig. 13-13 Cache configuration after a store to X by processor P1

Cache Coherence

Solution using a snoopy bus protocol
- Snoopy protocol: maintains data consistency by bus watching
  1. Write-invalidate: when a local cache is updated, all copies in remote caches are invalidated
  2. Write-update: when a local cache is updated, all copies in remote caches are updated
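The difference between the two policies can be seen in a toy model. This is a rough sketch, not a real protocol implementation: it assumes write-through caches, one-word blocks, and no cache state bits, and the class `SnoopyBus` and function `demo` are hypothetical names.

```python
class SnoopyBus:
    """Toy shared bus with per-processor caches that snoop every write."""

    def __init__(self, policy, num_caches):
        assert policy in ("invalidate", "update")
        self.policy = policy
        self.memory = {}
        self.caches = [dict() for _ in range(num_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                 # miss: fetch from main memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu][addr] = value
        self.memory[addr] = value             # assumption: write-through
        # Every other cache watches the bus and reacts to the write.
        for other, cache in enumerate(self.caches):
            if other == cpu or addr not in cache:
                continue
            if self.policy == "invalidate":
                del cache[addr]               # write-invalidate
            else:
                cache[addr] = value           # write-update


def demo(policy):
    bus = SnoopyBus(policy, num_caches=3)
    bus.memory["X"] = 52
    for cpu in range(3):
        bus.read(cpu, "X")                    # every cache holds X = 52
    bus.write(0, "X", 120)                    # processor 0 stores to X
    return [cache.get("X") for cache in bus.caches]


print(demo("invalidate"))  # [120, None, None]
print(demo("update"))      # [120, 120, 120]
```

Under write-invalidate the remote copies disappear and the next read by those processors misses and refetches the new value; under write-update the remote copies are refreshed in place.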

Cache Coherence

Fig. 7-14 of Kai Hwang