ECE 1747: Parallel Programming Distributed Shared Memory (DSM)
Multiprocessor (SMP)
[Figure: proc 1, proc 2, proc 3; cached copies X=0, X=0]
Consistency Models
• Sequential Consistency
  – All processors observe the same order of operations
  – That order must correspond to some serial order
  – The only ordering constraint is that the reads/writes of each processor (e.g. P1) appear in that processor's program order; there are no restrictions on the relative ordering between processors
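To make the definition concrete, here is a minimal sketch (not from the slides) that enumerates every serial order of a two-processor test: P1 runs `x = 1; r1 = y` and P2 runs `y = 1; r2 = x`, with x and y initially 0. The names `run_interleaving`, `count_bits` and `sc_allows` are hypothetical. Each enumerated order preserves both program orders, so under this simple model the results it can produce are the results sequential consistency permits.

```c
#include <stdbool.h>

/* Execute one serial order of the four operations.
 * Bit k of mask set: step k is P1's next operation; clear: P2's next.
 * P1: x = 1; r1 = y;    P2: y = 1; r2 = x;    (x and y start at 0) */
static void run_interleaving(unsigned mask, int *r1, int *r2)
{
    int x = 0, y = 0;
    int pc1 = 0, pc2 = 0;   /* per-processor program counters */

    for (int step = 0; step < 4; step++) {
        if (mask & (1u << step)) {
            if (pc1++ == 0) x = 1; else *r1 = y;
        } else {
            if (pc2++ == 0) y = 1; else *r2 = x;
        }
    }
}

/* Count the set bits among the low four bits of mask. */
static int count_bits(unsigned mask)
{
    int n = 0;
    for (int k = 0; k < 4; k++)
        n += (mask >> k) & 1u;
    return n;
}

/* Return true if some program-order-preserving serial order
 * yields r1 == want_r1 and r2 == want_r2. */
bool sc_allows(int want_r1, int want_r2)
{
    for (unsigned mask = 0; mask < 16; mask++) {
        if (count_bits(mask) != 2)
            continue;   /* each processor runs exactly two operations */
        int r1 = -1, r2 = -1;
        run_interleaving(mask, &r1, &r2);
        if (r1 == want_r1 && r2 == want_r2)
            return true;
    }
    return false;
}
```

In this model `sc_allows(0, 0)` is false: for both reads to return 0, each read would have to precede the other processor's write, which contradicts the two program orders. The outcomes (0,1), (1,0) and (1,1) are each produced by at least one serial order.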
Common consistency protocols
• Write update
  – Multicast update to all replicas
• Write invalidate
  – Invalidate cached copies in p2, p3
  – Cache miss if p2/p3 access X
    • Valid data from other cache
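The difference between the two protocols can be sketched with a toy model of three caches holding one variable X. This is an illustrative simplification, not a hardware protocol: the struct `replicas` and the functions `write_update`, `write_invalidate` and `read_x` are hypothetical names, and the model assumes at least one valid copy always exists.

```c
#include <stdbool.h>

#define NCACHES 3

/* One replica of variable X per cache (toy model). */
struct replicas {
    int  value[NCACHES];
    bool valid[NCACHES];
};

/* Write update: the new value is sent to every cache that holds a copy. */
void write_update(struct replicas *r, int writer, int v)
{
    for (int c = 0; c < NCACHES; c++) {
        if (c == writer || r->valid[c]) {
            r->value[c] = v;
            r->valid[c] = true;
        }
    }
}

/* Write invalidate: all other copies are invalidated, only the writer's
 * copy stays valid. */
void write_invalidate(struct replicas *r, int writer, int v)
{
    for (int c = 0; c < NCACHES; c++)
        r->valid[c] = (c == writer);
    r->value[writer] = v;
}

/* Read X at cache `reader`. On a miss, valid data is fetched from another
 * cache. *missed reports whether the access was a cache miss.
 * Assumes some cache holds a valid copy. */
int read_x(struct replicas *r, int reader, bool *missed)
{
    *missed = !r->valid[reader];
    if (*missed) {
        for (int c = 0; c < NCACHES; c++) {
            if (r->valid[c]) {
                r->value[reader] = r->value[c];
                r->valid[reader] = true;
                break;
            }
        }
    }
    return r->value[reader];
}
```

After `write_update` a read at any other cache hits and sees the new value. After `write_invalidate` the first read at another cache misses and fetches the value from the writer's cache; later reads hit again.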
Distributed Shared Memory (DSM)
[Figure: a shared memory abstraction over a network of nodes: mem 0/proc 0, mem 1/proc 1, mem 2/proc 2, ..., mem N/proc N]
DSM programming
• Standard
  – pthread-like
• Synchronizations
  – Barriers
  – Locks
  – Semaphores
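Sequential SOR
for some number of timesteps/iterations {
    /* Update interior points only: rows and columns 0 and n are the boundary. */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                 grid[i][j-1] + grid[i][j+1]);
    /* Copy the new values back for the next iteration. */
    for (i = 1; i < n; i++)
        for (j = 1; j < n; j++)
            grid[i][j] = temp[i][j];
}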
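Parallel SOR with Barriers (1 of 2)
void* sor(void* arg)
{
    int slice = (int)(intptr_t)arg;        /* thread index 0..p-1; intptr_t needs <stdint.h> */
    int from = (slice * (n-1))/p + 1;      /* first row of this thread's slice */
    int to = ((slice+1) * (n-1))/p + 1;    /* one past the last row of the slice */

    for some number of iterations { … }
    return NULL;
}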
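Parallel SOR with Barriers (2 of 2)
for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                             grid[i][j-1] + grid[i][j+1]);
barrier();   /* every slice of temp is computed before grid is overwritten */
for (i = from; i < to; i++)
    for (j = 1; j < n; j++)
        grid[i][j] = temp[i][j];
barrier();   /* every slice of grid is updated before the next iteration reads it */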
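The row partitioning used by sor() can be checked in isolation. The sketch below repeats the slide's from/to arithmetic in a helper and verifies that the p slices tile the interior rows 1..n-1 without gaps or overlap. The names `row_range`, `slice_rows` and `slices_cover_interior` are hypothetical helpers introduced only for this illustration.

```c
/* Rows assigned to one thread: the half-open range [from, to). */
struct row_range {
    int from;
    int to;
};

/* Same arithmetic as in sor(): split interior rows 1..n-1 among p threads. */
struct row_range slice_rows(int slice, int n, int p)
{
    struct row_range r;
    r.from = (slice * (n - 1)) / p + 1;
    r.to   = ((slice + 1) * (n - 1)) / p + 1;
    return r;
}

/* Return 1 if the p slices tile rows 1..n-1 with no gap and no overlap. */
int slices_cover_interior(int n, int p)
{
    int next = 1;   /* first row not yet assigned to any slice */
    for (int s = 0; s < p; s++) {
        struct row_range r = slice_rows(s, n, p);
        if (r.from != next || r.to < r.from)
            return 0;
        next = r.to;
    }
    return next == n;
}
```

For n = 10 and p = 4 the slices are [1,3), [3,5), [5,7) and [7,10). When p exceeds the number of interior rows, some slices are empty, but the tiling still holds.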
Sequential Consistency DSM
• As proposed by Li & Hudak, TOCS '86.
• Use virtual memory to implement sharing.
• Shared memory divided up by virtual memory pages.
• Use an SMP-like coherence protocol.
• Keep pages in one of three states:
  – invalid, read-only, read-write
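The three page states can be sketched as a small state machine. This is an invalidate-style simplification written for illustration, not the manager algorithms from Li & Hudak's paper; the enum and function names (`page_state`, `page_event`, `page_next_state`, `access_faults`) are hypothetical. In a real system the faults would come from virtual memory page protection.

```c
enum page_state { PAGE_INVALID, PAGE_READ_ONLY, PAGE_READ_WRITE };

enum page_event {
    LOCAL_READ,    /* this node reads the page */
    LOCAL_WRITE,   /* this node writes the page */
    REMOTE_READ,   /* another node requests a read copy */
    REMOTE_WRITE   /* another node requests the page for writing */
};

/* Next state of this node's copy of a page (simplified model). */
enum page_state page_next_state(enum page_state s, enum page_event e)
{
    switch (e) {
    case LOCAL_READ:
        /* A read fault on an invalid page fetches a read-only copy. */
        return s == PAGE_INVALID ? PAGE_READ_ONLY : s;
    case LOCAL_WRITE:
        /* A write fault invalidates other copies; ours becomes writable. */
        return PAGE_READ_WRITE;
    case REMOTE_READ:
        /* A writer is downgraded so that readers can share the page. */
        return s == PAGE_READ_WRITE ? PAGE_READ_ONLY : s;
    case REMOTE_WRITE:
        /* Another node becomes the writer; our copy is invalidated. */
        return PAGE_INVALID;
    }
    return s;
}

/* Return 1 if an access in state s traps (page fault), else 0. */
int access_faults(enum page_state s, int is_write)
{
    if (s == PAGE_INVALID)
        return 1;
    return is_write && s == PAGE_READ_ONLY;
}
```

Reads of a read-only or read-write page and writes to a read-write page proceed without a fault; every other access traps and triggers a state change.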
SC implementation
• Synchronous read/write
  – Writes must be propagated before moving on to the next operation
Read-Write False Sharing
[Figure: x and y in the same page]
Read-Write False Sharing (Cont.)
[Figure: timeline with w(x), r(y), r(x)]
Read-Write False Sharing (Cont.)
[Figure: timeline with w(x), r(y), r(x), r(y), synch]
Weak Consistency (WEAKC)
• Data modifications are only propagated at the time of synchronization.
• Works fine if the program is properly synchronized through system primitives.
  – All programs should be …
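A minimal sketch of the idea, assuming a two-node system with one shared page and a single writer: writes go to the local copy and are pushed to the peer only at a synchronization point. The struct `node` and the functions `wc_write`, `wc_read` and `wc_synch` are hypothetical names for this illustration, not the API of a real DSM system.

```c
#define PAGE_WORDS 4

/* One node's view of the shared page (toy model). */
struct node {
    int page[PAGE_WORDS];    /* local copy of the shared page */
    int dirty[PAGE_WORDS];   /* 1 if this node wrote the word since the last synch */
};

/* A write only touches the local copy; nothing is sent yet. */
void wc_write(struct node *n, int idx, int v)
{
    n->page[idx] = v;
    n->dirty[idx] = 1;
}

/* A read returns whatever the local copy holds, possibly stale data. */
int wc_read(const struct node *n, int idx)
{
    return n->page[idx];
}

/* Synchronization point: propagate every word `from` has modified to `to`. */
void wc_synch(struct node *from, struct node *to)
{
    for (int i = 0; i < PAGE_WORDS; i++) {
        if (from->dirty[i]) {
            to->page[i] = from->page[i];
            from->dirty[i] = 0;
        }
    }
}
```

Between synchronizations the reader may see the old value of a word, which is acceptable for a program that only relies on data after synchronizing.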
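Read-Write False Sharing (Before)
[Figure: timeline with w(x), r(y), r(x), r(y), synch]
Read-Write False Sharing (WEAKC)
[Figure: timeline with w(x), r(y), r(x), synch]
Write-Write False Sharing
[Figure: x and y in the same page]
Write-Write False Sharing
[Figure: timeline with w(x), w(y), r(x), synch]
Write-Write False Sharing (WEAKC)
[Figure: timeline with w(x), w(y), w(x), r(x), w(y), synch]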
Multiple Writer (MW) Protocols
• Allows multiple writers per page.
• Modifications are merged at synchronization (according to the WEAKC definition).
• Modifications are recorded through a mechanism called twinning and diffing.
Write-Write False Sharing and MW
[Figure: timeline with w(x), w(y), w(x), r(x), w(y), synch]
Creating a diff (delta)
[Figure: the page goes from write-protected to writable and back to write-protected; labels: twin, w(x) ... w(x), diff (delta)]
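Twinning and diffing can be sketched at word granularity as follows. This is an illustrative model, not the code of a particular DSM system: the page is an array of `PAGE_WORDS` ints, and `diff_entry`, `diff`, `make_twin`, `make_diff` and `apply_diff` are hypothetical names. In a real system the twin would be created inside the write-fault handler, before the page is made writable.

```c
#include <string.h>

#define PAGE_WORDS 8

/* One modified word: its position in the page and its new value. */
struct diff_entry {
    int index;
    int value;
};

struct diff {
    int count;
    struct diff_entry e[PAGE_WORDS];
};

/* Twinning: on the first write, save a pristine copy of the page. */
void make_twin(const int *page, int *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof(int));
}

/* Diffing: record every word that now differs from the twin. */
void make_diff(const int *page, const int *twin, struct diff *d)
{
    d->count = 0;
    for (int i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            d->e[d->count].index = i;
            d->e[d->count].value = page[i];
            d->count++;
        }
    }
}

/* Merging: apply another writer's diff to a local copy of the page. */
void apply_diff(int *page, const struct diff *d)
{
    for (int i = 0; i < d->count; i++)
        page[d->e[i].index] = d->e[i].value;
}
```

If one node writes x (word 0) and another writes y (word 1) on the same page, each diff holds a single entry, and exchanging the diffs at synchronization leaves both copies with both updates. Neither node has to ship or overwrite the whole page.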
Write-Write False Sharing and MW
[Figure: timeline with twins of the page, w(x), w(x), w(y), w(y), r(x), synch; modifications to x and y are exchanged at synch]
Release Consistency (RC)
• Distinguish acquires from releases
  – Ordinary reads/writes wait until the previous acquire is performed
  – A release waits until the previous reads/writes are performed
  – Acquires/releases are sequentially consistent with respect to one another
Eager & Lazy Release Consistency
• Eager release consistency: transfer consistency information at release of a lock.
• Lazy release consistency: transfer consistency information at acquire of a lock.
Eager Release Consistency
[Figure: p1: w(x), rel; p2: acq, w(x), rel; p3: acq, w(x), rel; p4: acq, r(x)]
Lazy Release Consistency
[Figure: p1: w(x), rel; p2: acq, w(x), rel; p3: acq, w(x), rel; p4: acq, r(x)]
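The lock chain in the two timelines above can be simulated with a toy model to compare how much consistency traffic each variant generates. The model rests on loud assumptions: all four processors keep a copy of x, every transfer counts as one message, and lock-transfer messages and acknowledgements are ignored. The names `dsm`, `eager_release`, `lazy_release`, `lazy_acquire` and `run_chain` are hypothetical, and processors are indexed 0..3 rather than p1..p4.

```c
#define NPROCS 4

struct dsm {
    int x[NPROCS];       /* each processor's copy of x */
    int last_releaser;   /* processor that last released the lock, -1 if none */
    int messages;        /* consistency messages sent so far */
};

/* Eager: the releaser pushes its value of x to every other processor. */
void eager_release(struct dsm *d, int p)
{
    for (int q = 0; q < NPROCS; q++) {
        if (q != p) {
            d->x[q] = d->x[p];
            d->messages++;
        }
    }
    d->last_releaser = p;
}

/* Lazy: a release sends nothing, it only records who released last. */
void lazy_release(struct dsm *d, int p)
{
    d->last_releaser = p;
}

/* Lazy: the acquirer pulls x from the last releaser only. */
void lazy_acquire(struct dsm *d, int p)
{
    if (d->last_releaser >= 0 && d->last_releaser != p) {
        d->x[p] = d->x[d->last_releaser];
        d->messages++;
    }
}

/* Run the chain: proc 0 w(x) rel; proc 1 acq w(x) rel; proc 2 acq w(x) rel;
 * proc 3 acq r(x). Returns the value proc 3 reads and stores the message
 * count in *messages. */
int run_chain(int lazy, int *messages)
{
    struct dsm d;
    for (int p = 0; p < NPROCS; p++)
        d.x[p] = 0;
    d.last_releaser = -1;
    d.messages = 0;

    for (int p = 0; p < NPROCS; p++) {
        if (p > 0 && lazy)
            lazy_acquire(&d, p);    /* eager acquires transfer nothing here */
        if (p == NPROCS - 1)
            break;                  /* the last processor only reads x */
        d.x[p] = p + 1;             /* w(x) */
        if (lazy)
            lazy_release(&d, p);
        else
            eager_release(&d, p);
    }
    *messages = d.messages;
    return d.x[NPROCS - 1];
}
```

Under these assumptions both variants let the last processor read the latest value, 3. The eager run sends 9 messages (three releases, each to three peers), and the lazy run sends 3 (one per acquire).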
Lazy Release Consistency
• The acquiring processor determines which modifications it needs to see.
[Figure: p1: w(x), rel; p2: acq, w(y), rel; p3: acq, r(x), r(y); synch]
Vector Timestamps
[Figure: p1, p2 and p3 each start with vector timestamp (0,0,0); p1: w(x), rel, timestamp (1,0,0); p2: acq, w(y), rel, timestamp (1,1,0); p3: acq, r(x), r(y)]
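The way an acquirer decides which modifications it is missing can be sketched with a simplified vector timestamp. In this model a processor increments its own entry at each release, and at an acquire it compares its vector with the releaser's entry by entry. The type `vts` and the functions `vts_release`, `vts_missing` and `vts_acquire` are hypothetical names; real lazy release consistency implementations track intervals in more detail.

```c
#define NPROCS 3

/* One entry per processor: the number of that processor's
 * releases (intervals) this processor has seen. */
struct vts {
    int t[NPROCS];
};

/* Release by processor p: close p's current interval. */
void vts_release(struct vts *v, int p)
{
    v->t[p]++;
}

/* Number of intervals of processor q that the releaser knows about
 * and the acquirer has not seen yet. */
int vts_missing(const struct vts *acq, const struct vts *rel, int q)
{
    int d = rel->t[q] - acq->t[q];
    return d > 0 ? d : 0;
}

/* Acquire: after fetching the missing modifications, the acquirer's
 * vector becomes the element-wise maximum of the two vectors. */
void vts_acquire(struct vts *acq, const struct vts *rel)
{
    for (int q = 0; q < NPROCS; q++) {
        if (rel->t[q] > acq->t[q])
            acq->t[q] = rel->t[q];
    }
}
```

Replaying the three-processor example with indices 0..2: after the first processor writes x and releases, its vector is (1,0,0). The second processor acquires, writes y and releases, ending at (1,1,0). When the third processor acquires from the second, it is missing one interval from each of the other two, so it needs the modifications to both x and y.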
DSM Summary
• Relaxed consistency
  – application's definition of correctness
• Achieves >70% of the performance of the corresponding message-passing applications