Lecture 20 Beyond Low-level Parallelism: Memory Consistency © S. J. Patel, 2001 ECE 412, Fall 2001, University of Illinois
Systems with more than one processor: issues thus far
• For performance:
  – computation and communication
  – Amdahl’s law
  – costs of synchronization
• For correct operation:
  – hardware: cache coherence
  – software: synchronization
  – hardware and software: memory consistency
• All three affect performance, too.
Example of an implicitly synchronized program

Processor 0                      Processor 1
A = 1;                           B = 1;
if (B == 0) {                    if (A == 0) {
    critical section                 critical section
    A = 0;                           B = 0;
}                                }

• If Processor 0 reorders the write to A and the read from B, both processors can end up in the critical section at the same time.
• Processor 0 has no local basis for knowing that the write to A and the read from B must be strictly ordered; the requirement comes only from multiprocessor execution.
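The protocol above can be tried out in C++. This is a minimal sketch, assuming `std::atomic` with its default `memory_order_seq_cst`, which keeps each processor's store ordered before its following load. The names `attempt`, `count_overlaps`, and `inside` are hypothetical and not part of the lecture.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> A{0}, B{0};   // the two flags from the slide
std::atomic<int> inside{0};    // number of threads currently in the critical section

// One processor's side of the protocol. Returns true if it found the
// other processor already inside the critical section.
bool attempt(std::atomic<int>& mine, std::atomic<int>& other) {
    bool overlap = false;
    mine.store(1);                           // A = 1 (or B = 1), seq_cst by default
    if (other.load() == 0) {                 // if (B == 0) (or A == 0)
        overlap = inside.fetch_add(1) != 0;  // critical section begins
        inside.fetch_sub(1);                 // critical section ends
        mine.store(0);                       // A = 0 (or B = 0)
    }
    return overlap;
}

// Runs the two-processor race `trials` times and counts overlapping entries.
int count_overlaps(int trials) {
    int overlaps = 0;
    for (int i = 0; i < trials; ++i) {
        A.store(0);
        B.store(0);
        bool o0 = false, o1 = false;
        std::thread p0([&] { o0 = attempt(A, B); });
        std::thread p1([&] { o1 = attempt(B, A); });
        p0.join();
        p1.join();
        overlaps += (o0 || o1);
    }
    return overlaps;
}
```

With sequentially consistent atomics, `count_overlaps` should always return 0. If the flag accesses were changed to `memory_order_relaxed`, the C++ memory model would no longer forbid both loads from returning 0, which is the reordering the slide warns about.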
Sequential Consistency
• From the point of view of a single processor, every memory operation executes in program order.
• Operations from all processors appear to execute in one global order that every processor observes.
• For any given memory operation A (read or write), all previous memory operations execute before A and all subsequent operations execute after A. In effect, every memory operation acts as a memory fence.
Sequential Consistency
All four program orderings are enforced (X → Y means X completes before the later Y is performed):
• Read followed by a read: R → R
• Read followed by a write: R → W
• Write followed by a write: W → W
• Write followed by a read: W → R
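To see what these four orderings rule out, the sketch below enumerates every interleaving of the two-processor flag example in which each processor's operations stay in program order, which is one way to model sequential consistency. The `State`, `step`, `explore`, and `sc_outcomes` names are hypothetical, and the model is a toy, not a description of any real machine.

```cpp
#include <set>
#include <utility>

// Each processor p runs two operations in program order:
//   op 0: write 1 to its own flag   (mem[p] = 1)
//   op 1: read the other flag       (reg[p] = mem[1 - p])
struct State {
    int mem[2] = {0, 0};   // shared memory: flags A and B
    int reg[2] = {0, 0};   // value each processor read
    int pc[2]  = {0, 0};   // next operation index per processor
};

// Executes the next operation of processor p directly against memory.
void step(State& s, int p) {
    if (s.pc[p] == 0) s.mem[p] = 1;
    else              s.reg[p] = s.mem[1 - p];
    ++s.pc[p];
}

// Depth-first search over all interleavings that respect program order.
void explore(State s, std::set<std::pair<int, int>>& outcomes) {
    if (s.pc[0] == 2 && s.pc[1] == 2) {
        outcomes.insert(std::make_pair(s.reg[0], s.reg[1]));
        return;
    }
    for (int p = 0; p < 2; ++p) {
        if (s.pc[p] < 2) {
            State next = s;
            step(next, p);
            explore(next, outcomes);
        }
    }
}

// All (value read by P0, value read by P1) pairs reachable under this model.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> outcomes;
    explore(State{}, outcomes);
    return outcomes;
}
```

In this model the pair (0, 0) never appears, so at most one processor can see the other's flag clear; the outcome only becomes possible once W → R is relaxed.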
Processor Consistency or Total Store Order
• Enforced: R → R, R → W, W → W
• Synchronization orderings are also needed: S → W, S → R, R → S, W → S, S → S
• Notice that W → R is missing; this allows write buffering.
• What is S? A synchronization instruction, such as the Alpha MB (memory barrier).
• Used on DEC VAX, IBM 370
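The effect of dropping W → R can be illustrated with a toy store-buffer model. This is a sketch under simplifying assumptions (one FIFO buffer per processor, a barrier that simply drains it); `Machine`, `Processor`, and `run_store_buffering` are hypothetical names, and real write buffers are more involved.

```cpp
#include <deque>
#include <utility>

struct Processor {
    std::deque<std::pair<int, int>> store_buffer;  // FIFO of (address, value)
};

struct Machine {
    int mem[2] = {0, 0};   // address 0 is A, address 1 is B
    Processor cpu[2];

    // A write is placed in the local buffer and is not yet visible to others.
    void write(int p, int addr, int value) {
        cpu[p].store_buffer.push_back(std::make_pair(addr, value));
    }

    // A read checks the local buffer first (newest entry wins), then memory.
    int read(int p, int addr) const {
        const auto& sb = cpu[p].store_buffer;
        for (auto it = sb.rbegin(); it != sb.rend(); ++it)
            if (it->first == addr) return it->second;
        return mem[addr];
    }

    // S: drain the buffer in FIFO order, which also preserves W -> W.
    void barrier(int p) {
        auto& sb = cpu[p].store_buffer;
        while (!sb.empty()) {
            mem[sb.front().first] = sb.front().second;
            sb.pop_front();
        }
    }
};

// One fixed schedule of the flag example; returns (P0's read of B, P1's read of A).
std::pair<int, int> run_store_buffering(bool use_barrier) {
    Machine m;
    m.write(0, 0, 1);                 // P0: A = 1
    m.write(1, 1, 1);                 // P1: B = 1
    if (use_barrier) {                // W -> S -> R on both processors
        m.barrier(0);
        m.barrier(1);
    }
    int r0 = m.read(0, 1);            // P0 reads B
    int r1 = m.read(1, 0);            // P1 reads A
    return std::make_pair(r0, r1);
}
```

Without the barrier both reads bypass the buffered writes and return 0, so both processors would enter the critical section; with the barrier both reads return 1.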
Partial Store Order
• Enforced: R → R, R → W
• Writes can go out of order: neither W → W nor W → R is enforced. This allows for sophisticated write buffering.
• Again, synchronization operations enable stronger ordering: S → W, S → R, R → S, W → S, S → S
• Supported by SPARC
Weak Consistency (or Weak Ordering)
• All memory operations can proceed out of order.
• Dubois et al. define the weak consistency model by three properties:
  – Accesses to synchronization variables are sequentially consistent.
  – No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
  – No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed.
• Synchronization must be used explicitly to guarantee an order: S → W, S → R, R → S, W → S, S → S
• Supported by PowerPC, Alpha
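In C++, explicit synchronization of this kind can be approximated with fences. The sketch below assumes `std::atomic_thread_fence(std::memory_order_seq_cst)` as a stand-in for S; it is not a claim about how any particular weakly ordered machine implements its barrier. The names `payload`, `ready`, `producer`, `consumer`, and `transfer` are hypothetical.

```cpp
#include <atomic>
#include <thread>

int payload = 0;             // ordinary data
std::atomic<int> ready{0};   // flag, accessed with relaxed ordering only

void producer() {
    payload = 42;                                          // W (data)
    std::atomic_thread_fence(std::memory_order_seq_cst);   // S: W -> S -> W
    ready.store(1, std::memory_order_relaxed);             // W (flag)
}

int consumer() {
    while (ready.load(std::memory_order_relaxed) == 0) {}  // R (flag), spin
    std::atomic_thread_fence(std::memory_order_seq_cst);   // S: R -> S -> R
    return payload;                                        // R (data)
}

// Runs one producer and one consumer; returns the value the consumer saw.
int transfer() {
    payload = 0;
    ready.store(0);
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

The two fences order the data write before the flag write and the flag read before the data read, so `transfer()` returns 42. Without them, nothing orders the relaxed flag accesses against `payload`, and the program would contain a data race.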
Release Consistency
• Weak consistency has the problem that when a synchronization variable is accessed, the memory does not know whether this is being done because the process has finished writing the shared variables or is about to start reading them.
  – Consequently, it must take the actions required in both cases, namely making sure that all locally initiated writes have been completed, as well as gathering in all writes from other machines.
• If the memory could tell the difference between entering a critical region and leaving one, a more efficient implementation might be possible. To provide this information, two kinds of synchronization variables or operations are needed instead of one.
  – Acquire accesses are used to tell the memory system that a critical region is about to be entered.
  – Release accesses say that a critical region has just been exited.
  – These accesses can be implemented either as ordinary operations on special variables or as special operations.
Motivating release consistency

acquire(lock);    // All write values from other nodes must be gathered; local writes and reads do not have to be complete before the acquire starts.
{
    Critical section for shared variable access (R/W)
}
release(lock);    // All write values from the local CPU must be flushed out; local reads can proceed before synchronization completes.
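The acquire/release pairing has a close analogue in C++ memory orders. The following is a minimal sketch, assuming `memory_order_acquire` and `memory_order_release` as stand-ins for the acquire and release accesses; `SpinLock`, `counter_lock`, `shared_counter`, and `count_with_lock` are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void acquire() {
        // Acquire: accesses after this point may not move above it.
        while (locked.exchange(true, std::memory_order_acquire)) {}
    }
    void release() {
        // Release: accesses before this point may not move below it.
        locked.store(false, std::memory_order_release);
    }
};

SpinLock counter_lock;
long shared_counter = 0;   // ordinary variable, protected by counter_lock

// Each thread increments the shared counter `increments` times under the lock.
long count_with_lock(int threads, int increments) {
    shared_counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([increments] {
            for (int i = 0; i < increments; ++i) {
                counter_lock.acquire();
                ++shared_counter;          // critical section
                counter_lock.release();
            }
        });
    }
    for (auto& th : pool) th.join();
    return shared_counter;
}
```

Because each release makes the holder's writes visible to the next thread that acquires the lock, no increment is lost, and neither operation needs to act as a full two-way fence.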
Release Consistency
• Memory operations are unordered.
• Two synchronization instructions are provided: SA (acquire) and SR (release).
• Enforced: SA → W, SA → R, R → SR, W → SR, SX → SX (where SX is either SA or SR)
• Supported by MIPS, IA-64
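The ordering lists for the four models that share a single synchronization operation S can be collected into one lookup function. This is only a summary sketch of the orderings as stated in these slides; the `Model`, `Op`, and `enforced` names are hypothetical, and release consistency is left out because it splits S into SA and SR.

```cpp
enum class Model { SC, TSO, PSO, Weak };
enum class Op { R, W, S };

// Returns true if `first -> second` (in program order) is enforced by the model.
bool enforced(Model m, Op first, Op second) {
    // Every listed model orders all operations around a synchronization op S.
    if (first == Op::S || second == Op::S) return true;
    switch (m) {
        case Model::SC:   return true;                                  // all four data orderings
        case Model::TSO:  return !(first == Op::W && second == Op::R);  // W -> R relaxed
        case Model::PSO:  return first == Op::R;                        // only R -> R and R -> W
        case Model::Weak: return false;                                 // no data orderings
    }
    return false;
}
```

For example, `enforced(Model::TSO, Op::W, Op::R)` is false, while `enforced(Model::PSO, Op::R, Op::W)` and `enforced(Model::Weak, Op::W, Op::S)` are both true.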