Distributed Shared Memory: A Survey of Issues and Algorithms B. Nitzberg and V. Lo, University of Oregon
INTRODUCTION • Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
Why bother with DSM? • Key idea is to build fast parallel computers that are – Cheaper than shared memory multiprocessor architectures – As convenient to use
Conventional parallel architecture • Multiple CPUs, each with a cache, connected to a single shared memory
Today’s architecture • Clusters of workstations are much more cost effective – No need to develop complex bus and cache structures – Can use off-the-shelf networking hardware • Gigabit Ethernet • Myrinet (1.5 Gb/s) – Can quickly integrate newest microprocessors
Limitations of cluster approach • Communication within a cluster of workstations is through message passing – Much harder to program than concurrent access to a shared memory • Many big programs were written for shared memory architectures – Converting them to a message-passing model is costly
Distributed shared memory • The main memories of the workstations are combined by DSM into one shared global address space
Distributed shared memory • DSM makes a cluster of workstations look like a shared memory parallel computer – Easier to write new programs – Easier to port existing programs • Key problem is that DSM only provides the illusion of having a shared memory architecture – Data must still move back and forth
Basic approaches • Hardware implementations: – Use extensions of traditional hardware caching architecture • Operating system/library implementations: – Use virtual memory mechanisms • Compiler implementations – Compiler handles all shared accesses
Design Issues (I) 1. Structure and granularity – Big units are more efficient • Virtual memory pages – Can have false sharing whenever page contains different variables that are accessed at the same time by different processors
False Sharing • One processor accesses x, another accesses y • The page containing both x and y will move back and forth between the main memories of the two workstations
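The ping-pong effect above can be sketched with a toy page-based simulation (all names here are hypothetical; a real DSM operates on hardware pages and MMU faults, not Python objects):

```python
# Toy model of false sharing: one page holds both x and y.
# Each processor caches the page; any write invalidates the other
# processor's copy, forcing a page transfer on its next access.

class Home:
    """The page's home location; counts page transfers."""
    def __init__(self):
        self.data = {"x": 0, "y": 0}
        self.transfers = 0

class Processor:
    def __init__(self, home):
        self.home = home
        self.copy = None          # locally cached page (or None)

    def write(self, var, value, others):
        if self.copy is None:     # page fault: fetch the page
            self.copy = self.home
            self.home.transfers += 1
        self.copy.data[var] = value
        for p in others:          # invalidate every other copy
            p.copy = None

home = Home()
p1, p2 = Processor(home), Processor(home)

# P1 only touches x, P2 only touches y, yet the shared page
# bounces between them on every alternating write.
for i in range(5):
    p1.write("x", i, [p2])
    p2.write("y", i, [p1])

print(home.transfers)   # 10 transfers for 10 writes: pure false sharing
```

Even though the two processors never touch the same variable, every write moves the page, which is exactly the overhead false sharing causes.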
Design Issues (II) 1. Structure and granularity (cont'd) – Shared objects can also be • Objects from a distributed object-oriented system • Data types from an extant language
Design Issues (III) 2. Coherence semantics – Strict consistency is not possible – Various authors have proposed weaker consistency models • Cheaper to implement • Harder to use in a correct fashion
Design Issues (IV) 3. Scalability – Possibly very high but limited by • Central bottlenecks • Global knowledge operation and storage
Design Issues (V) 4. Heterogeneity – Possible but complex to implement
Portability Issues (not in paper) • Portability of programs – Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks) – More efficient DSMs require more changes • Portability of the DSM itself
Implementation Issues (I) 1. Data Location and Access: • Keep data at a single centralized location • Let data migrate (better) but must have a way to locate them • Centralized server (bottleneck) • Have a "home" node associated with each piece of data
Implementation Issues (II) 1. Data Location and Access (cont'd): • Can either • Maintain a single copy of each piece of data • Replicate it on demand • Must either • Propagate updates to all replicas • Use an invalidation protocol
Invalidation protocol • Before the update: every copy of the page holds X = 0 • At update time: the writer sets X = 5 and all other copies are marked INVALID
Main advantage • Locality of updates: – A page that is being modified has a high likelihood of being modified again • Invalidation mechanism minimizes consistency overhead – One single invalidation replaces many updates
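The saving from locality of updates can be made concrete by counting consistency messages under both approaches (a toy model; function names are made up for illustration):

```python
# Compare consistency traffic for N consecutive writes to one page
# replicated on R remote nodes: update propagation sends a message to
# every replica on every write; invalidation sends one round of
# invalidations on the first write, after which the writer holds the
# only valid copy and later writes are purely local.

def update_protocol_messages(writes, replicas):
    return writes * replicas          # each write updates every replica

def invalidate_protocol_messages(writes, replicas):
    return replicas                   # one invalidation per replica, once

N, R = 100, 7                         # 100 writes, 7 remote replicas
print(update_protocol_messages(N, R))      # 700 messages
print(invalidate_protocol_messages(N, R))  # 7 messages
```

Because a modified page is likely to be modified again soon, the single round of invalidations replaces the many update messages.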
A realization: Munin • Developed at Rice University • Based on software objects (variables) • Used the processor's virtual memory to detect accesses to the shared objects • Included several techniques for reducing consistency-related communication • Only ran on top of the V kernel
Munin main strengths • Excellent performance • Portability of programs – Allowed programs written for a multiprocessor architecture to run on a cluster of workstations with a minimum number of changes (dusty decks)
Munin main weakness • Very poor portability of Munin itself – Depended on some features of the V kernel • Not maintained since the late 80's
Consistency model • Munin uses software release consistency – Only requires the memory to be consistent at specific synchronization points
SW release consistency (I) • Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables – P(&mutex) and V(&mutex) – lock(&csect) and unlock(&csect) – acquire( ) and release( ) • Unprotected accesses can produce unpredictable results
SW release consistency (II) • SW release consistency only guarantees the correctness of operations performed within an acquire/release pair • No need to export the new values of shared variables until the release • Must guarantee that the workstation has received the most recent values of all shared variables when it completes an acquire
SW release consistency (III)
Process 1:
    shared int x;
    acquire( );
    x = 1;
    release( );   // export x = 1
Process 2:
    shared int x;
    acquire( );   // wait for new value of x
    x++;
    release( );   // export x = 2
SW release consistency (IV) • Must still decide how to release updated values – Munin uses eager release: • New values of shared variables are propagated at release time
SW release consistency (V) Eager release Each release forwards the update to the two other processors.
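Eager release can be sketched as follows (a toy model with hypothetical names; real Munin buffers updates in its consistency layer and pushes them over the network at release time):

```python
# Toy eager-release model: writes inside an acquire/release section are
# buffered locally and pushed to *every* other processor at release
# time, whether or not those processors will ever use the new values.

class Processor:
    def __init__(self, peers):
        self.memory = {}        # this processor's view of shared memory
        self.pending = {}       # writes buffered until release
        self.peers = peers      # all processors in the system

    def write(self, var, value):
        self.memory[var] = value
        self.pending[var] = value

    def release(self):
        # Eager: forward the buffered updates to all peers now.
        for p in self.peers:
            if p is not self:
                p.memory.update(self.pending)
        self.pending = {}

procs = []
a, b, c = Processor(procs), Processor(procs), Processor(procs)
procs.extend([a, b, c])

a.write("x", 1)
a.release()                 # one release forwards x = 1 to B and C
print(b.memory["x"], c.memory["x"])   # 1 1
```

This is exactly the behavior in the diagram: each release forwards the update to the two other processors, even if only one of them will acquire the lock next (the inefficiency that lazy release, below, removes).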
Multiple write protocol • Designed to fight false sharing • Uses a copy-on-write mechanism • Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write • The first attempt to modify the contents of the page results in the creation of a copy of the page being modified (the twin)
Creating a twin (not in paper)
Example (not in paper) • Before the first write: the page contains x = 1, y = 2 • First write access: a twin holding x = 1, y = 2 is created, then x is set to 3 in the page • At release: comparing the page with its twin shows that the new value of x is 3
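The twin-and-diff mechanism in the example can be sketched like this (a simplified model; a real implementation snapshots a hardware page on the copy-on-write fault and diffs it word by word):

```python
# Toy twin-and-diff: on the first write to a page, save a copy (the
# twin); at release, diff the page against the twin so that only the
# locations actually modified are transmitted, not the whole page.

def make_twin(page):
    return dict(page)                 # snapshot taken on the first write

def diff(page, twin):
    # Only entries that changed since the twin was taken are shipped.
    return {k: v for k, v in page.items() if twin[k] != v}

page = {"x": 1, "y": 2}
twin = make_twin(page)                # copy-on-write traps the first store
page["x"] = 3                         # local modification
print(diff(page, twin))               # {'x': 3} -- y is not transmitted
```

Because the diff contains only x, two processors writing disjoint variables on the same page can both proceed and their diffs merge cleanly, which is how the protocol defeats false sharing.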
Other DSM Implementations (I) • Software release consistency with lazy release (Treadmarks) – Faster and designed to be portable • Sequentially-Consistent Software DSM (IVY): – Sends messages to other copies at each write – Much slower
Other DSM Implementations (II) • Entry consistency (Midway): – Requires each variable to be associated to a synchronization object (typically a lock) – Acquire/release operations on a given synchronization object only involve the variables associated with that object – Requires less data traffic
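Entry consistency can be sketched with a toy model (hypothetical names; Midway actually requires the programmer to annotate the association between variables and synchronization objects):

```python
# Toy entry consistency: each lock "guards" a declared set of shared
# variables; acquiring the lock fetches only those variables from the
# last releaser, so unrelated shared data generates no traffic.

class Lock:
    def __init__(self, guarded_vars):
        self.guarded = guarded_vars   # variables associated with the lock
        self.values = {}              # values at the last release

class Processor:
    def __init__(self):
        self.memory = {}
        self.messages = 0

    def acquire(self, lock):
        # Fetch only the variables associated with this lock.
        for var in lock.guarded:
            if var in lock.values:
                self.memory[var] = lock.values[var]
                self.messages += 1

    def release(self, lock):
        for var in lock.guarded:
            if var in self.memory:
                lock.values[var] = self.memory[var]

lx = Lock({"x"})                      # lx guards x only, not y
p1, p2 = Processor(), Processor()

p1.memory.update({"x": 10, "y": 99})
p1.release(lx)
p2.acquire(lx)
print(p2.memory, p2.messages)         # {'x': 10} 1 -- y never travels
```

The acquire moves just the one guarded variable, which is the source of the reduced data traffic compared to release consistency, where all modified shared data may be exported.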
Other DSM Implementations (III) • Structured DSM Systems (Linda): – Offer to the programmer a shared tuple space accessed using specific synchronized methods – Require a very different programming style
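The Linda tuple-space operations (`out`, `in`, `rd`) can be sketched as follows (a single-process toy; real Linda operations are atomic across processes and `in`/`rd` block until a match appears):

```python
# Toy Linda-style tuple space: out() adds a tuple, in() atomically
# removes a matching tuple, rd() reads one without removing it.
# Matching is by pattern, where None acts as a wildcard field.

class TupleSpace:
    def __init__(self):
        self.tuples = []

    def out(self, t):
        self.tuples.append(t)

    def _match(self, pattern, t):
        return len(pattern) == len(t) and all(
            p is None or p == v for p, v in zip(pattern, t))

    def rd(self, pattern):            # read without removing
        for t in self.tuples:
            if self._match(pattern, t):
                return t
        return None

    def in_(self, pattern):           # destructive read ("in")
        t = self.rd(pattern)
        if t is not None:
            self.tuples.remove(t)
        return t

ts = TupleSpace()
ts.out(("count", 42))
print(ts.rd(("count", None)))         # ('count', 42) -- still in space
print(ts.in_(("count", None)))        # ('count', 42) -- now removed
print(ts.rd(("count", None)))         # None
```

All sharing goes through these synchronized methods rather than through ordinary loads and stores, which is why the programming style differs so much from page-based DSM.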
TODAY'S IMPACT • Very low: – According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM" – DSM competed with the OpenMP model and OpenMP won