TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

Presented by: Blair Fort, Oct. 28, 2004

Overview
- Introduction and Motivation
- Implementation
- Experiments and Results
- Conclusions
- My two cents

Introduction
- TreadMarks is a distributed shared memory (DSM) system
- Runs on Unix workstations connected by an ATM or Ethernet network

Cluster Configuration

Distributed Shared Memory

Motivation
- No widely available DSM system
- Eliminate problems of other systems:
  - Bad portability
  - Bad performance
  - False sharing

Goals
- Ease of use
- Portability
- Good performance
  - Also show that it works for real programs


Ease of Use
- Looks a lot like pthreads
- Implicit message passing
- Implicit process creation
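To illustrate the programming style, here is a minimal sketch that uses Python's `threading` module as a stand-in: the workers read shared data directly and coordinate only through a lock and a barrier, with no explicit send or receive. This is an analogue under stated assumptions, not the TreadMarks API; under TreadMarks the workers would be separate processes on different workstations, and the function `run` and its parameters are hypothetical.

```python
import threading


def run(nprocs: int = 4, n: int = 1000) -> int:
    """Sum 0..n-1 with nprocs workers sharing one array (hypothetical example)."""
    shared = list(range(n))   # stands in for data allocated in shared memory
    total = [0]               # shared accumulator, protected by `lock`
    lock = threading.Lock()
    barrier = threading.Barrier(nprocs)

    def worker(pid: int) -> None:
        # Each worker reads its own block of the shared array directly;
        # the last worker also takes any leftover elements.
        chunk = n // nprocs
        lo = pid * chunk
        hi = n if pid == nprocs - 1 else lo + chunk
        local = sum(shared[lo:hi])
        # Acquire/release bracket the critical section. In a release-consistent
        # DSM these are also the points where consistency information moves.
        with lock:
            total[0] += local
        # No worker continues until every worker has added its contribution.
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


print(run())  # 499500
```

Calling `run(3, 10)` returns 45. The spawn, lock, barrier, join structure is what makes this model feel like pthreads.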


Portability
- Uses only standard Unix system calls for:
  - Message passing
  - Memory management

Performance
- Avoid false sharing
- Avoid excessive message passing

Conventional DSM Implementation

Sequential vs. Release Consistency
- Sequential consistency:
  - Every write is broadcast
  - More message passing
- Release consistency:
  - Writes are broadcast only at synchronization points
  - More memory overhead
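The trade-off can be made concrete with a toy message count. The sketch below assumes a simplified model (a single writer, one message per destination node, no acknowledgements) and compares broadcasting every write against batching writes until a synchronization point; `count_messages` and its trace format are hypothetical. The peak number of buffered writes is only a rough proxy for the extra memory that release consistency needs.

```python
def count_messages(trace: list[str], nodes: int) -> tuple[int, int, int]:
    """Count messages for one writer under a simple SC vs. RC model.

    trace: "w" for a write to shared data, "rel" for a synchronization point.
    Returns (sc_messages, rc_messages, peak_buffered_writes).
    """
    sc = rc = 0
    pending = 0        # writes buffered locally under RC since the last sync
    peak = 0
    for op in trace:
        if op == "w":
            sc += nodes - 1          # SC: every write is broadcast immediately
            pending += 1             # RC: just record the write for later
            peak = max(peak, pending)
        elif op == "rel":
            if pending:
                rc += nodes - 1      # RC: one batched broadcast per sync point
            pending = 0
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return sc, rc, peak


trace = ["w"] * 5 + ["rel"] + ["w"] * 3 + ["rel"]
print(count_messages(trace, nodes=8))  # (56, 14, 5)
```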

Read-Write False Sharing (figure: accesses w(x), r(y), r(x), followed by a synch; x and y share a page)

Write-Write False Sharing (figure: accesses w(x), w(y), w(x), r(x), w(y), followed by a synch; x and y share a page)

Multiple-Writer False Sharing (figure: accesses w(x), w(y), w(x), r(x), w(y), followed by a synch; x and y share a page)
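A toy count shows why letting several processors write the same page helps. The sketch assumes x and y are distinct variables on one page, that the first writer already holds the page, and that each page transfer or diff costs one message; the function names are hypothetical. With a single-writer protocol the page moves on every change of writer, while a multiple-writer protocol lets each writer keep working and send one diff at the synchronization point.

```python
def single_writer_transfers(writes: list[tuple[int, str]]) -> int:
    """Page transfers when only one processor may write the page at a time.

    writes: (processor, variable) pairs; all variables live on one page.
    """
    owner = None
    transfers = 0
    for proc, _var in writes:
        if proc != owner:
            if owner is not None:
                transfers += 1   # page ping-pongs to the new writer
            owner = proc
    return transfers


def multiple_writer_messages(writes: list[tuple[int, str]]) -> int:
    """Diff messages at the next sync when concurrent writers are allowed:
    each distinct writer sends one diff of its own changes."""
    return len({proc for proc, _var in writes})


# x and y are different variables on the same page (false sharing).
writes = [(1, "x"), (2, "y"), (1, "x"), (2, "y"), (1, "x")]
print(single_writer_transfers(writes), multiple_writer_messages(writes))  # 4 2
```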

Eager vs. Lazy RC
- Eager RC:
  - Sends messages at lock release or at barriers
  - Broadcasts messages to all nodes
- Lazy RC:
  - Sends messages when locks are acquired
  - Messages go only to the node that needs them
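A minimal sketch of the difference for a single lock passed between processes. It counts only messages carrying consistency information, ignores the lock request and grant traffic that both schemes need, and assumes the lazy scheme's information fits in one message to the acquirer; `consistency_messages` is a hypothetical name.

```python
def consistency_messages(lock_holders: list[int], nodes: int) -> tuple[int, int]:
    """Count messages carrying consistency information for one lock.

    lock_holders: process ids in the order they acquire (and later release)
    the lock; each holder is assumed to modify shared data while holding it.
    Returns (eager, lazy).
    """
    eager = lazy = 0
    previous = None
    for holder in lock_holders:
        # Lazy RC: at acquire, only the new holder receives information,
        # and only if the lock comes from a different process.
        if previous is not None and previous != holder:
            lazy += 1
        # Eager RC: at release, the holder broadcasts to every other node.
        eager += nodes - 1
        previous = holder
    return eager, lazy


print(consistency_messages([0, 1, 2, 0], nodes=8))  # (28, 3)
```

The eager count grows with the number of nodes at every release, while the lazy count grows only with the number of lock hand-offs.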

Eager vs. Lazy RC

Memory Consistency
- Done by creating diffs
- Eager RC creates diffs at synchronization points (barriers and lock releases)
- Lazy RC creates diffs only at the first use of a modified page by another node

Twin Creation
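A minimal sketch of twinning, simulated in Python rather than with real page protection: the `write_protected` flag stands in for a read-only page (TreadMarks relies on standard Unix memory-management calls for this), the first write in an interval takes a simulated fault and copies the page into a twin, and a diff is produced by comparing the page with its twin word by word. The `Page` class and its methods are hypothetical.

```python
class Page:
    """Simulated shared page with twin-based write detection (illustrative only)."""

    def __init__(self, nwords: int) -> None:
        self.data = [0] * nwords
        self.twin = None             # pristine copy taken at the first write
        self.write_protected = True  # stands in for a read-only page mapping
        self.faults = 0

    def write(self, index: int, value: int) -> None:
        if self.write_protected:
            # First write since the last diff: simulated write fault.
            # Save a pristine copy (the twin), then allow further writes.
            self.faults += 1
            self.twin = list(self.data)
            self.write_protected = False
        self.data[index] = value

    def make_diff(self) -> dict[int, int]:
        """Return {word index: new value} for every word that differs from the twin."""
        if self.twin is None:
            return {}
        diff = {i: new for i, (new, old) in enumerate(zip(self.data, self.twin))
                if new != old}
        # Drop the twin and re-protect so the next write starts a new twin.
        self.twin = None
        self.write_protected = True
        return diff


page = Page(8)
page.write(2, 7)
page.write(5, 1)
page.write(2, 9)         # no new fault: the page is already writable
print(page.faults, page.make_diff())  # 1 {2: 9, 5: 1}
```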

Diff Organization
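The figure's exact record layout is not recoverable here, so the sketch below assumes a run-length style encoding: each run of consecutive modified words is stored as a start index plus the new values. `encode_diff` and `apply_diff` are hypothetical helpers.

```python
def encode_diff(current: list[int], twin: list[int]) -> list[tuple[int, list[int]]]:
    """Encode the words of `current` that differ from `twin` as (start, values) runs."""
    runs = []
    i, n = 0, len(current)
    while i < n:
        if current[i] != twin[i]:
            start = i
            while i < n and current[i] != twin[i]:
                i += 1
            runs.append((start, current[start:i]))
        else:
            i += 1
    return runs


def apply_diff(page: list[int], runs: list[tuple[int, list[int]]]) -> None:
    """Overwrite each run of words in `page` in place."""
    for start, values in runs:
        page[start:start + len(values)] = values


twin = [0, 0, 0, 0, 0, 0, 0, 0]
current = [0, 5, 6, 0, 0, 0, 9, 0]
runs = encode_diff(current, twin)
print(runs)  # [(1, [5, 6]), (6, [9])]
replica = list(twin)
apply_diff(replica, runs)
print(replica == current)  # True
```

Unchanged stretches cost nothing, so a page with a few scattered writes produces a small diff.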

Vector Timestamps (figure: p1 performs w(x) and releases; p2 acquires, performs w(y), and releases; p3 acquires and performs r(x), r(y); each interval is labeled with a vector timestamp)
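A minimal sketch of the vector-timestamp bookkeeping, assuming one entry per processor that counts its completed intervals (indices are 0-based, so p1 is index 0). At an acquire, the acquirer can work out which intervals it has not yet seen, roughly the intervals whose write notices it still needs, and then merges timestamps with a pointwise maximum. All function names are hypothetical.

```python
def merge(a: list[int], b: list[int]) -> list[int]:
    """Pointwise maximum: the acquirer now knows everything the releaser knew."""
    return [max(x, y) for x, y in zip(a, b)]


def happens_before(a: list[int], b: list[int]) -> bool:
    """True if the interval stamped `a` precedes the one stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def missing_intervals(acquirer: list[int], releaser: list[int]) -> list[tuple[int, int]]:
    """(processor, interval) pairs the releaser has seen but the acquirer has not."""
    return [(proc, i)
            for proc, (mine, theirs) in enumerate(zip(acquirer, releaser))
            for i in range(mine + 1, theirs + 1)]


# Scenario loosely following the figure: p1 writes x and releases;
# p2 acquires, writes y, releases; p3 acquires and reads x and y.
p1 = [1, 0, 0]                           # p1 finished its first interval
p2 = merge([0, 0, 0], p1)                # p2 acquires from p1
p2[1] += 1                               # p2 writes y and releases: [1, 1, 0]
print(missing_intervals([0, 0, 0], p2))  # [(0, 1), (1, 1)]
print(happens_before(p1, p2), happens_before([1, 0, 0], [0, 1, 0]))  # True False
```

When p3 acquires from p2, it learns about both p1's write to x and p2's write to y, even though p3 never communicated with p1 directly.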

Diff chain in Proc 4
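A processor that brings a page up to date must apply its pending diffs so that causally earlier changes land first. The sketch assumes each diff is tagged with a vector timestamp and stored as {word index: value}; `apply_diff_chain` is hypothetical. Sorting by the sum of the timestamp entries is one simple way to respect happens-before: if a happened before b, every entry of a is at most the matching entry of b and at least one is smaller, so sum(a) < sum(b). Concurrent diffs end up in arbitrary relative order, which is safe only if they touch different words, as in a data-race-free program.

```python
def apply_diff_chain(page: list[int],
                     diffs: list[tuple[list[int], dict[int, int]]]) -> list[int]:
    """Apply (vector_timestamp, {index: value}) diffs to `page` in causal order."""
    # sorted() is stable, so ties (concurrent diffs) keep their input order.
    for _stamp, diff in sorted(diffs, key=lambda d: sum(d[0])):
        for index, value in diff.items():
            page[index] = value
    return page


diffs = [
    ([1, 1, 0], {0: 2}),        # later write to word 0
    ([1, 0, 0], {0: 1, 1: 5}),  # earlier write to words 0 and 1
    ([0, 0, 1], {3: 7}),        # concurrent write to a different word
]
print(apply_diff_chain([0, 0, 0, 0], diffs))  # [2, 5, 0, 7]
```

Applying the same list in its given order would leave word 0 at 1, the stale value.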

Garbage Collection
- Merges all diffs to recover memory
- Occurs only at barriers
- Every node that holds a copy of a page must have all diffs of that page
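A minimal sketch of garbage collection for a single page, assuming all diffs and all cached copies are visible in one place (in a real system this information is spread across nodes and gathered at the barrier). Every copy is brought up to date before any diff is discarded, which is the invariant stated above. `garbage_collect` and its data layout are hypothetical.

```python
def garbage_collect(copies: dict[str, list[int]],
                    diffs: list[tuple[list[int], dict[int, int]]]) -> int:
    """At a barrier: update every cached copy of one page, then free its diffs.

    copies: node name -> that node's copy of the page.
    diffs: (vector_timestamp, {index: value}) records for the page.
    Returns the number of diff entries freed.
    """
    # Sort by the sum of timestamp entries: if a happened before b,
    # then sum(a) < sum(b), so earlier changes are applied first.
    ordered = sorted(diffs, key=lambda d: sum(d[0]))
    for page in copies.values():
        for _stamp, diff in ordered:
            for index, value in diff.items():
                page[index] = value
    freed = sum(len(diff) for _stamp, diff in diffs)
    diffs.clear()   # every holder is now current, so the diffs can be dropped
    return freed


copies = {"p1": [0, 0, 0], "p2": [0, 0, 0]}
diffs = [([1, 1], {0: 6, 2: 1}), ([1, 0], {0: 4})]
freed = garbage_collect(copies, diffs)
print(freed, copies["p1"], copies["p2"], diffs)  # 3 [6, 0, 1] [6, 0, 1] []
```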


Testing Platform
- 8 DECstation-5000/240s running Ultrix V4.3
- Networks:
  - ATM, 100 Mbps
  - Ethernet, 10 Mbps

Testing Programs
- Water (modified, from SPLASH)
- Jacobi
- TSP
- QuickSort
- ILINK




Unix Overhead

TreadMarks Overhead

Network Comparison - Water

Lazy vs Eager RC

Message Rate

Data Rate

Diff Creation Rate


Conclusions
- An automated distributed shared memory system works for real programs!
- Lazy RC (LRC) improves performance over eager RC (ERC) in most cases


My Thoughts
- Good design: promotes reuse
- Would like to see a comparison against hand-coded message passing
- Why not a partial merging of diffs?

Comments/Questions