TreadMarks: Distributed Shared Memory on Standard Workstations
TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Presented by: Blair Fort
Oct. 28, 2004
Overview
- Introduction and Motivation
- Implementation
- Experiments and Results
- Conclusions
- My two cents
Introduction
- TreadMarks is a distributed shared memory (DSM) system
- Runs on Unix workstations connected by an ATM or Ethernet network
Cluster Configuration
Distributed Shared Memory
Motivation
- No widely available DSM system
- Eliminate the problems of other systems
  - Bad portability
  - Bad performance
  - False sharing
Goals
- Ease of use
- Portability
- Good performance
  - Also show that it works for real programs
Ease of Use
- Looks a lot like pthreads
- Implicit message passing
- Implicit process creation
Portability
- Uses only standard Unix system calls
  - Message passing
  - Memory management
Performance
- False sharing
- Excessive message passing
Conventional DSM Implementation
Sequential vs. Release Consistency
- Sequential consistency
  - Every write is broadcast
  - More message passing
- Release consistency
  - Writes are broadcast only at synchronization points
  - More memory overhead
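The message-count trade-off between the two models can be sketched with a toy counter. This is only an illustration under a simplified cost model (one message per remote node per broadcast); the function name `count_messages` and the trace format are assumptions made for this example, not part of TreadMarks.

```python
def count_messages(trace, num_nodes, model):
    """Count broadcast messages for one writer under a toy cost model.

    trace is a list of "write" and "sync" events (hypothetical format).
    Assumption: one broadcast costs one message per remote node.
    """
    if model not in ("sequential", "release"):
        raise ValueError("model must be 'sequential' or 'release'")
    others = num_nodes - 1
    messages = 0
    pending = False  # buffered writes not yet made visible (release model)
    for op in trace:
        if op == "write":
            if model == "sequential":
                messages += others  # every write is broadcast immediately
            else:
                pending = True      # buffer the write until the next sync
        elif op == "sync":
            if model == "release" and pending:
                messages += others  # one broadcast covers all buffered writes
                pending = False
    return messages


if __name__ == "__main__":
    trace = ["write"] * 3 + ["sync"] + ["write"] * 2 + ["sync"]
    print("sequential:", count_messages(trace, 8, "sequential"))
    print("release:   ", count_messages(trace, 8, "release"))
```

The buffered writes are what the slide's "more memory overhead" refers to: the release model trades storage for fewer messages.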
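Read-Write False Sharing (timeline figure; operations shown: w(x), r(y), r(x))
Read-Write False Sharing (same figure with a synchronization point: w(x), r(y), r(x), synch)
Write-Write False Sharing (timeline figure; operations shown: w(x), w(y), w(x), r(x), w(y), synch)
Multiple-Writer False Sharing (timeline figure; operations shown: w(x), w(y), w(x), r(x), w(y), synch)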
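Why a multiple-writer protocol helps with write-write false sharing can be sketched as follows. The example assumes x and y live on the same page and that two nodes alternate writes; which node performs which operation is not recoverable from the slides, so the trace below is an assumption, as are the function names.

```python
def single_writer_transfers(trace):
    """Count page transfers when only one node may hold the page writable.

    trace is a list of (node, variable) writes to a single shared page.
    Assumption: each change of writer moves the whole page once.
    """
    owner = None
    transfers = 0
    for node, _var in trace:
        if owner is not None and node != owner:
            transfers += 1  # the page ping-pongs to the new writer
        owner = node
    return transfers


def multiple_writer_diffs(trace):
    """Count diffs exchanged at one synchronization point.

    Assumption: each writer keeps its own copy and sends one diff at sync.
    """
    return len({node for node, _var in trace})


if __name__ == "__main__":
    # Hypothetical assignment: p1 only writes x, p2 only writes y.
    trace = [("p1", "x"), ("p2", "y"), ("p1", "x"), ("p2", "y")]
    print("single-writer page transfers:", single_writer_transfers(trace))
    print("multiple-writer diffs at sync:", multiple_writer_diffs(trace))
```

Although the two nodes never touch the same variable, the single-writer count grows with every alternation, while the multiple-writer count depends only on the number of writers.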
Eager vs. Lazy RC
- Eager RC
  - Sends messages at the release of a lock or at barriers
  - Broadcasts messages to all nodes
- Lazy RC
  - Sends messages when locks are acquired
  - Message goes only to the required node
Eager vs. Lazy RC
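The difference in who receives consistency information under eager and lazy release consistency can be sketched with a single lock passed between nodes. This is a toy model, not the TreadMarks protocol: it counts only the messages that carry consistency information, ignores lock request traffic, and the name `consistency_messages` is made up for this example.

```python
def consistency_messages(holders, num_nodes, protocol):
    """Count consistency messages for one lock under a toy model.

    holders is the sequence of node ids that acquire and then release the lock.
    Assumption: lock request/grant traffic is not counted.
    """
    if protocol not in ("eager", "lazy"):
        raise ValueError("protocol must be 'eager' or 'lazy'")
    messages = 0
    last_releaser = None
    for node in holders:
        if protocol == "lazy":
            # Information travels at acquire time, only to the acquirer,
            # and only when a different node released the lock before.
            if last_releaser is not None and last_releaser != node:
                messages += 1
        else:
            # Information is pushed at release time to every other node.
            messages += num_nodes - 1
        last_releaser = node
    return messages


if __name__ == "__main__":
    holders = [0, 1, 1, 2]
    print("eager:", consistency_messages(holders, 8, "eager"))
    print("lazy: ", consistency_messages(holders, 8, "lazy"))
```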
Memory Consistency
- Done by creating diffs
- Eager RC creates diffs at barriers
- Lazy RC creates diffs at the first use of a page
Twin Creation
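The twin-and-diff idea can be sketched in a few lines: copy the page before its first write (the twin), then compare the twin with the modified page to obtain the diff. This is a minimal sketch under stated assumptions: it compares single bytes and stores runs as (offset, bytes) pairs, and the helper names `make_twin`, `make_diff`, and `apply_diff` are invented for illustration.

```python
def make_twin(page):
    """Copy the page before its first write so changes can be found later."""
    return bytes(page)


def make_diff(twin, page):
    """Return a list of (offset, new_bytes) runs where page differs from twin.

    Assumption: byte-granularity comparison, chosen for simplicity.
    """
    diff = []
    i = 0
    n = len(page)
    while i < n:
        if page[i] != twin[i]:
            start = i
            while i < n and page[i] != twin[i]:
                i += 1
            diff.append((start, bytes(page[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page, diff):
    """Patch a copy of the page in place with the modified runs."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data


if __name__ == "__main__":
    base = bytearray(16)
    # Two writers modify disjoint parts of their own copies of the same page.
    a = bytearray(base); twin_a = make_twin(a); a[2] = 7; a[3] = 8
    b = bytearray(base); twin_b = make_twin(b); b[10] = 9
    merged = bytearray(base)
    apply_diff(merged, make_diff(twin_a, a))
    apply_diff(merged, make_diff(twin_b, b))
    print(list(merged))
```

Because each diff contains only the bytes its writer changed, two diffs from concurrent writers to disjoint parts of a page can be merged without either overwriting the other.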
Diff Organization
Vector Timestamps (timeline figure for processes p1, p2, p3 annotated with vector timestamps; operations shown in order: w(x), rel, acq, w(y), rel, acq, r(x), r(y))
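How vector timestamps tell an acquiring process which modifications it has not yet seen can be sketched as below. This is a simplified model, assuming a new interval starts only at a release and that the releaser's timestamp travels with the lock; the `Process` class and its methods are hypothetical, and the scenario assumes the three operation groups in the figure belong to three different processes.

```python
class Process:
    """Toy process with a vector timestamp (hypothetical helper class)."""

    def __init__(self, pid, num_procs):
        self.pid = pid
        self.vt = [0] * num_procs

    def release(self):
        """Close the current interval and return the timestamp sent with the lock."""
        self.vt[self.pid] += 1
        return list(self.vt)

    def acquire(self, releaser_vt):
        """Return the (process, interval) pairs not yet seen, then merge timestamps."""
        missing = [
            (p, i)
            for p, (mine, theirs) in enumerate(zip(self.vt, releaser_vt))
            for i in range(mine + 1, theirs + 1)
        ]
        # Pairwise maximum: the acquirer now covers everything the releaser saw.
        self.vt = [max(m, t) for m, t in zip(self.vt, releaser_vt)]
        return missing


if __name__ == "__main__":
    p0, p1, p2 = (Process(i, 3) for i in range(3))
    t0 = p0.release()                     # after p0's write to x
    print("p1 must fetch:", p1.acquire(t0))
    t1 = p1.release()                     # after p1's write to y
    print("p2 must fetch:", p2.acquire(t1))
```

In this scenario the last acquirer learns about the first writer's interval transitively, even though the two never communicated directly.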
Diff chain in Proc 4
Garbage Collection
- Merges all diffs to recover memory
- Occurs only at barriers
- All nodes that have a page must have all diffs of that page
Testing Platform
- 8 DECstation-5000/240s running Ultrix V4.3
- Network:
  - ATM, 100 Mbps
  - Ethernet, 10 Mbps
Testing Programs
- Modified Water from SPLASH
- Jacobi
- TSP
- Quicksort
- ILINK
Unix Overhead
TreadMarks Overhead
Network Comparison - Water
Lazy vs Eager RC
Message Rate
Data Rate
Diff Creation Rate
Conclusions
- An automated distributed shared memory system works for real programs!
- LRC improves performance over ERC in most cases
My Thoughts
- Good design: promotes reuse
- Would like to see a comparison against hand-coded message passing
- Why not a partial merging of diffs?
Comments/Questions