CASHMERe
David Kunzman, Mei Chao, Esteban Pauli

Overview
“It is well accepted today that commercial workstations offer the best price/performance ratio and that shared memory provides the most desirable programming paradigm for parallel computing.”
- CASHMERe provides a shared-memory view of distributed-memory machines.
- Provides two DSM coherence protocols:
  - Over message passing
  - With the DEC Memory Channel NIC

Data Sharing Management
- Page-sized coherence blocks for shared memory
- Accesses detected by the VM subsystem
- Each page has:
  - A single distinguished home node that holds the master copy
  - An entry in a global page directory that maintains the sharing state and the home node location
- States: Invalid, Read, or Read-Write
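
As a rough illustration of the per-page bookkeeping listed above, a directory entry might look like the following C sketch. Only the three states and the notion of a home node come from the slide; `dir_entry_t`, `MAX_NODES`, and both helper functions are hypothetical names chosen for this example and are not taken from the CASHMERe sources.

```c
#include <assert.h>

#define MAX_NODES 32   /* assumed cluster size, not from the slides */

/* Per-node sharing state of a page, as listed on the slide. */
typedef enum { PAGE_INVALID, PAGE_READ, PAGE_READ_WRITE } page_state_t;

/* Hypothetical directory entry: one per shared page. */
typedef struct {
    int          home_node;          /* node that holds the master copy */
    page_state_t state[MAX_NODES];   /* sharing state of every node */
} dir_entry_t;

/* Initialise an entry so that every node starts in the Invalid state. */
static void dir_entry_init(dir_entry_t *e, int home_node)
{
    assert(home_node >= 0 && home_node < MAX_NODES);
    e->home_node = home_node;
    for (int n = 0; n < MAX_NODES; n++)
        e->state[n] = PAGE_INVALID;
}

/* Count the nodes other than `self` that currently hold a valid copy. */
static int dir_other_sharers(const dir_entry_t *e, int self)
{
    int count = 0;
    for (int n = 0; n < MAX_NODES; n++)
        if (n != self && e->state[n] != PAGE_INVALID)
            count++;
    return count;
}
```

A count of zero from `dir_other_sharers` is the kind of information a node would need before deciding that it is the only sharer of a page.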

Handling Page Faults
- Invalid page: copy from the home node via a page update request
- Read access: upgrade the node's sharing state and set permission to Read
- Write access:
  - Enter exclusive mode if there are no other sharers
  - Make a pristine copy (twin) and add the page ID to the dirty list (enables recovery later)
  - Update sharing info and set permission to Read-Write
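
The fault-handling steps above can be sketched as a small state machine. This is a simplified, single-process model under several assumptions: the page size, the `local_page_t` layout, and `handle_fault` are all hypothetical, the home node is modelled as a plain buffer, and the three write-access sub-steps are treated as independent actions, which is only one reading of the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096   /* assumed page size for illustration */

typedef enum { PAGE_INVALID, PAGE_READ, PAGE_READ_WRITE } page_state_t;

/* Hypothetical per-node view of one shared page. */
typedef struct {
    page_state_t  state;               /* this node's permission */
    bool          exclusive;           /* true when no other node shares the page */
    bool          on_dirty_list;       /* stand-in for the dirty list entry */
    bool          has_twin;            /* true once a pristine copy exists */
    unsigned char working[PAGE_SIZE];  /* local working copy */
    unsigned char twin[PAGE_SIZE];     /* pristine copy taken at first write */
} local_page_t;

/* Handle a read or write fault on page `p`.
 * `home_copy` models the master copy held by the home node;
 * `other_sharers` would come from the global page directory. */
static void handle_fault(local_page_t *p, const unsigned char *home_copy,
                         bool is_write, int other_sharers)
{
    assert(p != NULL && home_copy != NULL);

    if (p->state == PAGE_INVALID) {
        /* Invalid page: fetch a fresh copy (the "page update request"). */
        memcpy(p->working, home_copy, PAGE_SIZE);
        p->state = PAGE_READ;
    }
    if (!is_write)
        return;   /* read access: Read permission is enough */

    /* Write access: note exclusivity, twin the page, record it as dirty. */
    p->exclusive = (other_sharers == 0);
    memcpy(p->twin, p->working, PAGE_SIZE);
    p->has_twin = true;
    p->on_dirty_list = true;
    p->state = PAGE_READ_WRITE;
}
```

In a real system the handler would be triggered by the VM subsystem's protection faults rather than called directly.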

During Release/Acquire Operations
- Release:
  - Diff: traverse the processor's dirty list and compare the working copy of each modified page to its twin
  - After the diff, write notices are sent to all sharers of the page
- Acquire:
  - Invalidate all pages with write notices
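
The diff step at release time can be illustrated with a word-by-word comparison of the working copy against its twin. This sketch assumes the differing words are written to the home node's master copy, which the slide does not state explicitly; `flush_diff` and `PAGE_WORDS` are hypothetical names, and write notices are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS 1024   /* assumed: a 4 KB page of 32-bit words */

/* Compare `working` against `twin` and copy every word that changed into
 * `home` (the modelled master copy). The twin is refreshed so that a later
 * release only sends newer changes. Returns the number of words flushed. */
static size_t flush_diff(const uint32_t *working, uint32_t *twin, uint32_t *home)
{
    assert(working != NULL && twin != NULL && home != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (working[i] != twin[i]) {
            home[i] = working[i];   /* propagate only the modified word */
            twin[i] = working[i];   /* twin now matches what was flushed */
            changed++;
        }
    }
    return changed;
}
```

Because only modified words are written, updates that another node made to different words of the same page survive in the master copy.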

Programming in CASHMERe
- SPMD
- Primitives:
  - Locks: acquire, release
  - Barrier
  - Flags:
    - acquire, release
    - More advanced functionality:
      - void csm_wait_eq_flag(int index, long value);
      - long csm_poll_flag(int index);
- http://www.cs.rochester.edu/research/cashmere/docs/csm_api.html

Example: Initialization
    struct { int values[4096]; } *pGlobalData1;
    struct { int others[256]; } *pGlobalData2;

    int main(int argc, char **argv)
    {
        // size the shared region: each structure rounded up to whole pages
        csm_init_memory_size(
            csm_memory_page_round(sizeof(*pGlobalData1)) +
            csm_memory_page_round(sizeof(*pGlobalData2)));
        csm_init_start(&argc, &argv);   // spawn other threads
        if (csm_pid == 0) {
            pGlobalData1 = csm_malloc(sizeof(*pGlobalData1));
            csm_distribute(&pGlobalData1, sizeof(pGlobalData1));
            pGlobalData2 = csm_malloc(sizeof(*pGlobalData2));
            csm_distribute(&pGlobalData2, sizeof(pGlobalData2));
        }
        csm_init_complete();   // before this, only pe 0 can touch global data

        // compute using global variables ...
        pGlobalData1->values[0] += pGlobalData2->others[csm_pid];
        csm_exit(0);
    }

Example: 1D Jacobi
    while (not converged) {
        my_start = csm_pid * chunk_size;
        my_end = (csm_pid + 1) * chunk_size;
        for (i = my_start; i < my_end; i++)
            old[i] = cur[i];   // copy
        for (i = my_start; i < my_end; i++)
            cur[i] = …   // compute
        csm_barrier(0 /* barrier ID */);
    }

Example: Dining Philosophers
    while (still_hungry_for_pizza()) {
        csm_lock_acquire(csm_pid);
        csm_lock_acquire((csm_pid + 1) % csm_num_pid);
        eat_pizza();   // yummy
        csm_lock_release(csm_pid);
        csm_lock_release((csm_pid + 1) % csm_num_pid);
    }
    // can deadlock, just showing use of locks
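
One common way to avoid the deadlock noted in the example is to have every philosopher take the lower-numbered lock first, so that no cycle of waiters can form. The sketch below only computes that order; `lock_order_t` and `fork_order` are hypothetical helpers and are not part of the CASHMERe API.

```c
#include <assert.h>

/* The two lock IDs a philosopher needs, in the order they should be taken. */
typedef struct {
    int first;    /* lower-numbered lock, acquired first */
    int second;   /* higher-numbered lock, acquired second */
} lock_order_t;

/* Return the lock IDs for philosopher `pid` out of `num_pid`, sorted so that
 * every process acquires locks in ascending ID order. */
static lock_order_t fork_order(int pid, int num_pid)
{
    assert(num_pid > 1 && pid >= 0 && pid < num_pid);
    int left = pid;
    int right = (pid + 1) % num_pid;
    lock_order_t order;
    order.first = left < right ? left : right;
    order.second = left < right ? right : left;
    return order;
}
```

Each process would then pass `order.first` and `order.second`, in that order, to `csm_lock_acquire`, and release both locks after eating.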

Example: Flag Use
    // wait my turn
    csm_wait_neq_flag(0, csm_pid);
    csm_acquire_flag(0);
    …   // take my turn
    // let next person take their turn
    csm_inc_flag(0, 1);

Example: Bubble Sort
    // Assume flag 0 is initialized to 0
    for (int i = 0; i < N; i++) {
        csm_wait_neq_flag(0, csm_pid % 2);
        …   // swap with neighbor if needed
        if (csm_pid == 0 || csm_pid == 1)
            csm_acquire_flag(0);
        if (csm_pid == 0)
            csm_inc_flag(0, 1);
        if (csm_pid == 1)
            csm_reset_flag(0);
    }

Current Status
- Could not find a place to download it
- Last paper in 1999