DTHREADS: Efficient and Deterministic Multithreading
Tongping Liu, Charlie Curtsinger and Emery D. Berger

Outline
★ Introduction
★ Architecture Overview
★ Deterministic Memory Commit
★ Deterministic Synchronization
★ Performance
★ Scalability

Introduction

Motivation
Concurrent programming is difficult:
● Prone to deadlocks and race conditions.
● Non-deterministic thread interleavings complicate debugging.

Motivation
A deterministic multithreading system eliminates this non-determinism:
● The same program with the same input always yields the same result.
● Simplifies debugging.
● Simplifies running multiple replicas for fault tolerance.

Contributions
DTHREADS: an efficient deterministic runtime system for multithreaded C/C++ applications.
● Guarantees deterministic execution of multithreaded programs.
● A direct replacement for the pthreads library, requiring no code modifications or recompilation.
● Eliminates cache-line-based false sharing.

Architecture Overview

Basic Idea
● DTHREADS isolates memory accesses among different threads between commit points, and commits each thread's updates deterministically.
● Memory accesses are isolated by implementing threads as processes, since processes have separate address spaces.
● DTHREADS uses this isolation to ensure that updates to shared memory are not visible to other threads until a synchronization point is reached.

Isolated Memory Access
● DTHREADS uses memory-mapped files to share memory across processes. It keeps two mappings for both the globals and the heap (see the sketch below):
  ○ A shared mapping, which holds the shared state.
  ○ A private, copy-on-write, per-process mapping that each process works on directly.
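The slide does not show how the two mappings are created; below is a minimal sketch of the general technique using mmap over one backing file. It is an illustration, not DTHREADS' actual implementation, and the names setup_mappings, REGION_SIZE, and the file path are assumptions made for the example.

/* Sketch: one backing file, two views of the same region.
 * The MAP_SHARED view holds committed state visible to all processes;
 * the MAP_PRIVATE view is a per-process copy-on-write working copy.
 * Names and the file path are illustrative, not DTHREADS internals. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE (4096 * 1024)

static char *shared_view;   /* committed state, shared by all processes */
static char *private_view;  /* per-process working copy (copy-on-write) */

void setup_mappings(void) {
  int fd = open("/tmp/dthreads_demo_heap", O_RDWR | O_CREAT, 0600);
  ftruncate(fd, REGION_SIZE);

  /* Writes through this mapping reach the file and every other process. */
  shared_view = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);

  /* Writes through this mapping stay private to the process until they
   * are explicitly copied back to shared_view at a commit point. */
  private_view = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);
  close(fd);
}

With these two views, a process reads and writes private_view freely during the parallel phase; a commit then only needs to copy the bytes it changed into shared_view.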

Deterministic Memory Commit
● Program execution alternates between parallel and serial phases.
● Memory accesses in the parallel phase work on private copies of memory.
● DTHREADS updates shared state only at synchronization points:
  ○ thread create and exit
  ○ mutex lock and unlock
  ○ condition variable wait and signal
  ○ barrier wait
● The order in which updates to shared state are written is enforced by a single global token, passed from thread to thread in deterministic order.

Deterministic Memory Commit
● When a thread commits its updates, DTHREADS compares the local copy against a copy of the original shared page and writes only the modified bytes to shared state (see the sketch below).
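The slide does not spell out the byte-level comparison; the sketch below shows one plausible diff-based commit, assuming fixed-size pages and a saved pre-modification copy of the page (the "twin"). The names diff_commit and PAGE_SIZE are illustrative, not taken from DTHREADS.

/* Sketch of a byte-granularity diff commit. 'twin' is a copy of the shared
 * page taken before local modification; only bytes the local copy changed
 * are written back, so non-overlapping writes committed by other threads
 * are preserved. Names are illustrative, not DTHREADS' actual code. */
#include <stddef.h>

#define PAGE_SIZE 4096

void diff_commit(const char *local, const char *twin, char *shared) {
  for (size_t i = 0; i < PAGE_SIZE; i++) {
    if (local[i] != twin[i]) {
      shared[i] = local[i];   /* commit only bytes this thread changed */
    }
  }
}

Because only the bytes a thread actually changed are written back, two threads that modified different parts of the same page can both commit without clobbering each other's updates.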

Deterministic Memory Commit

The Fence
The internal fence separates the parallel and serial phases. The fence has two phases, arrival and departure.
● Threads entering the fence must wait until the arrival phase.
● Waiting threads then block until the departure phase begins.
● The last thread to arrive sets the departure phase and wakes the waiting threads.
● The last thread to leave sets the fence back to the arrival phase and wakes any waiting threads.

void waitFence(void) {
  lock();
  while (!isArrivalPhase()) {
    condWait();
  }
  waiting_threads++;
  if (waiting_threads < live_threads) {
    // Not the last arrival: block until the departure phase begins.
    while (!isDeparturePhase()) {
      condWait();
    }
  } else {
    // Last arrival: start the departure phase and wake all waiters.
    setDeparturePhase();
    condBroadcast();
  }
  waiting_threads--;
  if (waiting_threads == 0) {
    // Last thread to leave resets the fence to the arrival phase.
    setArrivalPhase();
    condBroadcast();
  }
  unlock();
}

The Token
The token is a shared pointer that points to the next runnable thread entry.
● waitToken first waits at the internal fence and then waits to acquire the global token in order to enter the serial phase.
● putToken passes the token to the next waiting thread.

void waitToken() {
  waitFence();
  while (token_holder != thread_id) {
    yield();
  }
}

void putToken() {
  token_holder = token_queue.nextThread();
}

atomicBegin initializes memory mappings before starting an atomic transaction.
● First, all previously written pages are write-protected.
● Then, the old working copies of these pages are released.
● Finally, the mappings are updated to reference the shared state.

void atomicBegin() {
  foreach (page in modifiedPages) {
    page.writeProtect();
    page.privateCopy.free();
  }
  modifiedPages.emptyList();
}

atomicEnd commits all changes from the current transaction to shared state.
● For each modified page with multiple writers, DTHREADS ensures that a twin page (a copy of the original shared page) is created.
● If the version number of the private copy matches that of the shared page, the entire private copy is copied to shared state. Otherwise, a diff-based commit is used.
● Then, the page's writer count is decremented; if no writers are left to commit, the twin page is freed.
● Finally, the shared page's version number is incremented.

void atomicEnd() {
  foreach (page in modifiedPages) {
    // If other writers also modified this page, make sure a twin
    // (copy of the original shared page) exists for diffing.
    if (page.writers > 1 && !page.hasTwin()) {
      page.createTwin();
    }
    // Nobody committed this page since we copied it: commit the whole page.
    if (page.version == page.localCopy.version) {
      page.copyCommit();
    } else {
      // Otherwise commit only the bytes that differ from the twin.
      page.diffCommit();
    }
    page.writers--;
    if (page.writers == 0 && page.hasTwin()) {
      page.twin.free();
    }
    page.version++;
  }
}

Deterministic Synchronization

mutex_lock
● First, DTHREADS checks whether the current thread is already holding any locks.
● If not, the thread waits for the token, commits its changes to shared state, and begins a new atomic section.
● Finally, the thread increments the number of locks it is currently holding.

void mutex_lock() {
  if (lock_count == 0) {
    waitToken();
    atomicEnd();
    atomicBegin();
  }
  lock_count++;
}

mutex_unlock
● First, the thread decrements its lock count.
● If additional locks are still held, the function returns immediately.
● Otherwise, all local modifications are committed to shared state, the token is passed, and a new atomic section begins.
● Then, the thread waits on the internal fence until the start of the next parallel phase.

void mutex_unlock() {
  lock_count--;
  if (lock_count == 0) {
    atomicEnd();
    putToken();
    atomicBegin();
    waitFence();
  }
}

cond_wait
● The thread acquires the token and commits local modifications.
● The thread removes itself from the token queue and decrements the live-thread count.
● It adds itself to the condition variable's queue and passes the token.
● The thread is then suspended on a real pthreads condition variable.
● Once woken up, it busy-waits for the token and finally begins the next transaction.

void cond_wait() {
  waitToken();
  atomicEnd();
  token_queue.removeThread(thread_id);
  live_threads--;
  cond_queue.addThread(thread_id);
  putToken();
  // Sleep on the underlying pthreads condition variable until signaled.
  while (!threadReady()) {
    real_cond_wait();
  }
  // Busy-wait to reacquire the token before resuming.
  while (token_holder != thread_id) {
    yield();
  }
  atomicBegin();
}

cond_signal
● The thread waits for the token, and then commits any local modifications.
● If no threads are waiting on the condition variable, the function returns immediately.
● Otherwise, the first thread in the condition variable's queue is moved to the head of the token queue and the live-thread count is incremented.
● That thread is then marked as ready and woken up via the real condition variable, and the calling thread begins another transaction.

void cond_signal() {
  if (token_holder != thread_id) {
    waitToken();
  }
  atomicEnd();
  if (cond_queue.length == 0) {
    return;
  }
  lock();
  // Move the first waiter from the condition queue back into the token queue.
  thread = cond_queue.removeNext();
  token_queue.insert(thread);
  live_threads++;
  thread.setReady(true);
  real_cond_signal();
  unlock();
  atomicBegin();
}

barrier_wait
● The thread waits for the token in order to commit any local modifications.
● If it is the last to enter the barrier, DTHREADS moves the entire list of threads on the barrier queue to the token queue, increases the fence's thread count, and passes the token to the first thread in the barrier queue.
● Otherwise, DTHREADS removes the current thread from the token queue, places it on the barrier queue, and releases the token.
● Finally, the thread waits on the actual barrier.

void barrier_wait() {
  waitToken();
  atomicEnd();
  lock();
  if (barrier_queue.length == barrier_count - 1) {
    // Last arrival: release all waiting threads in deterministic order.
    token_holder = barrier_queue.first();
    live_threads += barrier_queue.length;
    barrier_queue.moveAllTo(token_queue);
  } else {
    token_queue.remove(thread_id);
    barrier_queue.insert(thread_id);
    putToken();
  }
  unlock();
  atomicBegin();
  real_barrier_wait();
}

thread_create
● The caller first waits for the token before proceeding.
● It then creates a new process with shared file descriptors but a distinct address space, using the clone system call.
● The newly created child obtains the global thread index, places itself in the token queue, and notifies the parent that it has registered itself in the active list.
● The child then waits for the parent to reach a synchronization point.

void thread_create() {
  waitToken();
  clone(CLONE_FS | CLONE_FILES | CLONE_CHILD);
  if (child_process) {
    thread_id = next_thread_index;
    next_thread_index++;
    notifyChildRegistered();
    waitParentBroadcast();
  } else {
    waitChildRegistered();
  }
}

thread_exit
● The caller first waits for the token and then commits any local modifications.
● It then removes itself from the token queue and decreases the number of threads required to proceed to the next phase.
● Finally, it passes the token to the next thread in the token queue and exits.

void thread_exit() {
  waitToken();
  atomicEnd();
  token_queue.remove(thread_id);
  live_threads--;
  putToken();
  real_thread_exit();
}

Performance

Scalability