
DTHREADS: Efficient and Deterministic Multithreading Tongping Liu, Charlie Curtsinger and Emery D. Berger

Outline ★ Introduction ★ Architecture Overview ★ Deterministic Memory Commit ★ Deterministic Synchronization ★ Performance ★ Scalability

Introduction

Motivation Concurrent programming is difficult: ● Prone to deadlocks and race conditions. ● Non-deterministic thread interleavings complicate debugging.

Motivation A deterministic multithreading system eliminates non-determinism: ● The same program with the same input always yields the same result. ● Simplifies debugging. ● Simplifies replicated execution for fault tolerance.

Contributions DTHREADS: an efficient deterministic runtime system for multithreaded C/C++ applications. ● Guarantees deterministic execution of multithreaded programs. ● A direct replacement for the pthreads library, requiring no code modifications or recompilation. ● Eliminates cache-line based false sharing.

Architecture Overview

Basic Idea ● DTHREADS isolates memory accesses among different threads between commit points, and commits the updates of each thread deterministically. ● Memory accesses are isolated by implementing threads as processes, since processes have separate address spaces. ● DTHREADS uses this isolation to ensure that updates to shared memory are not visible to other threads until a synchronization point is reached.

Isolated Memory Access ● DTHREADS uses memory-mapped files to share memory across processes. It maintains two different mappings for both the globals and the heap: ○ A shared mapping, which is used to hold shared state. ○ A private copy-on-write per-process mapping, which each process works on directly.

Deterministic Memory Commit ● Program execution alternates between parallel and serial phases. ● Memory accesses in the parallel phase work on private copies of memory. ● DTHREADS updates shared state only at synchronization points: ○ thread create and exit ○ mutex lock and unlock ○ condition variable wait and signal ○ barrier wait ● The order in which the updates to shared state are written is enforced by a single global token passed from thread to thread, in deterministic order.


Deterministic Memory Commit ● When a thread commits updates, DTHREADS compares local updates to a copy of the original shared page, and then writes only modified bytes to the shared state.



The Fence The internal fence separates the parallel and serial phases. The fence has two phases: arrival and departure. ● Threads entering the fence must wait until the arrival phase. ● Then, waiting threads block until the departure phase. ● The last thread to enter sets the departure phase and wakes the waiting threads. ● The last thread to leave resets the fence to the arrival phase and wakes any waiting threads.

void waitFence(void) {
  lock();
  while (!isArrivalPhase()) {
    condWait();
  }
  waiting_threads++;
  if (waiting_threads < live_threads) {
    while (!isDeparturePhase()) {
      condWait();
    }
  } else {
    setDeparturePhase();
    condBroadcast();
  }
  waiting_threads--;
  if (waiting_threads == 0) {
    setArrivalPhase();
    condBroadcast();
  }
  unlock();
}

The Token The token is a shared pointer that points to the next runnable thread entry. ● waitToken first waits at the internal fence and then waits to acquire the global token before entering serial mode. ● putToken passes the token to the next waiting thread.

void waitToken() {
  waitFence();
  while (token_holder != thread_id) {
    yield();
  }
}

void putToken() {
  token_holder = token_queue.nextThread();
}

atomicBegin initializes memory mappings before starting an atomic transaction. ● First, all previously written pages are write-protected. ● Then, the old working copies of these pages are released. ● Finally, the mappings are updated to reference the shared state.

void atomicBegin() {
  foreach (page in modifiedPages) {
    page.writeProtect();
    page.privateCopy.free();
  }
  modifiedPages.emptyList();
}

atomicEnd commits all changes from the current transaction to shared state. ● For each modified page with multiple writers, DTHREADS ensures that a twin page is created. ● If the version of the private copy matches the shared one, the entire private copy is copied to shared state. Otherwise, a diff-based commit is used. ● Then, the number of writers to the page is decremented, and if no writers remain to commit, the twin page is freed. ● Finally, the shared page's version number is incremented.

void atomicEnd() {
  foreach (page in modifiedPages) {
    if (page.writers > 1 && !page.hasTwin()) {
      page.createTwin();
    }
    if (page.version == page.localCopy.version) {
      page.copyCommit();
    } else {
      page.diffCommit();
    }
    page.writers--;
    if (page.writers == 0 && page.hasTwin()) {
      page.twin.free();
    }
    page.version++;
  }
}

Deterministic Synchronization

mutex_lock ● First, DTHREADS checks to see if the current thread is already holding any locks. ● If not, the thread waits for the token, commits changes to shared state and begins a new atomic section. ● Finally, the thread increments the number of locks it is currently holding.

void mutex_lock() {
  if (lock_count == 0) {
    waitToken();
    atomicEnd();
    atomicBegin();
  }
  lock_count++;
}

mutex_unlock ● First, the thread decrements its lock count. ● If additional locks are still held, the function returns immediately. ● Otherwise, all local modifications are committed to shared state, the token is passed, and a new atomic section begins. ● Then, the thread waits on the internal fence until the start of the parallel phase.

void mutex_unlock() {
  lock_count--;
  if (lock_count == 0) {
    atomicEnd();
    putToken();
    atomicBegin();
    waitFence();
  }
}

cond_wait ● The thread acquires the token and commits local modifications. ● The thread removes itself from the token queue and decrements the live threads count. ● It adds itself to the condition variable’s queue, and passes the token. ● Then, the thread is suspended on a pthreads condition variable. ● Once a thread is woken up, it busy-waits on the token and finally begins the next transaction.

void cond_wait() {
  waitToken();
  atomicEnd();
  token_queue.removeThread(thread_id);
  live_threads--;
  cond_queue.addThread(thread_id);
  putToken();
  while (!threadReady()) {
    real_cond_wait();
  }
  while (token_holder != thread_id) {
    yield();
  }
  atomicBegin();
}

cond_signal ● The thread waits for the token, and then commits any local modifications. ● If no threads are waiting on the condition variable, the function returns immediately. ● Otherwise, the first thread in the condition variable queue is moved to the head of the token queue and the live thread count is incremented. ● This thread is then marked as ready and woken up from the real condition variable, and the calling thread begins another transaction.

void cond_signal() {
  if (token_holder != thread_id) {
    waitToken();
  }
  atomicEnd();
  if (cond_queue.length == 0) {
    return;
  }
  lock();
  thread = cond_queue.removeNext();
  token_queue.insert(thread);
  live_threads++;
  thread.setReady(true);
  real_cond_signal();
  unlock();
  atomicBegin();
}

barrier_wait ● The thread waits for the token and then commits any local modifications. ● If it is the last to enter the barrier, DTHREADS moves the entire list of threads on the barrier queue to the token queue, increases the fence's thread count, and passes the token to the first thread in the barrier queue. ● Otherwise, DTHREADS removes the current thread from the token queue, places it on the barrier queue, and releases the token. ● Finally, the thread waits on the actual barrier.

void barrier_wait() {
  waitToken();
  atomicEnd();
  lock();
  if (barrier_queue.length == barrier_count - 1) {
    token_holder = barrier_queue.first();
    live_threads += barrier_queue.length;
    barrier_queue.moveAllTo(token_queue);
  } else {
    token_queue.remove(thread_id);
    barrier_queue.insert(thread_id);
    putToken();
  }
  unlock();
  atomicBegin();
  real_barrier_wait();
}

thread_create ● The caller first waits for the token before proceeding. ● It then creates a new process with shared file descriptors but a distinct address space using the clone system call. ● The newly created child obtains the global thread index, places itself in the token queue, and notifies the parent that the child has registered itself in the active list. ● The child thread then waits for the parent to reach a synchronization point.

void thread_create() {
  waitToken();
  clone(CLONE_FS | CLONE_FILES | CLONE_CHILD);
  if (child_process) {
    thread_id = next_thread_index;
    next_thread_index++;
    notifyChildRegistered();
    waitParentBroadcast();
  } else {
    waitChildRegistered();
  }
}

thread_exit ● The caller thread first waits for the token and then commits any local modifications. ● It then removes itself from the token queue and decreases the number of threads required to proceed to the next phase. ● Finally, the thread passes its token to the next thread in the token queue and exits.

void thread_exit() {
  waitToken();
  atomicEnd();
  token_queue.remove(thread_id);
  live_threads--;
  putToken();
  real_thread_exit();
}

Performance


Scalability


