Implementation of Locks David Gregg Trinity College Dublin
Implementation of Locks David Gregg Trinity College Dublin CSE 431 L 26 Bus. Multis. 1 Irwin, PSU, 2005
Process Synchronization Need to be able to coordinate processes working on a common task Lock variables (mutexes) are used to coordinate or synchronize processes Need an architecture-supported arbitration mechanism to decide which processor gets access to the lock variable Single bus provides arbitration mechanism, since the bus is the only path to memory – the processor that gets the bus wins Need an architecture-supported operation that locks the variable Locking can be done via an atomic compare and swap operation (processor can both read a location and set it to the locked state – test-and-set – in the same bus operation) CSE 431 L 26 Bus. Multis. 2 Irwin, PSU, 2005
Atomic Compare and Swap Machine instruction used to synchronize threads Threads may be running on different cores Compare and swap acts like a simple C function bool compare_and_swap(int * address, int testval, int newval) { int old_val = *address; if (oldval == testval) { *address = newval; } return old_val; } CSE 431 L 26 Bus. Multis. 3 Irwin, PSU, 2005
Atomic Compare and Swap Compare and swap (CAS) operation completes atomically There is no possibility of the memory location being modified between the CAS reading the location and writing to it Can be used to implement locks E. g, 0 means unlocked, 1 means locked: value = compare_and_swap(lock, 0, 1); if ( value == 0 ) /* we have the lock */ else /* we failed to acquire the lock */ CSE 431 L 26 Bus. Multis. 4 Irwin, PSU, 2005
Atomic Compare and Swap Lock algorithms can be built using compare and swap operations Most locks use compare and swap to attempt to acquire the lock Some specialist lock algorithms use other types of instructions Major difference between different locking algorithms is what to do if you fail to acquire the lock Two main choices Go into a loop and repeatedly try to acquire the lock Known as a spin lock Keeps spinning until another thread releases the lock Put the current thread to sleep and add it to a queue of threads waiting for the lock to be released CSE 431 L 26 Bus. Multis. 5 Irwin, PSU, 2005
Atomic Compare and Swap Spin lock algorithms can be built using compare and swap operations void acquire_lock(int * lock) { do { bool old_val = compare_and_swap(lock, 0, 1); } while (old_val == 1); } void release_lock(int * lock) { fence(); *lock = 0; //maybe we need a memory fence on } CSE 431 L 26 Bus. Multis. 6 // some architectures Irwin, PSU, 2005
Spin Lock Synchronization Read lock variable Spin No Unlocked? (=0? ) Yes Try to lock variable using swap: atomic operation read lock variable and set it to locked value (1) No Succeed? (=0? ) Yes unlock variable: set lock variable to 0 Finish update of shared data. . . Begin update of shared data The single winning processor will read a 0 all others processors will read the 1 set by the winning processor CSE 431 L 26 Bus. Multis. 7 Irwin, PSU, 2005
Memory fences Modern superscalar processors typically execute instructions out of order In a different order to the original program order Loads and stores can be executed out of order Processor may speculate that the load takes data from a different address to the store Processor may actually have load and store addresses, and thus be able to tell which conflict and which are independent CSE 431 L 26 Bus. Multis. 8 Irwin, PSU, 2005
Memory fences Each core typically has a load/store queue Stores are written to the queue, and will be flushed to memory “eventually” A load may take data directly from the load/store queue if there has been a recent store to the location So a load instruction may not ever reach memory If there is already a store to a memory location in the load/store queue, and additional store may overwrite the existing store CSE 431 L 26 Bus. Multis. 9 Irwin, PSU, 2005
Atomic Compare and Swap Spin locks are usually faster than putting a waiting thread to sleep Acquire the lock as soon as it released No need to interact with operating system Works well when lock is seldom contended But wasteful of resources and energy when the lock is contended Thread busy spins doing busy work Large number of memory operations on bus Disaster scenario is spinning while waiting for a thread that is not running CSE 431 L 26 Bus. Multis. 10 Irwin, PSU, 2005
Atomic Compare and Swap Sleeping locks usually slower Need operating system call to put thread to sleep Need to manage queue of waiting threads Every time the lock is released, you must check the list of waiters Sleeping locks make better use of total resources Core is released while thread waits Works well when lock is often contended and there are more threads than cores Especially important in virtualized systems Many lock algorithms are hybrids Spin for a little while to see if lock is release quickly Then go to sleep if it is not Used by Linux futex (fast mutex) library CSE 431 L 26 Bus. Multis. 11 Irwin, PSU, 2005
Locking variants There are many variants of these basic locks Spin locks with “back off” - If the lock is contended the thread waits a while before trying again - Avoids the waiting threads constantly trying to access the cache line containing the lock - Can help where the lock is contended by several threads - Queueing spin locks - Makes the queueing fairer CSE 431 L 26 Bus. Multis. 12 Irwin, PSU, 2005
Atomic Increment Alternative to atomic compare and swap Atomically increments a value in memory Can be used for multithreaded counting without locks Atomic increment locks Can build locks from atomic increment Known as ticket lock Based on ticket system in passport office, GNIB or Argos Queueing spin lock Multiple threads can be waiting for a lock With a regular spin lock, all waiting threads compete for the lock when it becomes free With a queuing spin lock, the waiting threads for an orderly FIFO queue Guarantees fairness CSE 431 L 26 Bus. Multis. 13 Irwin, PSU, 2005
Atomic Increment struct ticket_lock { int dispense; int current_ticket; }; void acquire_lock(struct ticket_lock * lock) { int my_ticket = atomic_increment(&(lock->dispense)); while ( my_ticket != lock->current_ticket); } void unlock(struct ticket_lock * lock) { lock->current_ticket++; } CSE 431 L 26 Bus. Multis. 14 Irwin, PSU, 2005
- Slides: 14