Transactional Memory
James Larus and Christos Kozyrakis

MOTIVATION
• Transition from sequential computing to parallel computing.
• Achieving optimal performance from multicore computers depends on improving parallelism in programming.
• Find better abstractions for expressing parallel computation and for writing parallel programs.
• Current programming constructs: threads, locks, semaphores, etc.

TRANSACTIONAL MEMORY
• A transaction is a form of program execution.
• In parallel programming, TM offers a mechanism that allows portions of a program to execute in isolation, without regard to other, concurrently executing tasks.
• TM provides lightweight transactions for threads running in a shared address space.
• TM ensures the atomicity and isolation of concurrently executing tasks.
• TM provides a basis on which to build parallel abstractions.

TRANSACTIONAL MEMORY
• Atomicity ensures that program state changes effected by code executing in a transaction are indivisible from the perspective of other, concurrently executing tasks.
• Isolation ensures that concurrently executing tasks cannot affect the result of a transaction, so a transaction produces the same answer as when no other task was executing.

PROGRAM MODEL
• General TM systems
  • Provide simple atomic statements that execute a block of code (and the routines it invokes) as a transaction.
  • Are not a replacement for general synchronization such as semaphores or condition variables.
• AME (Automatic Mutual Exclusion)
  • Executes most of a program in transactions.
  • Supports asynchronous programming.

ADVANTAGES
• TM offers a simpler alternative to mutual exclusion by shifting the burden of correct synchronization from the programmer to the TM system.
• A program's author only needs to identify a sequence of operations on shared data that should appear to execute atomically to other, concurrent threads.
• Transactions make synchronization composable, which enables the construction of concurrent programming abstractions.
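The composability point can be illustrated with a small Python sketch. It is not a TM implementation: the `atomic` helper is a hypothetical stand-in that uses one global re-entrant lock (the simplest behaviour an atomic block could have), and `Account`, `withdraw`, `deposit`, `transfer`, and `run_demo` are names invented for the example.

```python
import threading
from contextlib import contextmanager

# Assumption: one global re-entrant lock stands in for a TM runtime.
# A real TM system would run blocks concurrently and abort on conflict.
_tm_lock = threading.RLock()

@contextmanager
def atomic():
    """Run the enclosed block so that it appears atomic to other threads."""
    with _tm_lock:
        yield

class Account:
    def __init__(self, balance):
        self.balance = balance

def withdraw(acct, amount):
    with atomic():
        acct.balance -= amount

def deposit(acct, amount):
    with atomic():
        acct.balance += amount

def transfer(src, dst, amount):
    # Composition: two existing atomic operations nest inside a larger one,
    # so no other thread can observe the money "in flight".
    with atomic():
        withdraw(src, amount)
        deposit(dst, amount)

def run_demo(threads=4, transfers=1000):
    a, b = Account(1000), Account(1000)

    def worker():
        for _ in range(transfers):
            transfer(a, b, 1)
            transfer(b, a, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return a.balance, b.balance
```

The author of `transfer` only marks the sequence that must appear atomic; no lock ordering between the two accounts has to be designed.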

LIMITATIONS
• Transactions by themselves cannot replace all synchronization in a parallel program.
  • Synchronization is often used to coordinate independent tasks.
  • Consider a producer-consumer relationship.
  • Transactions can ensure the tasks' shared accesses do not interfere.
  • If the consumer transaction finds the value is not available, it can only abort and check for the value later.
  • TM systems provide a guard that prevents a transaction from starting execution until a predicate becomes true.
  • Haskell TM provides the retry and orElse constructs.
• The trade-offs and programming pragmatics of the TM programming model are still not understood.
• The performance of TM is not yet good enough for widespread use.
  • Software TM (STM) systems impose considerable overhead costs on code running in a transaction.
  • HTM systems fall back on software for large transactions.
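The guard idea from the producer-consumer example can be sketched in Python. This is only an illustration under stated assumptions: `atomic_when` is a hypothetical helper built on a single global condition variable, not the API of any TM system, and it blocks the caller until the predicate holds instead of aborting and re-executing.

```python
import threading
from collections import deque

# Assumption: one global condition variable stands in for the TM runtime.
_tm = threading.Condition()

def atomic_when(guard, body):
    """Run body() atomically, but only once guard() is true."""
    with _tm:
        while not guard():
            _tm.wait()        # sleep until some other block changes shared state
        result = body()
        _tm.notify_all()      # shared state may have changed: re-check guards
        return result

def run_demo(n=5):
    buffer = deque()
    consumed = []

    def consumer():
        for _ in range(n):
            # Guarded block: does not start until a value is available.
            consumed.append(atomic_when(lambda: len(buffer) > 0, buffer.popleft))

    def producer():
        for i in range(n):
            atomic_when(lambda: True, lambda i=i: buffer.append(i))

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    p.join()
    c.join()
    return consumed
```

With a guard, the consumer never has to poll: it is woken only after another atomic block has run and might have made the predicate true.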

TRANSACTIONAL MEMORY IMPLEMENTATION
• STM (Software Transactional Memory)
• HTM (Hardware Transactional Memory)
• Most TM systems of both types implement optimistic concurrency control.
• The alternative, pessimistic concurrency control, requires a transaction to establish exclusive access to a location.
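The difference between the two control styles can be sketched for a single shared location. The names below (`VersionedCell`, `optimistic_update`, `pessimistic_update`, `run_demo`) are hypothetical, and the version counter is just one common way to detect that a location changed; it is an assumption of this sketch, not something the slides prescribe.

```python
import threading

class VersionedCell:
    """A shared location with a version number that changes on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version   # consistent snapshot

    def try_commit(self, expected_version, new_value):
        # Validate and publish in one short critical section.
        with self._lock:
            if self.version != expected_version:
                return False                  # someone else committed first
            self.value = new_value
            self.version += 1
            return True

def optimistic_update(cell, fn):
    """Compute without exclusive access; validate at commit; retry on conflict."""
    retries = 0
    while True:
        value, version = cell.read()
        new_value = fn(value)                 # no lock held while computing
        if cell.try_commit(version, new_value):
            return retries
        retries += 1

def pessimistic_update(cell, fn):
    """Establish exclusive access first, then read, compute, and write."""
    with cell._lock:
        cell.value = fn(cell.value)
        cell.version += 1

def run_demo(threads=4, increments=500):
    cell = VersionedCell(0)

    def worker():
        for _ in range(increments):
            optimistic_update(cell, lambda v: v + 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.read()
```

The optimistic path pays only when a conflict actually happens (a retry), while the pessimistic path pays the cost of exclusion on every update.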

STM
• Early STM work implemented lock-free, atomic, multi-location operations entirely in software.
• It required a program to declare in advance the memory locations to be accessed by a transaction.

STM
• DSTM
  • Object-granularity, deferred-update STM system
  • Conflict detection
    • Early detection
    • Late detection
  • Read-write conflicts
    • Only clone objects that are modified.
    • Read-object list
  • Conditions for commit
    • No concurrently executing transaction modified an object read by T.
    • Transaction T is not modifying an object that another transaction is also modifying.
  • Performance of DSTM depends on the workload.
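A deferred-update, object-granularity design along these lines can be sketched in Python. This is a simplified illustration and not DSTM itself: it detects conflicts late (at commit), uses a single global commit lock, and tracks per-object version numbers. All names (`TObject`, `Transaction`, `Conflict`, `atomically`, `run_demo`) are hypothetical.

```python
import threading

class Conflict(Exception):
    pass

class TObject:
    """A transactional object: a value plus a version bumped on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()   # assumption: one global lock serializes commits

class Transaction:
    def __init__(self):
        self.opened = {}      # TObject -> version seen when first opened
        self.write_set = {}   # TObject -> private (deferred) new value

    def read(self, obj):
        if obj in self.write_set:
            return self.write_set[obj]            # read our own pending update
        self.opened.setdefault(obj, obj.version)  # record version before the value
        return obj.value

    def write(self, obj, value):
        self.opened.setdefault(obj, obj.version)
        self.write_set[obj] = value               # shared object stays untouched

    def commit(self):
        with _commit_lock:
            # Late detection: every opened object must be unchanged.
            for obj, seen in self.opened.items():
                if obj.version != seen:
                    raise Conflict()
            for obj, value in self.write_set.items():
                obj.value = value
                obj.version += 1

def atomically(body):
    """Re-execute body(tx) until it commits without a conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue

def run_demo(threads=4, transfers=200):
    a, b = TObject(100), TObject(100)

    def move(src, dst):
        def body(tx):
            tx.write(src, tx.read(src) - 1)
            tx.write(dst, tx.read(dst) + 1)
        for _ in range(transfers):
            atomically(body)

    pool = [threading.Thread(target=move, args=(a, b) if i % 2 == 0 else (b, a))
            for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return a.value, b.value
```

Because updates are deferred, aborting is cheap (the private write set is simply dropped), while committing has to validate and then copy the new values into place.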

STM
• Deferred-update systems
  • The WSTM system detects conflicts at word, not object, granularity.
  • This avoids unnecessary conflicts if two transactions access different fields in an object.
  • It extended Java with an atomic statement that executes its block in a transaction.
• A policy selects which transaction to abort in case of a conflict.
  • The "Polka" policy tracks the number of objects a transaction has open and uses that count as its priority.
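The priority idea behind such a policy can be sketched as follows. This is a deliberately reduced illustration, assuming priority is simply the number of objects a transaction has opened; published contention managers add more machinery (for instance, backing off between attempts), which is omitted here. `Tx` and `resolve_conflict` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Tx:
    name: str
    opened: set = field(default_factory=set)

    def open(self, obj_id):
        self.opened.add(obj_id)

    @property
    def priority(self):
        # Assumption: priority is the count of distinct objects opened so far.
        return len(self.opened)

def resolve_conflict(attacker, victim):
    """Return the transaction that should abort.

    The attacker is the transaction that discovered the conflict. It wins only
    if it has invested strictly more work than the victim; on a tie it yields.
    """
    if attacker.priority > victim.priority:
        return victim
    return attacker
```

A policy of this kind favours transactions that have already done a lot of work, which reduces the amount of wasted re-execution after an abort.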

STM
• Direct-update systems
  • Transactions directly modify an object, rather than a copy.
  • Must record the original value of each modified memory location.
  • Must prevent a transaction from reading the locations modified by other, uncommitted transactions, thereby reducing the potential for concurrent execution.
  • Require a lock to prevent multiple transactions from updating an object concurrently.
  • Direct-update STM systems provide forward progress guarantees to an application by detecting and aborting failed or blocked threads.
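The undo-log mechanism can be sketched in Python. This is a minimal illustration under simplifying assumptions: each object carries its own lock, a transaction takes that lock for reads as well as writes, and a transaction that cannot get a lock aborts immediately. The names `TObject`, `DirectTx`, and `Abort` are hypothetical.

```python
import threading

class Abort(Exception):
    pass

class TObject:
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()   # per-object ownership lock

class DirectTx:
    def __init__(self):
        self.undo_log = []   # (object, original value), in write order
        self.owned = []      # objects whose lock this transaction holds

    def _acquire(self, obj):
        if obj in self.owned:
            return
        if not obj.lock.acquire(blocking=False):
            raise Abort()    # owned by another, uncommitted transaction
        self.owned.append(obj)

    def read(self, obj):
        self._acquire(obj)   # never observe another transaction's uncommitted write
        return obj.value

    def write(self, obj, value):
        self._acquire(obj)
        self.undo_log.append((obj, obj.value))   # remember the original value
        obj.value = value                        # update in place

    def commit(self):
        self.undo_log.clear()   # updates are already in place
        self._release()

    def abort(self):
        # Restore original values in reverse order of modification.
        for obj, old in reversed(self.undo_log):
            obj.value = old
        self.undo_log.clear()
        self._release()

    def _release(self):
        for obj in self.owned:
            obj.lock.release()
        self.owned.clear()
```

Compared with deferred update, the trade-off is reversed: commit is cheap because the new values are already in memory, while abort has to replay the undo log.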

HTM
• Hardware acceleration for STM
  • The primary source of overhead for an STM is the maintenance and validation of read sets.
  • Each tracked read invokes an instrumentation routine.
• HASTM was first proposed by Saha et al.
  • Provides the STM with two capabilities through per-thread mark bits at the granularity of cache blocks.
  • Software can check whether a mark bit was previously set for a given block of memory and whether no other thread wrote to the block since it was marked.
  • Software can query whether there were potentially writes by other threads to any of the memory blocks that the thread marked.

HTM
• HASTM
  • Implements mark bits using additional metadata for each block in the per-processor cache of a multicore chip.
  • The read instrumentation call checks and sets the mark bit for the memory block that contains an object's header.
  • If the mark bit was set, indicating that the transaction previously accessed the object, the object is not added to the read set again.
• Validation
  • Relies on software-based validation when the mark-bit check cannot confirm that the marked blocks are unchanged.
  • In HASTM, the mark bits may be lost if a processor is used to run other tasks.
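How mark bits can filter read-set logging may be easier to see in a small software model. Everything here is a simulation under assumptions: `MarkBits` models the per-thread hardware state as a Python set plus a counter of lost marks, the 64-byte block size is assumed, and `Tx` with its method names is hypothetical.

```python
BLOCK = 64  # assumed cache-block size in bytes

class MarkBits:
    """Software model of per-thread mark bits kept alongside cache blocks."""
    def __init__(self):
        self.marked = set()   # block numbers whose mark bit is set
        self.lost = 0         # marks cleared by a remote write or an eviction

    def test_and_set(self, addr):
        block = addr // BLOCK
        was_set = block in self.marked
        self.marked.add(block)
        return was_set

    def invalidate(self, addr):
        # Models another thread writing the block (or the block being evicted).
        block = addr // BLOCK
        if block in self.marked:
            self.marked.discard(block)
            self.lost += 1

class Tx:
    def __init__(self, marks):
        self.marks = marks
        self.marks.marked.clear()
        self.marks.lost = 0
        self.read_set = []

    def read_barrier(self, header_addr):
        # Log the object only the first time its block is seen.
        if not self.marks.test_and_set(header_addr):
            self.read_set.append(header_addr)

    def needs_software_validation(self):
        # If no mark was lost, no marked block was written by another thread,
        # so walking the read set in software can be skipped.
        return self.marks.lost > 0
```

In this model the common case (repeated reads, no remote writes) costs one mark-bit test per read and no validation pass at commit.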

HTM
• SigTM
  • Uses hardware signatures to encode the read set and write set for software transactions.
  • A hardware Bloom filter outside of the caches computes the signatures.
  • Software instrumentation provides the filters with the addresses of the objects.
  • Hardware monitors coherence traffic for requests for exclusive access to a cache block, which indicate a memory update.
  • The hardware tests whether the address in a request is potentially in a transaction's read or write set by examining the transaction's signatures.
  • On a potential conflict, the transaction either aborts or falls back on software validation.
  • Capacity and conflict misses do not cause software validation.
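The signature idea can be sketched as an ordinary Bloom filter in Python. The size (1024 bits), the number of hash functions (3), and the use of BLAKE2 as the hash are arbitrary choices for this illustration; hardware signatures use much simpler hash circuits. The `Signature` class and its method names are hypothetical.

```python
import hashlib

class Signature:
    """Bloom-filter summary of a set of addresses (a read set or write set)."""
    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0   # the bit vector, stored as one Python integer

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        # True can be a false positive; False is always exact.
        return all((self.field >> p) & 1 for p in self._positions(addr))

    def clear(self):
        self.field = 0

def conflicts(remote_write_addr, read_sig, write_sig):
    """Check an incoming exclusive request against a transaction's signatures."""
    return read_sig.may_contain(remote_write_addr) or write_sig.may_contain(remote_write_addr)
```

Because a signature never forgets an inserted address, evicting a block from the cache loses no tracking information; the price is occasional false conflicts.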

HTM
• HTM systems require no software instrumentation of memory references within transaction code.
• The hardware manages data versions and tracks conflicts transparently as software performs ordinary read and write accesses.
• HTM systems rely on a computer's cache hierarchy and the cache coherence protocol to implement versioning and conflict detection.

HTM
• Transactional Coherence and Consistency (TCC)
  • Deferred-update HTM that performs conflict detection when a transaction attempts to commit.
  • Each cache block is annotated with R and W tracking bits.
  • Cache blocks in the write set act as a write buffer and do not propagate the memory updates until the transaction commits.
  • Two-phase protocol.

HTM
• Hardware acquires exclusive access to all cache blocks in the write set using coherence messages.
• The hardware instantaneously resets all W bits in the cache, which atomically commits the updates by this transaction.
• If validation fails, hardware reverts to a software handler.
• Conflict detection
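The commit-time behaviour described for TCC can be modelled in a few lines of Python. This is a software toy, not a hardware description: R bits are a set of block numbers, the buffered write set is a dictionary, coherence messages are plain method calls, and only conflicts against other transactions' read sets are checked. `TxCache` and `commit` are hypothetical names.

```python
class TxCache:
    """Toy model of one processor's cache with R/W tracking per block."""
    def __init__(self, memory):
        self.memory = memory   # shared dict: block number -> value
        self.r = set()         # blocks with the R bit set
        self.w = {}            # blocks with the W bit set -> buffered value
        self.aborted = False

    def load(self, block):
        if block in self.w:
            return self.w[block]           # read our own buffered write
        self.r.add(block)
        return self.memory.get(block, 0)

    def store(self, block, value):
        self.w[block] = value              # buffered, invisible to others

    def snoop_commit(self, blocks):
        # Another transaction is committing writes to these blocks.
        if self.r & blocks:
            self.r.clear()
            self.w.clear()
            self.aborted = True

def commit(cache, others):
    blocks = set(cache.w)
    # Phase 1: gain exclusive access to the write set; remote readers notice.
    for other in others:
        other.snoop_commit(blocks)
    # Phase 2: clear the W bits, making all buffered updates visible at once.
    cache.memory.update(cache.w)
    cache.w.clear()
    cache.r.clear()
```

Until `commit` runs, stores stay in the writer's buffer, which is why other transactions never see partial updates.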

HTM
• Advantages and limitations
  • An HTM system can outperform a lock-based STM by a factor of four and the corresponding hardware-accelerated STM by a factor of two.
  • The caches used to track the read set, write set, and data versions have finite capacity and may overflow on a long transaction.
  • The transactional state in caches is large and is difficult to save and restore.
  • Placing implementation-dependent limits on transaction sizes is unacceptable from a programmer's perspective.

SOLUTIONS
• Offending transaction executes to completion
  • The HTM system can update memory directly without tracking the read set, write set, or old data.
  • However, no other transactions can execute.
• Virtualized TM (VTM)
  • Maps the key bookkeeping data structures for transactional execution (read set, write buffer, or undo log) to virtual memory.
  • Hardware caches hold the working set of these data structures.
• Hybrid HTM-STM system
  • A transaction starts in HTM mode.
  • It is restarted in STM mode, with additional instrumentation, if hardware resources are exceeded.
  • Provides good performance for short transactions.
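The control flow of the hybrid approach can be sketched in Python. The sketch only models the fallback decision: `HardwareTx` is a stand-in with an assumed fixed capacity, `SoftwareTx` is an unbounded stand-in that counts instrumented writes, and conflict detection is left out entirely. All names are hypothetical.

```python
class CapacityExceeded(Exception):
    pass

class HardwareTx:
    """Stand-in for HTM: the write buffer holds at most `capacity` entries."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = {}

    def write(self, key, value):
        if key not in self.buffer and len(self.buffer) >= self.capacity:
            raise CapacityExceeded(key)
        self.buffer[key] = value

class SoftwareTx:
    """Stand-in for STM: no size limit, but every access is instrumented."""
    def __init__(self):
        self.buffer = {}
        self.instrumented_writes = 0

    def write(self, key, value):
        self.instrumented_writes += 1
        self.buffer[key] = value

def run_transaction(body, store, capacity=4):
    """Try the hardware path first; restart in software on overflow."""
    tx = HardwareTx(capacity)
    try:
        body(tx)
        mode = "HTM"
    except CapacityExceeded:
        tx = SoftwareTx()      # discard hardware state and start over
        body(tx)
        mode = "STM"
    store.update(tx.buffer)    # commit the buffered writes
    return mode
```

Short transactions never leave the fast path, and long ones still complete, at the cost of re-executing from the start with instrumentation.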

HARDWARE/SOFTWARE INTERFACE FOR TRANSACTIONAL MEMORY
• Four interface mechanisms for HTM systems
  • The first mechanism is a two-phase commit protocol that architecturally separates transaction validation from committing its updates to memory.
  • The second mechanism is transactional handlers that allow software to interface on significant events.
  • The third mechanism is support for closed and open-nested transactions.
  • The fourth mechanism is multiple types of load and store instructions that allow compilers to distinguish accesses to thread-private, immutable, or idempotent data from accesses to truly shared data.

OPEN ISSUES
• A transaction that executed an I/O operation may roll back at a conflict.
• Strong and weak atomicity
  • STM systems generally implement weak atomicity, in which non-transactional code is not isolated from code in transactions.
  • HTM systems, on the other hand, implement strong atomicity.
• TM must coexist and interoperate with existing programs and libraries.

CONCLUSION
• TM provides a time-tested model for isolating concurrent computations from each other.
• It raises the level of abstraction for reasoning about concurrent tasks.