Distributed Systems, Lecture 10: Consistency and Replication
Nam, Beomseok
Spring 2015

Reasons for Replication
- Availability
- Fault tolerance
- Performance
  - Local access is faster because communication delays are reduced
  - Concurrent requests can be served by n > 1 servers
- Scalability
  - Prevents overloading a single server (size scalability)
  - Avoids communication latencies (geographic scalability)

Downside to Replication
- There's a price to pay for this extra reliability and performance
- Every time a piece of data is changed on one replica, it has to be changed on all of them to maintain consistency
  - Example: Web caching. How many times have you gotten an outdated version of a web page because it was sitting in your browser's cache?
- There are tradeoffs to be made here

Replication Consistency Model
- We need to loosen the consistency constraints so that we don't spend so much on synchronization
- Data-centric
  - System-wide consistent view of a data store
  - I.e., try to keep data consistent across replicas
- Client-centric
  - Weaker condition
  - Maintain consistency for each client separately
  - I.e., if clients are independent, they don't have to see the same values

Data-Centric Consistency Models (the slide marks the top of this list as strong and the bottom as weak)
- Strict Consistency Model
- Continuous Consistency Model
- Sequential Consistency Model
- Causal Consistency Model
- Entry Consistency Model
- FIFO Consistency Model
- Weak Consistency Model

Data-centric Consistency Models
- Consistency is defined in terms of read and write operations on shared data, called the data store
  - E.g., a shared file system, memory space, or database
  - A consistency model is a contract between the processes and the data store: if the processes obey the rules, the data store works correctly

Continuous Consistency
Observation: we can actually talk about a degree of consistency:
- Replicas may differ in their numerical value
- Replicas may differ in their relative staleness
- Replicas may differ with respect to the number and order of performed update operations
Conit: a consistency unit. It specifies the data unit over which consistency is to be enforced.

Example
A conit contains the variables x and y:
- Each replica maintains a vector clock
- B sends A the operation [<5, B>: x := x + 2]; A has made this operation permanent (it cannot be rolled back)
- A has three pending operations, so its order deviation = 3
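The order deviation in this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Operation` and `Replica` names, and the timestamps and contents of A's three pending operations, are hypothetical. Only B's operation <5, B> comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    timestamp: int          # logical time at the originating replica
    origin: str             # name of the replica that issued the operation
    description: str
    committed: bool = False  # True once the operation can no longer be rolled back


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)

    def apply(self, op, committed=False):
        op.committed = committed
        self.log.append(op)

    def order_deviation(self):
        # Number of tentative (pending) operations at this replica.
        return sum(1 for op in self.log if not op.committed)


a = Replica("A")
a.apply(Operation(5, "B", "x := x + 2"), committed=True)  # from the slide
# Hypothetical pending operations issued locally by A:
for ts, text in [(8, "y := y + 2"), (12, "y := y + 1"), (14, "x := y * 2")]:
    a.apply(Operation(ts, "A", text))

print(a.order_deviation())
```

Counting tentative operations is only one of the three deviation measures listed earlier; numerical deviation and staleness would need additional bookkeeping.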

Continuous Consistency (2)
- Choosing the appropriate granularity for a conit matters
- Suppose two replicas of a coarse-grained conit may differ in no more than one update
  - That is, one outstanding update (order deviation = 1) is allowed, but
  - A second update leads to update propagation

Continuous Consistency (3)
- With fine-grained conits, the same two updates need NO update propagation when they fall into different conits
- Unfortunately, making conits very small is not a good idea because of the management overhead
- Also, the data items in a conit must not be independent of one another

Strict Consistency
- Any read on a data item x returns a value corresponding to the result of the most recent write on x
- This is the strictest possible consistency model
- It assumes the existence of absolute global time
- It is impossible to implement because of special relativity: in a large distributed system, signals simply can't travel fast enough

Sequential Consistency
- Clearly, we need to relax our requirements to make our system implementable
- Sequential consistency (Lamport, 1979):
  - The result of any execution is the same as if the read and write operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program
- This means that all processes see the same interleaving of operations

Sequential Consistency
- Figure: behavior of two processes operating on the same data item. The horizontal axis is time.
  - This example doesn't satisfy strict consistency
  - We know that it takes some time to propagate updates to remote copies, so this behavior is acceptable under sequential consistency

Sequential Consistency
- (a) A sequentially consistent data store
- (b) A data store that is not sequentially consistent
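The definition can be made concrete with a brute-force check: search for one interleaving that respects every process's program order and in which each read returns the most recently written value. The sketch below is an illustration, not part of the lecture. Because the figures for (a) and (b) are not reproduced in the text, the two histories are assumed stand-ins, and the function name `is_sequentially_consistent` is hypothetical.

```python
def is_sequentially_consistent(histories, initial=None):
    """histories: one list per process of ('W', var, value) or ('R', var, value)."""

    def search(positions, store):
        # All operations placed: a legal interleaving exists.
        if all(pos == len(h) for pos, h in zip(positions, histories)):
            return True
        for i, history in enumerate(histories):
            pos = positions[i]
            if pos == len(history):
                continue
            kind, var, value = history[pos]
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            if kind == "W":
                if search(advanced, {**store, var: value}):
                    return True
            elif store.get(var, initial) == value:
                # A read may be scheduled only if it returns the latest write.
                if search(advanced, store):
                    return True
        return False

    return search(tuple(0 for _ in histories), {})


# Assumed history (a): both readers see b first, then a.
history_a = [
    [("W", "x", "a")],
    [("W", "x", "b")],
    [("R", "x", "b"), ("R", "x", "a")],
    [("R", "x", "b"), ("R", "x", "a")],
]
# Assumed history (b): the two readers disagree on the order of the writes.
history_b = [
    [("W", "x", "a")],
    [("W", "x", "b")],
    [("R", "x", "b"), ("R", "x", "a")],
    [("R", "x", "a"), ("R", "x", "b")],
]

print(is_sequentially_consistent(history_a))
print(is_sequentially_consistent(history_b))
```

The search is exponential in the number of operations, so it is only suitable for slide-sized examples.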

Sequential Consistency
- Figure: three concurrently executing processes
  - With six statements in total (two per process), there are 6!/2^3 = 720/8 = 90 possible execution sequences that preserve each process's program order
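The count of 90 can be checked by enumeration. The sketch below assumes the usual textbook version of this example, which is not reproduced in the text: P1 runs `x = 1; print(y, z)`, P2 runs `y = 1; print(x, z)`, and P3 runs `z = 1; print(x, y)`, with all variables initially 0. The names `SCHEDULES`, `signature`, and `SIGNATURES` are hypothetical.

```python
from itertools import permutations

# A schedule lists which process runs next. The k-th appearance of a process
# runs that process's k-th statement, so program order is always preserved.
SCHEDULES = sorted(set(permutations([0, 0, 1, 1, 2, 2])))

WRITES = ["x", "y", "z"]                        # variable assigned by P1, P2, P3
READS = [("y", "z"), ("x", "z"), ("x", "y")]    # variables printed by P1, P2, P3


def signature(schedule):
    """Concatenate the printed output of P1, P2, and P3, in that order."""
    store = {"x": 0, "y": 0, "z": 0}
    executed = [0, 0, 0]
    output = {}
    for p in schedule:
        if executed[p] == 0:
            store[WRITES[p]] = 1                 # first statement: the assignment
        else:
            output[p] = "".join(str(store[v]) for v in READS[p])
        executed[p] += 1
    return "".join(output[p] for p in range(3))


SIGNATURES = {signature(s) for s in SCHEDULES}

print(len(SCHEDULES))              # number of order-preserving interleavings
print(signature((0, 0, 1, 1, 2, 2)))  # P1 runs completely, then P2, then P3
```

Under these assumptions the signature 000000 can never occur, because the last statement to execute is a print that already sees both of the other assignments.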

Sequential Consistency
- Figure: four valid execution sequences for the processes
  - Signature: the outputs of P1, P2, and P3, in that order
- The contract between the processes and the distributed shared data store is that the processes must accept all of these as valid results

Causal Consistency
- Weaker than sequential consistency
  - Two events are causally related if one is caused by or influenced by the other
- Writes that are potentially causally related must be seen by all processes in the same order
- Concurrent writes may be seen in a different order on different machines
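One common way to decide whether two writes are causally related or concurrent is to compare vector clocks. The lecture does not prescribe this mechanism for causal consistency, so the following is a minimal sketch under that assumption; the function names and the three example clocks are hypothetical.

```python
def happened_before(a, b):
    """True if the event with vector clock a causally precedes the event with clock b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical clocks for two processes, written as (P1 entry, P2 entry):
w_a = (1, 0)  # P1 writes a
w_b = (1, 1)  # P2 writes b after having read a
w_c = (0, 1)  # P2 writes c without having seen a

print(happened_before(w_a, w_b))  # causally related: everyone must see a before b
print(concurrent(w_a, w_c))       # concurrent: processes may see them in either order
```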

Causal Consistency
- Figure: this sequence satisfies causal consistency, but not sequential consistency
  - The write operations by the two processes P1 and P2 are not causally related; they are concurrent

Causal Consistency
- (a) A violation of a causally consistent store
  - The write operations by P1 and P2 are causally related
  - So other processes should see a first, then b

Causal Consistency
- (b) A correct sequence of events in a causally consistent store
  - Now the write operations by P1 and P2 are concurrent
  - So this sequence satisfies causal consistency

Entry Consistency
- The sequential and causal consistency models are defined at the level of individual read and write operations
- Coarser-grained consistency models rely on synchronization variables (used to enter a critical section)
- Figure: a valid event sequence for entry consistency
  - P2 may read NIL for y because it hasn't acquired a lock for y
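The NIL read can be reproduced with a small single-threaded simulation in which every data item has its own lock and acquiring that lock brings only that item up to date. This is a sketch under assumptions, not the lecture's implementation: the class names are hypothetical, the event sequence is an assumed version of the figure, and mutual exclusion between processes is deliberately left out.

```python
class GuardedStore:
    """Shared store in which each data item is guarded by its own lock."""

    def __init__(self):
        self.latest = {}  # item -> value exported at the last release of its lock


class Process:
    def __init__(self, store):
        self.store = store
        self.local = {}    # this process's view; missing items read as None (NIL)
        self.held = set()  # items whose lock this process currently holds

    def acquire(self, item):
        # Acquiring the lock for an item makes only that item consistent.
        self.held.add(item)
        if item in self.store.latest:
            self.local[item] = self.store.latest[item]

    def write(self, item, value):
        assert item in self.held, "a write requires the item's lock"
        self.local[item] = value

    def release(self, item):
        if item in self.local:
            self.store.latest[item] = self.local[item]
        self.held.discard(item)

    def read(self, item):
        return self.local.get(item)


store = GuardedStore()
p1, p2, p3 = Process(store), Process(store), Process(store)

p1.acquire("x"); p1.write("x", "a")
p1.acquire("y"); p1.write("y", "b")
p1.release("x"); p1.release("y")

p2.acquire("x")
print(p2.read("x"), p2.read("y"))  # y was never acquired by P2

p3.acquire("y")
print(p3.read("y"))
```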

FIFO Consistency (PRAM Consistency)
- Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes
  - The example sequence on the slide satisfies FIFO consistency
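The FIFO rule can be checked mechanically: for every observer, the writes it sees from any one writer must appear in the order that writer issued them. The sketch below is illustrative only. The function name `is_fifo_consistent` and the example histories are assumptions, since the slide's figure is not reproduced in the text, and the check assumes each writer issues distinct values.

```python
def is_fifo_consistent(issued, observed):
    """issued: writer -> values in issue order.
    observed: observer -> list of (writer, value) in the order they were seen."""
    for seen in observed.values():
        for writer, order in issued.items():
            from_writer = [value for w, value in seen if w == writer]
            remaining = iter(order)
            # Subsequence test: each seen value must occur later in the issue
            # order than the previously seen value from the same writer.
            if not all(value in remaining for value in from_writer):
                return False
    return True


issued = {"P1": ["a"], "P2": ["b", "c"]}

# Observers disagree on where a falls, but both see b before c.
valid = {
    "P3": [("P2", "b"), ("P1", "a"), ("P2", "c")],
    "P4": [("P1", "a"), ("P2", "b"), ("P2", "c")],
}
# P3 sees P2's writes out of order.
invalid = {
    "P3": [("P2", "c"), ("P1", "a"), ("P2", "b")],
    "P4": [("P1", "a"), ("P2", "b"), ("P2", "c")],
}

print(is_fifo_consistent(issued, valid))
print(is_fifo_consistent(issued, invalid))
```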

Weak Consistency
- Weak consistency (Dubois et al., 1988) introduces a synchronization variable S with one operation, synchronize(S)
  - Accesses to synchronization variables associated with a data store are sequentially consistent
  - No operation on a synchronization variable is allowed to be performed until all previous writes have completed everywhere
  - No read or write operations on data items are allowed to be performed until all previous operations on synchronization variables have been performed
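The effect of synchronize(S) can be illustrated with a single-threaded simulation: a replica's writes stay local until it synchronizes, and remote writes become visible only when the reader synchronizes as well. This is a minimal sketch under assumptions. The `DataStore` and `Replica` classes are hypothetical, and one shared dictionary stands in for "completed everywhere".

```python
class DataStore:
    """Stands in for the state that every replica agrees on after synchronization."""

    def __init__(self):
        self.values = {}


class Replica:
    def __init__(self, store):
        self.store = store
        self.local = {}    # this replica's possibly stale copy
        self.pending = {}  # local writes not yet propagated

    def write(self, item, value):
        self.local[item] = value
        self.pending[item] = value

    def read(self, item):
        return self.local.get(item)

    def synchronize(self):
        # Step 1: all previous local writes complete everywhere.
        self.store.values.update(self.pending)
        self.pending.clear()
        # Step 2: writes completed elsewhere become visible locally.
        self.local = dict(self.store.values)


store = DataStore()
p1, p2 = Replica(store), Replica(store)

p1.write("x", 1)
before_any_sync = p2.read("x")    # P2 cannot count on seeing the write yet
p1.synchronize()
before_own_sync = p2.read("x")    # still stale: P2 has not synchronized
p2.synchronize()
after_sync = p2.read("x")

print(before_any_sync, before_own_sync, after_sync)
```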

Weak Consistency
- Figure: sequence (a) satisfies weak consistency, but sequence (b) does not

Release Consistency
- Release consistency is like weak consistency, but there are two synchronization operations, "lock" and "unlock"
  - "acquire" and "release" are the conventional names
  - Doing a "lock" means that writes made on other processors to protected variables become known locally
  - Doing an "unlock" means that local writes to protected variables are exported
  - The exported writes are seen by other machines either when they do a "lock" (lazy release consistency) or immediately (eager release consistency)
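The difference between lazy and eager release can be sketched with a single-threaded simulation. This is an illustration under assumptions, not the lecture's implementation: the `Lock` and `Process` classes and the `eager` flag are hypothetical, and mutual exclusion is left out so that only the propagation of writes is shown.

```python
class Lock:
    def __init__(self):
        self.exported = {}   # writes exported by past releases
        self.members = []    # processes that use this lock


class Process:
    def __init__(self, lock, eager=False):
        self.lock = lock
        self.eager = eager
        self.local = {}      # this process's copy of the protected variables
        self.dirty = {}      # writes made since the last release
        lock.members.append(self)

    def acquire(self):
        # "lock": writes exported by other processes become known here.
        self.local.update(self.lock.exported)

    def write(self, item, value):
        self.local[item] = value
        self.dirty[item] = value

    def release(self):
        # "unlock": export local writes to protected variables.
        self.lock.exported.update(self.dirty)
        if self.eager:
            # Eager release pushes the writes to every other process at once.
            for other in self.lock.members:
                if other is not self:
                    other.local.update(self.dirty)
        self.dirty.clear()

    def read(self, item):
        return self.local.get(item)


# Lazy release: P2 sees the write only after its own acquire.
lazy_lock = Lock()
p1, p2 = Process(lazy_lock), Process(lazy_lock)
p1.acquire(); p1.write("x", 1); p1.release()
lazy_before_acquire = p2.read("x")
p2.acquire()
lazy_after_acquire = p2.read("x")

# Eager release: Q2 sees the write as soon as Q1 releases.
eager_lock = Lock()
q1, q2 = Process(eager_lock, eager=True), Process(eager_lock, eager=True)
q1.acquire(); q1.write("x", 1); q1.release()
eager_without_acquire = q2.read("x")

print(lazy_before_acquire, lazy_after_acquire, eager_without_acquire)
```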

Release Consistency
- Figure: a valid event sequence for release consistency

Summary

(a)
Consistency: Description
- Strict: Absolute time ordering of all shared accesses matters.
- Linearizability: All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
- Sequential: All processes see all shared accesses in the same order. Accesses are not ordered in time.
- Causal: All processes see causally related shared accesses in the same order.
- FIFO: All processes see the writes of any one process in the order they were issued. Writes from different processes may not always be seen in the same order.

(b)
Consistency: Description
- Weak: Shared data can be counted on to be consistent only after a synchronization is done.
- Release: Shared data are made consistent when a critical region is exited.
- Entry: Shared data pertaining to a critical region are made consistent when a critical region is entered.

Quiz: a) Consider the following sequence of operations:
P1: W(x)1 W(x)3
P2: W(x)2
P3: R(x)3 R(x)2
P4: R(x)2 R(x)3
Is this execution causally consistent?

Quiz: d) Consider the following sequence of operations:
P1: Acq(L) W(x)1 W(x)3 Rel(L)
P2: Acq(L) R(x)3 Rel(L) R(x)1
Is this execution release consistent?