Synchronization

  • Slides: 51
Synchronization
• Synchronization in centralized systems is easy.
• Synchronization in distributed systems is much more difficult to achieve.
• Why do we need synchronization in distributed systems?
  – Distributed mutual exclusion
  – Distributed concurrency and deadlock
  – Leader/coordinator election
• Basic issues examined here:
  – Clock synchronization
  – Logical clocks
  – Global state algorithms
  – Distributed transactions

Clock Synchronization
• Example: using a makefile to build a program.
• Different machines are used for creating and compiling the files.
• When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

Physical Clocks
1. Basic mechanism: the timer.
2. A computer timer is usually a quartz crystal oscillating at a well-defined frequency.
3. Two registers are associated with the crystal: a counter and a holding register.
4. Each oscillation of the crystal decrements the counter by one.
5. When the counter reaches zero:
   • an interrupt is sent to the CPU
   • the counter is reloaded with the value of the holding register
6. In this way, the timer can generate an interrupt 60 times a second.
7. Each such interrupt is a clock tick (and constitutes the basic timing mechanism in a centralized system).

Multiple Physical Clocks
• If many CPUs are introduced, time skew may develop!
• Two fundamental problems need to be addressed:
  – How do we synchronize clocks with real-time clocks?
  – How do we synchronize clocks with each other?

Physical Clock
• Transit of the sun
  – defines the solar day
  – solar second = 1/86400 of a solar day
• The earth’s rotation is not constant! Some days are longer or shorter than others.
  – This led to the introduction of the mean solar second.
• TAI seconds are produced by cesium-133 atomic clocks.
Figure: Computation of the mean solar day.

TAI Clocks & Leap Seconds
• TAI seconds are of constant length, unlike solar seconds.
• Leap seconds are introduced when necessary to keep in phase with the sun.
• The corrected time scale (based on TAI seconds, but kept in step with the apparent motion of the sun) is called Coordinated Universal Time (UTC).

UTC Services
Shortwave radio stations broadcast a short pulse at the start of each UTC second.
• MSF station (UK)
• NIST (US)
• Geostationary Operational Environmental Satellite (accurate to 0.5 msec)

Clock Synchronization Algorithms
In an ideal world, $C_p(t) = t$, where $C_p(t)$ is the value of the clock on machine p and t is the UTC time.
• Figure: The relation between clock time and UTC when clocks tick at different rates.
• Maximum drift rate $\rho$: $1 - \rho \le dC/dt \le 1 + \rho$
• If two clocks drift from UTC in opposite directions, then a time $\Delta t$ after they were synchronized they can be as far as $2\rho\,\Delta t$ apart.

Cristian's Clock Synch Algorithm
• Requirement: if two clocks must never differ by more than $\delta$, they must be resynchronized (in software).
• This must happen at least every $\delta / 2\rho$ seconds, where $\rho$ is the maximum drift rate.
• Cristian’s algorithm: each machine asks a time server for the current time.
Figure: Getting the current time from a time server.
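The resynchronization bound can be checked numerically. The following is a minimal sketch, assuming a maximum drift rate and an allowed skew chosen purely for illustration; the function names are hypothetical and not part of the slides.

```python
def max_skew(rho, dt):
    """Worst-case difference between two clocks drifting in opposite
    directions at rate rho, dt seconds after they were synchronized."""
    return 2 * rho * dt


def resync_interval(delta, rho):
    """Longest interval between resynchronizations that keeps two clocks
    within delta seconds of each other."""
    return delta / (2 * rho)


if __name__ == "__main__":
    # Illustrative numbers only: drift of 1e-5, allowed skew of 1 ms.
    rho, delta = 1e-5, 1e-3
    print("resynchronize at least every", resync_interval(delta, rho), "seconds")
```

Under these assumed values the clocks would need to be resynchronized roughly every 50 seconds.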

Cristian’s Algorithm Problems
Two problems:
• If the sender’s clock runs fast, the UTC time provided by the server will be “earlier” than the sender’s current time; setting the clock back could lead to inconsistencies (recompilation of source files, etc.).
  – Such a change should be introduced gradually.
  – Slow down the timer of the CPU.
• How do we estimate the delay for shipping messages?
  – (T1 - T2)/2, i.e., half the measured round-trip time
  – If you can estimate the time I that the time server needs to handle the interrupt and process the incoming message: (T1 - T2 - I)/2
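The delay estimate can be sketched in a few lines. This is a minimal illustration, assuming the client records its own clock when it sends the request and when it receives the reply; the names `t_send`, `t_recv`, and `handling_time` are hypothetical labels, not the slide's notation.

```python
def cristian_estimate(t_send, t_recv, server_time, handling_time=0.0):
    """Estimate the current time at the moment the reply arrives.

    t_send, t_recv -- client clock readings when the request left and
                      when the reply arrived
    server_time    -- time value carried in the server's reply
    handling_time  -- estimated time the server spent on the request (I)
    """
    one_way_delay = (t_recv - t_send - handling_time) / 2
    return server_time + one_way_delay


if __name__ == "__main__":
    # Illustrative values: 0.5 s round trip, 0.1 s spent in the server.
    print(cristian_estimate(10.0, 10.5, 100.0, handling_time=0.1))
```

The sketch assumes the request and the reply take about the same time to travel, which is the assumption behind halving the round trip.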

The Berkeley Algorithm
a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
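The three steps above can be sketched as a single averaging function. This is a simplified illustration, assuming the daemon averages all reported clock values including its own and ignores message delays; the function name and the sample clock values are made up for the example.

```python
def berkeley_adjustments(daemon_time, reported):
    """Return the adjustment each machine should apply to its clock.

    daemon_time -- the time daemon's own clock value
    reported    -- dict mapping machine name to its reported clock value
    """
    clocks = {"daemon": daemon_time, **reported}
    average = sum(clocks.values()) / len(clocks)
    # Each machine is told how far to move, not the absolute time.
    return {name: average - value for name, value in clocks.items()}


if __name__ == "__main__":
    # Clock values in minutes past midnight: 3:00, 3:25 and 2:50.
    print(berkeley_adjustments(180, {"m1": 205, "m2": 170}))
```

Sending relative adjustments rather than an absolute time keeps the result independent of how long the reply takes to arrive.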

Averaging Distributed Algorithms
• One class of algorithms works by dividing time into fixed-length resynchronization intervals.
• The i-th interval starts at T0 + iR and runs until T0 + (i+1)R
  – T0 is an agreed-upon moment in the past
  – R is a system parameter
• At the beginning of each interval every machine broadcasts its current time.
• After this broadcast, each machine starts a local timer and collects all other broadcasts that arrive within a time S.
• When time S has elapsed, each machine computes the average of the collected values.
• A slight variation: the m lowest and n highest values (from the set collected during the period S) are discarded. Why?
• Examples of such protocols: NTP (Network Time Protocol).
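The averaging step, including the variation that discards extreme values, can be sketched as follows. This is a minimal sketch, assuming the clock values collected during the period S are already available as a list; the function name is hypothetical.

```python
def trimmed_average(values, m=0, n=0):
    """Average the collected clock values after discarding the m lowest
    and the n highest ones."""
    ordered = sorted(values)
    kept = ordered[m:len(ordered) - n]
    if not kept:
        raise ValueError("no values left after discarding extremes")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # One clock (90) is far off; discarding extremes limits its influence.
    collected = [10, 12, 14, 90, 2]
    print(trimmed_average(collected))            # plain average
    print(trimmed_average(collected, m=1, n=1))  # extremes discarded
```

Discarding the extremes is one way to protect the average against a few faulty clocks reporting wildly wrong values.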

Logical Clocks
• In a network, it is important that all machines agree upon a time.
• This time does not need to be in sync with the time broadcast by radio (all the time).
• In the make example, if all machines agree that it is 17:00, it does not really matter that UTC is actually 17:00:02.
• Notion of a logical clock (17:00).

Logical clocks
• Lamport defined the relation “happened-before”
• a -> b (event a happened before event b)
• The happened-before relation can be observed in two settings:
  – If a and b are events in the same process, and a occurs before b, then a -> b holds.
  – If a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a -> b holds (i.e., a message cannot be received before it has been sent).

Logical Clocks
• Happened-before is transitive: if a -> b and b -> c, then a -> c.
• If x and y happen in different processes that do not exchange messages, then neither x -> y nor y -> x is true.
• The time for an event a is C(a)
• If a -> b then C(a) < C(b)
• Logical time always moves forward (corrections can be made by addition, never by subtraction!)

Lamport’s Algorithm
Figure: Three processes, each with its own clock. The clocks run at different rates (ticks of 6, 8, and 10). Messages A, B, C, and D are exchanged between the processes.

Lamport’s Algorithm: Solution
Figure: The same three processes and messages A to D, after Lamport’s algorithm has corrected the clocks.
• Lamport’s algorithm corrects the clocks and provides a way to totally order events.
• If a happens before b in the same process, C(a) < C(b)
• If a and b represent the sending and receiving of a message, respectively, then C(a) < C(b)
• For all distinct events a and b, C(a) != C(b)
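The clock-correction rule can be sketched as a small class. This is a minimal illustration, assuming each process keeps an integer counter and breaks ties between equal timestamps with its process id; the class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Correct the clock on receipt: always forward, never backward."""
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self):
        """(time, pid) pairs are unique, so they give a total order."""
        return (self.time, self.pid)


if __name__ == "__main__":
    receiver = LamportClock(pid=1)
    receiver.time = 56            # local clock is behind the sender's
    print(receiver.receive(60))   # message was stamped 60 by the sender
```

A receiver whose clock reads 56 when a message stamped 60 arrives moves forward to 61, so the receive event is ordered after the send event.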

Lamport Timestamps
• Queries run faster when they work off replicas of the data.
• Two users (a customer in San Francisco and an admin in NYC):
  1. The customer in San Francisco adds $100.00 to her account (at $1000 now).
  2. The admin in NYC gives an increase of 1% to all accounts.
• There is obviously a problem here: the replicas may apply the two updates in different orders.

Problem with Replicated Data
Problem: Updating a replicated database may leave it in an inconsistent state.
The two copies should be exactly the same!! (Which order the operations are applied in does not matter for consistency; what matters is that every copy follows the same order.)
This situation calls for a totally-ordered multicast (of operations).
How can this be done? Can we use Lamport’s algorithm?

Sketch of the Solution
• A group of processes multicast messages to each other.
• Each message is always time-stamped with the (logical) time of the sender.
• Assume that messages from the same sender are received in the order they were sent and that no messages are lost.
• When a process receives a message, it puts it into its local queue (ordered by timestamp) and multicasts an ACK to the other processes.
• All processes will have the same (ordered) copy of the queue!
• Lamport’s clocks ensure that NO two messages have the same timestamp!

Global State
• Global state = local states of the processes + messages currently in transit.
• Why is knowing the global state useful?
  – If local processes have stopped and no more messages are in transit, then we have reached a stalled situation where no one can progress (i.e., something needs to be done).
• Take a “distributed snapshot”
  – It reflects a consistent global state.
  – If a message has been received, then it must have been sent from somewhere before! (Otherwise something is wrong.)
  – A global state can be represented by what is known as a cut.
  – Cuts can be consistent or inconsistent.

Cuts: Snapshots of Global State
a) A consistent cut: one that does not include messages that were received but not sent!
b) An inconsistent cut
What we want to define here is an algorithm that provides a consistent cut (snapshot) of the distributed system.

An Algorithm for Deriving a Distributed Snapshot
• Assumption: each process (in the DS) is connected to the others via unidirectional point-to-point communication channels (TCP connections).
• Any process may initiate the algorithm.
• The initiating process starts by recording its local state and then sends a MARKER along each outgoing channel (indicating that the receiver should participate in the recording of the global state).

Global State Algorithm
• When a process Q receives a marker through its incoming channel C:
  – If it has not yet recorded its own local state, it does so and sends markers along its outgoing channels.
  – Otherwise, the marker on the incoming channel signals that the state of that channel must be recorded (this is done by forming the sequence of messages received by Q since the last time Q recorded its state and before it received the marker).
• A process has finished when it has received a marker along each of its incoming channels and processed all of them.
• At that point, the local state and the messages in transit can be sent to a coordinator that assembles the global state.
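The marker rules can be sketched with two processes and two channels. This is a simplified, single-threaded simulation, assuming FIFO channels modelled as queues and an application state that is just a running total; all class, function, and channel names are hypothetical.

```python
from collections import deque

MARKER = object()  # sentinel standing in for the MARKER message


class SnapshotProcess:
    def __init__(self, state, incoming, outgoing):
        self.state = state              # application state: a running total
        self.incoming = list(incoming)  # names of incoming channels
        self.outgoing = list(outgoing)  # names of outgoing channels
        self.recorded_state = None      # local state saved for the snapshot
        self.channel_state = {}         # channel -> messages caught in transit
        self.open_channels = set()      # incoming channels still being recorded

    def record(self, send):
        """Record the local state, then send a marker on each outgoing channel."""
        self.recorded_state = self.state
        self.channel_state = {c: [] for c in self.incoming}
        self.open_channels = set(self.incoming)
        for channel in self.outgoing:
            send(channel, MARKER)

    def receive(self, channel, msg, send):
        if msg is MARKER:
            if self.recorded_state is None:
                self.record(send)                 # first marker seen
            self.open_channels.discard(channel)   # this channel is complete
        else:
            if channel in self.open_channels:
                self.channel_state[channel].append(msg)
            self.state += msg                     # normal processing continues

    def finished(self):
        return self.recorded_state is not None and not self.open_channels


def run_demo():
    channels = {"PQ": deque(), "QP": deque()}

    def send(channel, msg):
        channels[channel].append(msg)

    p = SnapshotProcess(10, incoming=["QP"], outgoing=["PQ"])
    q = SnapshotProcess(3, incoming=["PQ"], outgoing=["QP"])
    send("PQ", 7)    # application messages already in transit
    send("QP", 2)
    p.record(send)   # P initiates the snapshot
    while channels["PQ"]:
        q.receive("PQ", channels["PQ"].popleft(), send)
    while channels["QP"]:
        p.receive("QP", channels["QP"].popleft(), send)
    return p, q
```

In this run the recorded local states plus the recorded channel contents add up to the same total as the real system (local states plus messages in flight), which is what makes the cut consistent.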

Global State
a) Organization of a process and channels for a distributed snapshot

Global State
b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel

Distributed Computation Termination Algorithm
• When a process Q finishes its part of the snapshot, it returns either a DONE or a CONTINUE message to its predecessor.
• A DONE message is returned only if both conditions hold:
  – All of Q’s successors have returned DONE messages.
  – Q has not received any messages between the point it recorded its state and the point it received the marker along each of its incoming channels.
• In all other cases, a CONTINUE message is sent to Q’s predecessor.
• If the original initiator P receives only DONE messages from its successors:
  – there are NO messages in transit
  – therefore, the computation is complete.

Election Algorithms
• Many distributed applications require that one site undertakes the role of coordinator or master.
• Problem: how do we come up with such a master?
• Each process has a unique id
  – Network address + id in the local space.

The Bully Algorithm
The process with the highest ID (or other attribute) takes over.
Process 7 was the coordinator and has just crashed.
Figure: The bully election algorithm
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election

Bully Algorithm
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
If 7 wakes up, it can hold an election and “bully” all the others (it takes over).
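The election walk-through above can be replayed with a short simulation. This is a minimal sketch, assuming a crashed process simply never answers and that every live process with a higher id always responds; the function name and the returned log format are hypothetical.

```python
def bully_election(initiator, alive):
    """Simulate a bully election started by `initiator`.

    alive -- set of ids of processes that are currently up
    Returns (coordinator, log), where log lists each election held as
    (process, [higher processes that told it to stop]).
    """
    pending = [initiator]
    held = set()
    log = []
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in held:
            continue              # p has already held its election
        held.add(p)
        responders = sorted(q for q in alive if q > p)
        log.append((p, responders))
        if not responders:
            coordinator = p       # nobody higher answered: p wins
        pending.extend(responders)
    return coordinator, log


if __name__ == "__main__":
    # Process 7 has crashed; process 4 notices and starts the election.
    print(bully_election(4, alive={0, 1, 2, 3, 4, 5, 6}))
```

With process 7 down and process 4 starting the election, processes 5 and 6 answer, each holds its own election, and process 6 ends up as coordinator.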

A Ring Algorithm
Assumption: processes are physically or logically ordered
Two phases:
• Start an ELECTION (this can be done by more than one site)
• Once the message has gone around the circle, determine the COORDINATOR (the largest id?)
• Circulate the name of the coordinator (i.e., inform everyone)
Figure: Election algorithm using a ring.
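The election phase can be sketched as one pass around the ring. This is a minimal illustration, assuming crashed processes are skipped and the largest id among the collected candidates becomes coordinator; the function name is hypothetical.

```python
def ring_election(ring, initiator, alive):
    """Pass an ELECTION message once around the ring.

    ring      -- list of process ids in ring order
    initiator -- id of the process that starts the election
    alive     -- set of ids of processes that are currently up
    Returns (coordinator, candidates collected along the way).
    """
    start = ring.index(initiator)
    candidates = []
    i = start
    while True:
        process = ring[i]
        if process in alive:          # crashed processes are skipped
            candidates.append(process)
        i = (i + 1) % len(ring)
        if i == start:                # message is back at the initiator
            break
    return max(candidates), candidates


if __name__ == "__main__":
    ring = [0, 1, 2, 3, 4, 5, 6, 7]
    print(ring_election(ring, initiator=5, alive={0, 1, 2, 3, 4, 5, 6}))
```

The second phase, circulating the coordinator's name, would be another pass around the same ring.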

Mutual Exclusion: A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
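The coordinator's behaviour in the three steps above can be sketched as a small class. This is a minimal sketch, assuming a single critical region and a FIFO queue of waiting processes; the class and method names are hypothetical.

```python
from collections import deque


class Coordinator:
    """Grants access to one critical region, one process at a time."""

    def __init__(self):
        self.holder = None       # process currently in the critical region
        self.waiting = deque()   # requests that have not been answered yet

    def request(self, pid):
        """Return True if permission is granted now; otherwise queue the
        request and send no reply."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Called when the holder exits; returns the next process that is
        granted permission, or None if nobody is waiting."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


if __name__ == "__main__":
    c = Coordinator()
    print(c.request(1))   # process 1 is granted permission
    print(c.request(2))   # process 2 is queued, no reply
    print(c.release(1))   # the coordinator now replies to process 2
```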

A Distributed Algorithm [RicartAgra81]
a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
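The rule each process applies when a request arrives can be sketched as a single function. This is a minimal illustration, assuming requests carry a (timestamp, process id) pair and a process is in one of three states; the state names and the timestamps 8 and 12 are made up for the example.

```python
def on_request(my_state, my_stamp, request_stamp):
    """Decide how to answer an incoming request for the critical region.

    my_state      -- "RELEASED", "WANTED" or "HELD"
    my_stamp      -- (timestamp, pid) of this process's own request
                     (ignored unless my_state is "WANTED")
    request_stamp -- (timestamp, pid) of the incoming request
    Returns "OK" to reply at once, or "DEFER" to queue the request.
    """
    if my_state == "RELEASED":
        return "OK"
    if my_state == "HELD":
        return "DEFER"
    # Both want the region: the lowest (timestamp, pid) wins.
    return "OK" if request_stamp < my_stamp else "DEFER"


if __name__ == "__main__":
    # Process 0 asked at time 8, process 2 at time 12, process 1 is idle.
    print(on_request("WANTED", (8, 0), (12, 2)))   # process 0's answer to 2
    print(on_request("WANTED", (12, 2), (8, 0)))   # process 2's answer to 0
    print(on_request("RELEASED", None, (8, 0)))    # process 1's answer to 0
```

Deferred requests are answered with an OK when the process leaves the critical region, which is what lets process 2 enter in step c).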

A Token Ring Algorithm
Circulate a token: whoever has the token can enter its critical section.
a) An unordered group of processes on a network.
b) A logical ring constructed in software.

Comparison

Algorithm     Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized   3                         2                                       Coordinator crash
Distributed   2(n - 1)                  2(n - 1)                                Crash of any process
Token ring    1 to ∞                    0 to n - 1                              Lost token, process crash

• A comparison of three mutual exclusion algorithms.
• The infinity indicates that the token may circulate aimlessly in the network (if no one wants to make use of it).

The Transaction Model
• A transaction groups a number of statements into a unit that is executed ONLY in its logical entirety.
• A transaction may be executing concurrently with others in the same (or a distributed) system.
• Examples of transactions (xactions):
  – Get Euro 100.00 from your own account
  – Deposit Euro 25.00 in account number 356533
  – Increase all accounts by 2.7% of their balances.
• The concept of a transaction is supported by a few fundamental constructs.

The Transaction Model

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

Programming primitives for transactions.

The Transaction Model

(a)
BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi;
END_TRANSACTION

(b)
BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi full => ABORT_TRANSACTION

a) Transaction to reserve three flights commits
b) Transaction aborts when third flight is unavailable

Xaction Properties
ACID properties (also known as ACIDity):
• A: atomicity
• C: consistency
• I: isolation
• D: durability

Distributed Transactions
a) A nested transaction: the parent transaction forks a subtransaction for each part
   • What happens in case of failure?
b) A distributed transaction
   • Separate distributed algorithms are needed to handle management (locking) of data and commitment of the whole transaction.

Implementation of Transactions using Shadows (shadow blocks)
a) The file index and disk blocks for a three-block file
b) The situation after a transaction has modified block 0 and appended block 3
c) After committing

Write Ahead Log (WAL)

(a)
x = 0;
y = 0;
BEGIN_TRANSACTION;
    x = x + 1;
    y = y + 2;
    x = y * y;
END_TRANSACTION;

(b) Log: [x = 0/1]
(c) Log: [x = 0/1] [y = 0/2]
(d) Log: [x = 0/1] [y = 0/2] [x = 1/4]

a) A transaction
b) – d) The log before each statement is executed
• If a xaction succeeds, it commits (point of no return).
• Otherwise, the WAL is used to roll back to a consistent database state.
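The log in the example can be reproduced with a small in-memory sketch. This is a simplified illustration, assuming the database is a Python dict and the log lives in memory rather than on stable storage; the class name is hypothetical.

```python
class UndoLog:
    """Records (item, old value, new value) before each write."""

    def __init__(self, db):
        self.db = db
        self.log = []

    def write(self, key, new_value):
        # The log record is appended BEFORE the database is changed.
        self.log.append((key, self.db[key], new_value))
        self.db[key] = new_value

    def commit(self):
        self.log.clear()

    def rollback(self):
        # Undo the writes in reverse order, restoring the old values.
        while self.log:
            key, old_value, _ = self.log.pop()
            self.db[key] = old_value


if __name__ == "__main__":
    db = {"x": 0, "y": 0}
    txn = UndoLog(db)
    txn.write("x", db["x"] + 1)
    txn.write("y", db["y"] + 2)
    txn.write("x", db["y"] * db["y"])
    print(txn.log)   # the three records of log (d)
    txn.rollback()
    print(db)        # back to the state before the transaction
```

Undoing in reverse order matters here because x is written twice; restoring the oldest record last brings x back to its original value.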

Concurrency Control
Figure: General organization of managers for handling transactions.

Concurrency Control
Figure: General organization of managers for handling distributed transactions.

Principle of Serializability

(a) BEGIN_TRANSACTION  x = 0;  x = x + 1;  END_TRANSACTION
(b) BEGIN_TRANSACTION  x = 0;  x = x + 2;  END_TRANSACTION
(c) BEGIN_TRANSACTION  x = 0;  x = x + 3;  END_TRANSACTION

(d) Time ->
Schedule 1   x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 2   x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal
Schedule 3   x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal

a) – c) Three transactions T1, T2, and T3
d) Possible schedules
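Whether a schedule is legal can be checked by comparing its final result with the results of all serial executions. This is a minimal sketch, assuming a schedule counts as legal when its final value of x equals the final value of some serial order of T1, T2, and T3; the helper names are hypothetical, and each schedule below contains both operations of every transaction.

```python
from itertools import permutations


def set_zero(x):
    return 0


def add(k):
    return lambda x: x + k


def run(schedule):
    """Apply the operations in order and return the final value of x."""
    x = None
    for op in schedule:
        x = op(x)
    return x


ADD1, ADD2, ADD3 = add(1), add(2), add(3)
TRANSACTIONS = [[set_zero, ADD1], [set_zero, ADD2], [set_zero, ADD3]]

# Final values reachable by running the transactions one after another.
SERIAL_RESULTS = {
    run([op for txn in order for op in txn])
    for order in permutations(TRANSACTIONS)
}


def is_legal(schedule):
    return run(schedule) in SERIAL_RESULTS


schedule_1 = [set_zero, ADD1, set_zero, ADD2, set_zero, ADD3]
schedule_2 = [set_zero, set_zero, ADD1, ADD2, set_zero, ADD3]
schedule_3 = [set_zero, set_zero, ADD1, set_zero, ADD2, ADD3]

if __name__ == "__main__":
    for s in (schedule_1, schedule_2, schedule_3):
        print(run(s), is_legal(s))
```

Schedule 3 ends with x = 5, a value no serial order can produce, which is why it is illegal.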

Two-Phase Locking
Figure: Two-phase locking.

Strict Two-Phase Locking
• A transaction always reads committed values
• Avoids cascading aborts
• Distributed 2PL:
  – Schedulers on each machine take care of the locks (grant/release);
  – Operations are forwarded to local managers.

Time Stamp Ordering
• Each database item x has a TSR(x) and a TSW(x)
• TSR(x) is set by the xaction that most recently read the item x
• TSW(x) is set by the xaction that most recently changed the value of x.
Timestamp Algorithm
• Suppose that xaction Ti with TS(Ti) issues read(x)
  – If TS(Ti) < TSW(x), then Ti needs to read a value of x that has already been overwritten by a later xaction; the read is rejected and Ti is rolled back.
  – If TS(Ti) >= TSW(x), then the read is executed and TSR(x) = max{TSR(x), TS(Ti)}
• Suppose that xaction Ti with TS(Ti) issues write(x)
  – If TS(Ti) < TSR(x), then the write is rejected and Ti is rolled back.
  – If TS(Ti) < TSW(x), then the write is rejected and Ti is rolled back.
  – Otherwise, the write operation is executed, and TSW(x) = TS(Ti).
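The read and write rules above can be sketched directly in code. This is a minimal illustration, assuming each item carries its value plus its read and write timestamps, and that a rejected operation is signalled by an exception; the class and function names are hypothetical.

```python
class Rollback(Exception):
    """Raised when an operation is rejected and the xaction must roll back."""


class Item:
    def __init__(self, value=0):
        self.value = value
        self.tsr = 0   # timestamp of the latest reader, TSR(x)
        self.tsw = 0   # timestamp of the latest writer, TSW(x)


def read(item, ts):
    if ts < item.tsw:
        raise Rollback("value already overwritten by a later xaction")
    item.tsr = max(item.tsr, ts)
    return item.value


def write(item, ts, value):
    if ts < item.tsr or ts < item.tsw:
        raise Rollback("a later xaction has already read or written the item")
    item.value = value
    item.tsw = ts


if __name__ == "__main__":
    a = Item(value=10)
    read(a, 150)                      # T1 reads A
    write(a, 160, read(a, 160) + 1)   # T2 reads A, then writes A + 1
    try:
        write(a, 150, 0)              # T1 writes too late
    except Rollback as reason:
        print("T1 rolled back:", reason)
```

The demo uses timestamps 150 and 160 for two transactions; the older transaction's late write is rejected because a younger transaction has already read and written the item.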

Timestamp Ordering Example
T1 has timestamp 150 and T2 has timestamp 160.

Step            Effect on A
(initially)     RT=0; WT=0
T1: read(A)     RT=150
T2: read(A)     RT=160
A := A+1
T2: write(A)    WT=160
T1: write(A)    T1 aborts!!

Timestamp Ordering
T1 has timestamp 200, T2 has timestamp 150, and T3 has timestamp 175. Items A, B, and C all start with RT=0 and WT=0.

Step            Effect
T1: read(B)     B: RT=200
T2: read(A)     A: RT=150
T3: read(C)     C: RT=175
T1: write(B)    B: WT=200
T1: write(A)    A: WT=200
T2: write(C)    ABORT T2
T3: write(A)

Optimistic Concurrency Control
• Idea: let everything go ahead, and then, before the transaction commits, check whether anyone else is affected.
• Structure of a transaction: R/W -> validate -> commit
• When a transaction fails the test, it has to be rolled back.
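The R/W, validate, commit structure can be sketched with version numbers. This is one possible validation test, assuming each item carries a version counter and a transaction fails validation if any item it read has changed since; the class names are hypothetical, and other validation schemes exist.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # item -> version seen at first read
        self.writes = {}          # private workspace, invisible to others

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Validate, then install the writes. Returns False (and changes
        nothing) if the transaction has to be rolled back."""
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


if __name__ == "__main__":
    store = Store({"x": 10})
    t1, t2 = Transaction(store), Transaction(store)
    t1.write("x", t1.read("x") + 1)
    t2.write("x", t2.read("x") + 5)
    print(t1.commit(), t2.commit(), store.data)
```

Both transactions run without locks; only the second one to commit discovers the conflict and has to be rolled back and retried.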