Concurrency Control and Recovery In real life users

  • Slides: 31
Download presentation
Concurrency Control and Recovery • In real life: • users access the database concurrently,

Concurrency Control and Recovery • In real life: • users access the database concurrently, and • systems crash. • Concurrent access to the database also improves performance, yields better utilization of resources. • BUT: if not careful, concurrent access can lead to incorrect database states. Crashes can also leave the database in incoherent states. • Basic concurrency/recovery concept: transaction • executed atomically. All or nothing. • We cover: • transactions in SQL • implementation of transactions and recovery.

Flight Reservation get values for : flight, : date, : seat EXEC SQL SELECT

Flight Reservation get values for : flight, : date, : seat EXEC SQL SELECT occupied INTO : occ FROM Flight WHERE flt. Num = : flight AND fltdt= : date AND flt. Seat=: seat if (!occ) { EXEC SQL UPDATE Flights SET occupied = ‘true’ WHERE flt. Num= : flight AND fltdt= : date AND flt. Seat=: seat /* more code missing */ } else /* notify customer that seat is not available */

Problem #1 Customer 1 - finds a seat empty Customer 2 - finds the

Problem #1 Customer 1 - finds a seat empty Customer 2 - finds the same seat empty Customer 1 - reserves the seat. Customer 2 - reserves the seat. Customer 1 will not be happy. serializability

Bank Transfers Transfer : amount from : account 1 to : account 2 EXEC

Bank Transfers Transfer : amount from : account 1 to : account 2 EXEC SQL SELECT balance INTO : balance 1 FROM Accounts WHERE acc. No = : account 1 if (balance 1 >= amount) EXEC SQL UPDATE Accounts SET balance = balance + : amount WHERE acct. No = : account 2; EXEC SQL UPDATE Accounts SET balance = balance - : amount WHERE acct. No = : account 1; Crash. . .

Transactions The user/programmer can group a sequence of commands so that they are executed

Transactions The user/programmer can group a sequence of commands so that they are executed atomically and in a serializable fashion: • Transaction commit: all the operations should be done and recorded. • Transaction abort: none of the operations should be done. In SQL: • EXEC SQL COMMIT; • EXEC SQL ROLLBACK; Easier said than done. . .

ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if

ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database.

How Do We Assure ACID? Concurrency control: Guarantees consistency and isolation, given atomicity. Logging

How Do We Assure ACID? Concurrency control: Guarantees consistency and isolation, given atomicity. Logging and Recovery: Guarantees atomicity and durability. If you are going to be in the logging business, one of the things that you’ll have to do is learn about heavy equipment. -- Robert Van. Natta Logging History of Columbia County

More on SQL and Transactions Read only transactions: • if the transaction is only

More on SQL and Transactions Read only transactions: • if the transaction is only reading, we can allow more operations in parallel. EXEC SQL SET TRANSACTION READ ONLY; The default is: SET TRANSACTION READ WRITE;

Dirty Data that has been written by a transaction that has not committed yet

Dirty Data that has been written by a transaction that has not committed yet is called dirty data. Do we allow our transaction to read dirty data? It may go away… In SQL: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED Note: default for READ UNCOMMITTED transactions is that they are READ ONLY.

Problems with Dirty Data Transfer program: 1. Transfer $N from account 1 to account

Problems with Dirty Data Transfer program: 1. Transfer $N from account 1 to account 2 2 a. If account 1 has enough for the transfer, 2 b. subtract $N from account 1, and commit. 2 c. Subtract $N from account 2, and commit. Bad scenario: A 1: $100, A 2: $200, A 3: $300 T 1: transfer $150 from A 1 to A 2 T 2: transfer $250 from A 2 to A 3. Events: • T 2 does step 1, -> A 3 has $550 • T 1 does step 1, -> A 2 has $350 • T 2 does step 2 a, all is ok. • T 1 does step 2 a and finds that A 1 doesn’t have enough funds. • T 2 does step 2 b, -> A 2 now has $100 • T 1 does step 2 c, -> A 2 now has -$50.

Concurrency Control Methods • Schedules • Serial schedules • Serializable schedules • Locking •

Concurrency Control Methods • Schedules • Serial schedules • Serializable schedules • Locking • Lock manager • 2 Phase Locking • Deadlocks: • Prevention • Detection

Schedules • A schedule is an interleaving of a set of actions of different

Schedules • A schedule is an interleaving of a set of actions of different transactions, such that the actions of any single transaction are in order. • A schedule represents some actual sequence of database actions. • In a complete schedule, every transaction either commits or aborts. • Initial state + Schedule -> Final state.

Acceptable Schedules Serial schedules: • The transactions run one at a time from beginning

Acceptable Schedules Serial schedules: • The transactions run one at a time from beginning to completion. • Note: there are many possible serial schedules. Each one is OK. The DBMS does not provide any guarantee in which order concurrently submitted transactions are executed. Serializable schedules: • Final state is what some serial schedule would have produced.

Aborted Transactions Slight modification to the definition: A schedule is serializable if it is

Aborted Transactions Slight modification to the definition: A schedule is serializable if it is equivalent to a serial schedule of committed transactions. • As if the aborted transactions never happened. Two issues to consider w. r. t. aborted transactions: • how does one undo the effect of a transaction? • What if another transaction sees the effects of an aborted one?

Locks Concurrency control is usually done via locking. The lock manager maintains a list

Locks Concurrency control is usually done via locking. The lock manager maintains a list of entries: • object identifier (can be page, record, etc. ) • number of objects holding lock on the object • nature of the lock (shared or exclusive) • pointer to a list of lock requests. -Lock compatibility table: -- Ö If a transaction cannot get a lock, it is S Ö suspended on a wait queue. X Ö S X Ö Ö Ö

Handling Lock Requests Lock Request (XID, OID, Mode) Mode==S Mode==X Currently Locked? Empty Wait

Handling Lock Requests Lock Request (XID, OID, Mode) Mode==S Mode==X Currently Locked? Empty Wait Queue? Yes No Put on Queue Grant Lock No

Two-Phase Locking (2 PL) • 2 phase locking: • if T wants to read

Two-Phase Locking (2 PL) • 2 phase locking: • if T wants to read an object, it first obtains an S lock. • If T wants to write an object, it first obtains an X lock. • If T releases any lock, it can acquire no new locks. • Recall: all this is done transparently to the user by the DBMS. • 2 PL guarantees serializability! • Why? ? lock point # of lock s growing phase shrinking phase Time

Serializability Graphs Two actions conflict if they access the same data item. The precedence

Serializability Graphs Two actions conflict if they access the same data item. The precedence graph contains: • A node for every committed transaction • An arc from Ti to Tj if an action of Ti precedes and conflicts with an action of Tj. • T 1 transfers $100 from A to B, T 2 adds 6% to both R 1(A), W 1(A), R 2(A), W 2(A), R 2(B), W 2(B), R 1(B), W 1(B) T 1 T 2

Conflict Serializability • 2 schedules are conflict equivalent if: – they have the same

Conflict Serializability • 2 schedules are conflict equivalent if: – they have the same sets of actions, and – each pair of conflicting actions is ordered in the same way. • A schedule is conflict serializable if it is conflict equivalent to a serial schedule. – Note: Some serializable schedules are not conflict serializable! • Theorem: A schedule is conflict serializable iff its precedence graph is acyclic. • Theorem: 2 PL ensures that the precedence graph will be acyclic!

Deadlocks Suppose we have the following scenario: • T 1 asks for an exclusive

Deadlocks Suppose we have the following scenario: • T 1 asks for an exclusive lock on A • T 2 asks for an exclusive lock on B • T 1 asks for a shared lock on B • T 2 asks for a shared lock on A Both T 1 and T 2 are waiting! We have a DEADLOCK. Possible solutions: • Prevent deadlocks to start with, or • Detect when they happen and do something about it.

Deadlock Prevention • Give each transaction a timestamp. “Older” transactions have higher priority. •

Deadlock Prevention • Give each transaction a timestamp. “Older” transactions have higher priority. • Assume Ti requests a lock, but Tj holds a conflicting lock. We can follow two strategies: • Wait-die: if Ti has higher priority, it waits; else Ti aborts. • Wound-wait: if Ti has higher priority, abort Tj; else Ti waits. • Note: after aborting, restart with original timestamp! Both strategies guarantee deadlock-free behavior!

An Alternative to Prevention • In theory, deadlock can involve many transactions: T 1

An Alternative to Prevention • In theory, deadlock can involve many transactions: T 1 waits-for T 2 waits-for T 3. . . waits-for T 1 • In practice, most “deadlock cycles” involve only 2 transactions. • Don’t need to prevent deadlock! What’s the problem with prevention? • Allow it to happen, then notice it and fix it. Deadlock detection.

Deadlock Detection • Lock Manager maintains a “Waits-for” graph: • Node for each transaction.

Deadlock Detection • Lock Manager maintains a “Waits-for” graph: • Node for each transaction. • Arc from Ti to Tj if Tj holds a lock and Ti is waiting for it. • Periodically check graph for cycles. • “Shoot” some transaction to break the cycle. • Simpler hack: time-outs. T 1 made no progress for a while? Shoot it.

Detection Versus Prevention • Prevention might abort too many transactions. • Detection might allow

Detection Versus Prevention • Prevention might abort too many transactions. • Detection might allow deadlocks to tie up resources for a while. – Can detect more often, but it’s time-consuming. • The usual answer: – Detection is the winner. – Deadlocks are pretty rare. – If you get a lot of deadlocks, reconsider your schema/workload!

Review: ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency:

Review: ACID Properties Atomicity: all actions of a transaction happen, or none happen. Consistency: if a transaction is consistent, and the database starts from a consistent state, then it will end in a consistent state. Isolation: the execution of one transaction is isolated from other transactions. Durability: if a transaction commits, its effects persist in the database. The Recovery Manager guarantees Atomicity & Durability.

Motivation for Recovery • Atomicity: – Transactions may abort (“Rollback”). • Durability: – What

Motivation for Recovery • Atomicity: – Transactions may abort (“Rollback”). • Durability: – What if DBMS stops running? (Causes? ) v Desired Behavior after system restarts: – T 1, T 2 & T 3 should be durable. – T 4 & T 5 should be aborted (effects not seen). T 1 T 2 T 3 T 4 T 5 crash!

Handling the Buffer Pool • Force every write to disk? – Poor response time.

Handling the Buffer Pool • Force every write to disk? – Poor response time. – But provides durability. No Steal Force • Steal buffer-pool frames from uncommited No Force transactions? – If not, poor throughput. – If so, how can we ensure atomicity? Steal Trivial Desired

Basic Idea: Logging • Record REDO and UNDO information, for every update, in a

Basic Idea: Logging • Record REDO and UNDO information, for every update, in a log. – Sequential writes to log (put it on a separate disk). – Minimal info (difference) written to log, so multiple updates fit in a single log page. • Log: An ordered list of REDO/UNDO actions – Log record contains: <XID, page. ID, offset, length, old data, new data> – and additional control information. • The Write-Ahead Logging Protocol: Must force the log record for an update before the corresponding data page gets to disk. ‚ Must write all log records for a transaction before commit.

WAL & the Log DB LSNs page. LSNs RAM flushed. LSN • Each log

WAL & the Log DB LSNs page. LSNs RAM flushed. LSN • Each log record has a unique Log Sequence Number (LSN). – LSNs always increasing. Log records • Each data page contains a page. LSN. flushed to disk – The LSN of the most recent log record for an update to that page. • System keeps track of flushed. LSN. – The max LSN flushed so far. page. LSN • WAL: Before a page is written, “Log tail” in RAM – page. LSN £ flushed. LSN

Recovery Three steps: (a la` ARIES) Starting from a checkpoint: • Analysis: figure out

Recovery Three steps: (a la` ARIES) Starting from a checkpoint: • Analysis: figure out which transactions committed since the checkpoint, and which failed. • REDO all actions in the log. • UNDO effects of failed transactions.

Summary • Users access the database concurrently, and sometimes there are crashes. • Transactions

Summary • Users access the database concurrently, and sometimes there are crashes. • Transactions are sets of operations that are guaranteed to be atomic. • The DBMS guarantees: Atomicity, Consistency, Isolation, Durability. • Isolation and consistency are guaranteed via locking: 2 -phase (need special care for deadlocks). • Atomicity and durability are guaranteed by: • Logging • Recovery manager (that uses the log). There are MANY more missing details!