Transaction Management Overview What is a transaction Suppose

  • Slides: 36
Download presentation
Transaction Management Overview

Transaction Management Overview

What is a transaction? Ø Ø Ø Suppose you have embedded SQL code You

What is a transaction? Ø Ø Ø Suppose you have embedded SQL code You can have start transaction, a sequence of SQL statements, then commit transaction. In JDBC, we can write: Ø Connection connection = Driver. Manager. get. Connection(…); connection. set. Auto. Commit(false); …. connection. commit(); or connection. rollback(); Ø Disk content is not changed until commit is executed! Ø Ø Ø

Transactions Ø Ø Ø Concurrent execution of user programs is essential for good DBMS

Transactions Ø Ø Ø Concurrent execution of user programs is essential for good DBMS performance. Ø Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user programs concurrently. A user’s program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database. A transaction is the DBMS’s abstract view of a user program: a sequence of reads and writes.

ACID properties (desirable) Ø Ø Ø A transaction MUST satisfy the ACID property A:

ACID properties (desirable) Ø Ø Ø A transaction MUST satisfy the ACID property A: atomic: all or nothing Ø the state of the database should be such that either all database operations are executed or non are C: consistent: if the database instance is consistent before the start of the transaction, it should be consistent after the transaction commits. I: isolation: even though other transaction may be executing at the same time, the result of executing the transaction on the database should be as though no other transactions are running. D: durability: if the transaction commits, then the changes should be permanent, even if the system

Concurrency in a DBMS Ø Ø Users submit transactions, and can think of each

Concurrency in a DBMS Ø Ø Users submit transactions, and can think of each transaction as executing by itself. Ø Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions. Ø Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. Ø DBMS will enforce some ICs, depending on the ICs declared in CREATE TABLE statements. Ø Beyond this, the DBMS does not really understand the semantics of the data. (e. g. , it does not understand how the interest on a bank account is computed). Issues: Effect of interweaving transactions, and crashes.

Atomicity of Transactions Ø Ø A transaction might commit after completing all its actions,

Atomicity of Transactions Ø Ø A transaction might commit after completing all its actions, or it could abort (or be aborted by the DBMS) after executing some actions. A very important property guaranteed by the DBMS for all transactions is that they are atomic. That is, a user can think of a transaction as always executing all its actions in one step, or not executing any actions at all. Ø DBMS logs all actions so that it can undo the actions of aborted transactions.

Assumptions Ø Ø Transactions interact with each other only through the database (they don't

Assumptions Ø Ø Transactions interact with each other only through the database (they don't send messages to each other) A database is a fixed collection of independent objects.

Overview • Scheduler manages read/write requests from transactions; allows execution in an order that

Overview • Scheduler manages read/write requests from transactions; allows execution in an order that is “serializable, ” i. e. , it “looks like” the transactions executed one at a time (serial)

Scheduler Ø Ø In most cases, the scheduler will execute the read or write

Scheduler Ø Ø In most cases, the scheduler will execute the read or write request immediately. However, in some cases, a read or a write needs to be deferred in time.

Example • Assume A = B is required for consistency. • T 1 and

Example • Assume A = B is required for consistency. • T 1 and T 2 individually preserve DB consistency.

A Serial Schedule S 1 • Assume initially A = B = 25. Here

A Serial Schedule S 1 • Assume initially A = B = 25. Here is one way to execute (S 1= T 1; T 2) so they do not interfere.

Another Serial Schedule S 2 • Here, transactions are executed as (S 2=T 2;

Another Serial Schedule S 2 • Here, transactions are executed as (S 2=T 2; T 1). The result is different, but consistency is maintained.

Interweaving Does Not Necessarily Hurt (S 3)

Interweaving Does Not Necessarily Hurt (S 3)

But Then Again, It Might!

But Then Again, It Might!

The Semantics of transactions is also important. Suppose T 2 multiplies by 1

The Semantics of transactions is also important. Suppose T 2 multiplies by 1

Example with new notation Ø Ø Ø r 1(A); w 1(A); r 2(A); w

Example with new notation Ø Ø Ø r 1(A); w 1(A); r 2(A); w 2(A); r 1(B); w 1(B), r 2(B); w 2(B); is this schedule serializable? Note: We require that the order of actions in each transaction are not changed in a schedule.

Transactions and Schedules • A transaction is a sequence of r and w actions

Transactions and Schedules • A transaction is a sequence of r and w actions on database elements. • A schedule is a sequence of reads/writes actions performed by a collection of transactions. • Serial Schedule = All actions for each transaction are consecutive. • r 1(A); w 1(A); r 1(B); w 1(B); r 2(A); w 2(A); r 2(B); w 2(B); …

Transactions and Schedules (Cont’d) • Serializable Schedule: A schedule whose “effect” is equivalent to

Transactions and Schedules (Cont’d) • Serializable Schedule: A schedule whose “effect” is equivalent to that of some serial schedule. • We will now present a condition that is sufficient to assume that a schedule is serializable (but not necessary).

Conflicts • Suppose for fixed DB elements X & Y, ri(X); rj(Y) is part

Conflicts • Suppose for fixed DB elements X & Y, ri(X); rj(Y) is part of a schedule, and we flip the order of these operations. – ri(X); rj(Y) ≡ rj(Y); ri(X) … In what sense? – This holds always (even when X=Y) • We can flip ri(X); wj(Y), as long as X≠Y • That is, ri(X); wj (X) wj(X); ri (X) – In the RHS, Ti reads the value of X written by Tj, whereas this is not the case on the LHS.

Conflicts (Cont’d) • We can flip wi(X); wj(Y); provided X≠Y • However, wi(X); wj(X)

Conflicts (Cont’d) • We can flip wi(X); wj(Y); provided X≠Y • However, wi(X); wj(X) ≢ wj(X); wi(X); – The final value of X may be different depending on which write occurs last. • There is a conflict if 2 conditions hold. • A read and a write of the same X, or • Two writes of X conflict in general and may not be swapped in order. All other events (reads/writes) may be swapped without changing the effect of the schedule (on the DB).

Rough Idea Ø Ø Ø We want to test if a schedule is conflict-serializable.

Rough Idea Ø Ø Ø We want to test if a schedule is conflict-serializable. That is, is it conflict-equivalent to a serial schedule. We can keep swapping actions that do not conflict until we reach a serial schedule.

Elaborating. . . • If by swapping pairs of non conflicting actions in a

Elaborating. . . • If by swapping pairs of non conflicting actions in a schedule S, we end up in a serial schedule, then S is called a conflict serializable schedule. S: r 1(A); w 1(A); r 2(A); w 2(A); r 1(B); w 1(B); r 2(B); w 2(B); • Conflict serializability is a sufficient condition for serializability but not necessary. That is, conflict serializability is a stronger notion than serializability. •

Why Conflict-Serializable is not Necessary for Serializability Ø Ø Ø Ø Consider the following

Why Conflict-Serializable is not Necessary for Serializability Ø Ø Ø Ø Consider the following serial schedule: S 1 =w 1(Y); w 1(X); w 2(Y); w 2(X); w 3(X); X has the value written by T 3, Y has the value written by T 2 Next, consider the following schedule: S 2=w 1(Y); w 2(X); w 1(X); w 3(X); S 2 is serializable (equivalent to the serial schedule S 1) However, it is not conflict-serializable (w 1(X) and w 2(X) are conflicting)

Precedence Graphs Serializability/precedence graph for a schedule: S Nodes: transactions {T 1, …, Tk}

Precedence Graphs Serializability/precedence graph for a schedule: S Nodes: transactions {T 1, …, Tk} Arcs: There is an arc from Ti to Tj if they have conflicting access to the same database element X and Ti is accessing this element first. Ti < S Tj. Conditions: • Action A 1 is before action A 2 in S • Both actions involve the same element and at least of the actions is a write

 • Is there is a cycle in the graph? – If yes, then

• Is there is a cycle in the graph? – If yes, then there is no serial schedule which is conflict equivalent to S. • Each arc represents a requirement on the order of transactions in a conflict equivalentserial schedule. • If there is no cycle in the graph – Then any topological order of the graph suggests a conflict equivalent schedule.

Why the Precedence Graph Test Works? • Idea: if the precedence graph is acyclic,

Why the Precedence Graph Test Works? • Idea: if the precedence graph is acyclic, then we can swap actions to form a serial schedule consistent with the same partial order; Proof: By induction on n, number of transactions. • Basis: n = 1. That is, S={T 1}; then S is already serial. • Induction: S={T 1, T 2, …, Tn}. Given that SG(S) is acyclic, then pick Ti in S such that no Tj in S is dependent on. – We swap all actions of Ti to the front (of S). – (Actions of Ti)(Actions of the other n 1 transactions) – The tail is a precedence graph that is the same as the original without Ti, i. e. it has n 1 nodes. By the induction hypothesis, we can reorder the actions of the other transactions to turn it into a serial schedule

Schedulers • A scheduler takes requests from transactions for reads and writes and decides

Schedulers • A scheduler takes requests from transactions for reads and writes and decides if it is “OK” to allow them to operate on DB or defer them until it is safe to do so. • Ideal: a scheduler forwards a request iff it cannot lead to inconsistency of DB – Too hard to decide this in real time. • Real: it forwards a request if it cannot result in a violation of conflict serializability. • We thus need to develop schedulers which ensure conflict serializability.

 • The use of lock must be proper in 2 senses: – Consistency

• The use of lock must be proper in 2 senses: – Consistency of Transactions: • Read or write X only when hold a lock on X. – ri(X) or wi(X) must be preceded by some li(X) with no intervening ui(X). • If Ti locks X, then Ti must eventually unlock X. – Every li(X) must be followed by ui(X). – Legality of Schedules: • Two transactions may not have locked the same element X without one having first released the lock. – A schedule with li(X) cannot have another lj(X) until ui(X) appears in between

Legal Schedule DOES NOT Mean Serializable

Legal Schedule DOES NOT Mean Serializable

Two Phase Locking There is a simple condition, which guarantees confict serializability: In every

Two Phase Locking There is a simple condition, which guarantees confict serializability: In every transaction, all lock requests (phase 1) precede all unlock requests (phase 2).

Why 2 PL Works? • Precisely: a legal schedule S of 2 PL transactions

Why 2 PL Works? • Precisely: a legal schedule S of 2 PL transactions is conflict serializable. • Proof is an induction on n, the number of transactions. • Remember: conflicts involve only read/write actions, not locks, but the legality of the transaction requires that the r/w's be consistent with the l/u's.

Why 2 PL Works (Cont’d) • Basis: if n=1, then S={T 1}, and hence

Why 2 PL Works (Cont’d) • Basis: if n=1, then S={T 1}, and hence S is conflict serializable. • Induction: S={T 1, …, Tn}. Find the first transaction, say Ti, to perform an unlock action, say ui(X). • Can we show that the r/w actions of Ti can be moved to the front of the other transactions without conflict? • Consider some action such as wi(Y). Can it be preceded by some conflicting action wj(Y) or rj(Y)? In such a case we cannot swap them. – If so, then uj(Y) and li(Y) must intervene, as wj(Y). . . uj(Y). . . li(Y). . . wi(Y). – Since Ti is the first to unlock, ui(X) appears before uj(Y). – But then li(Y) appears after ui(X), contradicting 2 PL. • Conclusion: wi(Y) can slide forward in the schedule without conflict; similar argument for a ri(Y) action. Therefore, produce: (Actions of Ti)(Actions of the other n 1 transactions)

Strict two-phase locking Ø Ø All locks are released immediately before the transaction commits.

Strict two-phase locking Ø Ø All locks are released immediately before the transaction commits. Advantage: other transactions cannot access uncommitted data Prevents cascading rollbacks Does not guarantee a dead-lock free schedule

Deadlocks Ø Ø Ø When we cannot proceed forward. T 1 holds a lock

Deadlocks Ø Ø Ø When we cannot proceed forward. T 1 holds a lock on X but needs a lock on Y T 2 holds a lock on Y but needs a lock on X neither can proceed forward if 2 PL is used Simple solution: if no progress after 5 seconds, shoot (i. e. , abort) a random transaction Ø problem: cascading aborts if uncommitted data is read

Conclusion Ø Ø Ø You should know what is a transaction What is a

Conclusion Ø Ø Ø You should know what is a transaction What is a schedule, what is a serializable schedule How concurrent transaction execution is supported by the scheduler What is 2 PL, why is it needed How can deadlocks occur