Advanced Database Topics Copyright Ellis Cohen 2002 2005

Advanced Database Topics Copyright © Ellis Cohen 2002 -2005 Transactions, Failure & Recovery These slides are licensed under a Creative Commons Attribution-Non. Commercial-Share. Alike 2. 5 License. For more information on how you may use them, please see http: //www. openlineconsult. com/db © Ellis Cohen 2002 -2005

Topics Transactions & Commit Abort & Rollback Nested Transactions & Savepoints Transactions, Failure & Recovery Server Page Caching Ensuring Atomicity & Durability with Shadow Paging Ensuring Atomicity & Durability with Undo Logging Redo Logging Undo/Redo Logging Ensuring Longer-Term Durability Handling Consistency Failure © Ellis Cohen 2002 -2005 2

ACID Properties of Transactions Atomicity * All of the updates of a transaction are done or none are done Consistency * Each transaction leaves the database in a consistent state (preferably via consistency predicates) Isolation Each transaction, when executed concurrently with other transactions, should have the same effect as if executed by itself Durability * Once a transaction has successfully committed, its changes to the database should be permanent © Ellis Cohen 2002 -2005 3

Transactions and Commit © Ellis Cohen 2002 -2005 4

Transaction Logical unit of work that must be either entirely carried out or aborted Example: a sequence of SQL commands, grouped together, e. g. in an SQL*Plus script If only part of the transaction were carried out, the database could be left in an inconsistent state © Ellis Cohen 2002 -2005 5

Example SQL*Plus Script This script moves money from one account to another. Parameters: &srcacct - The account to move money from &dstacct - The account to move money to &amt - The amount of money to be moved UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct; UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct; © Ellis Cohen 2002 -2005 Suppose a crash occurs right here! 6

Transactions & COMMIT All SQL commands are performed within a transaction Transaction starts Modify Transaction ensures these are done atomically • Each modification is visible to the SQL commands executed after it in the same transaction • But the modification is not actually persisted to the database until the transaction commits Modify • So, if a crash occurs in the middle of a transaction, after some modifications have been done, Modify Transaction commits Modifications persisted to DB the DB acts as if the modifications never happened! © Ellis Cohen 2002 -2005 7

Uncommitted & Committed Transactions start transaction modify Modifications not persisted start transaction modify COMMIT Modifications persisted to DB © Ellis Cohen 2002 -2005 8

SQL*Plus Commit Example SQL> set autocommit off Transaction started automatically at first update if not already in progress SQL> UPDATE checking SET balance = balance - &amt WHERE acctid = &srcacct; SQL> UPDATE checking SET balance = balance + &amt WHERE acctid = &dstacct; SQL> COMMIT; © Ellis Cohen 2002 -2005 9

Starting Transactions The COMMIT command ends a transaction How do transactions start? • Most databases start a new transaction automatically – on the first access to the DB within a session, and – on the first access following a COMMIT • Some databases have a START TRANSACTION command (to support complex nested transactions) © Ellis Cohen 2002 -2005 10

Transactions & DB Requests Cross. Request Transactions Within. Request Transactions UPDATE … COMMIT Execute Stored Procedure Middle Tier © Ellis Cohen 2002 -2005 PROCEDURE Stored. Proc IS BEGIN UPDATE … COMMIT END; Data Tier 11

Automatic Commit Updates may persist even when COMMIT is not explicitly called • Most databases support — either on the server or just through the client-side API — an autocommit mode which automatically does a commit after execution of each request made to the database. This is often the default. • Most databases automatically COMMIT when a client cleanly closes their connection to the database. • Most databases (including Oracle) do not allow DDL statements (e. g. CREATE TABLE) to be part of a larger transaction, and automatically do a commit before and after executing a DDL statement. © Ellis Cohen 2002 -2005 12

Java Commit Example Connection conn = …; conn. set. Auto. Commit( false ); movemoney( conn, 30479, 61925, 2000 ); … //-------------------static void movemoney( Connection conn, int srcacct, int dstacct, float amt ) { Statement stmt = conn. create. Statement(); String sqlstr = "update checking" + " set balance = balance - " + amt + " where acctid = " + srcacct; stmt. execute. Update( sqlstr ); sqlstr = "update checking" + " set balance = balance + " + amt + " where acctid = " + dstacct; stmt. execute. Update( sqlstr ); } conn. commit(); © Ellis Cohen 2002 -2005 13

Abort & Rollback © Ellis Cohen 2002 -2005 14

Aborting a transaction undoes the effects of the transaction -- it is as if the transaction never started Transactions are aborted in 3 ways: 1. The system crashes: All active transactions are aborted 2. An uncorrectable error occurs while executing the transaction 3. The transaction explicitly aborts (this is called a ROLLBACK) A transaction completes when it either commits or aborts © Ellis Cohen 2002 -2005 15

Rollback aborts a transaction • SQL*Plus ROLLBACK • Java conn. rollback() © Ellis Cohen 2002 -2005 16

Commit vs Rollback start transaction modify COMMIT modify ROLLBACK start transaction modify © Ellis Cohen 2002 -2005 17

Explicit Rollback With AUTOCOMMIT OFF SQL> COMMIT; SQL> UPDATE Emps SET job = 'COOK'; SQL> UPDATE Emps SET sal = sal + 200; SQL> ROLLBACK; After the ROLLBACK, the state is exactly as it was following the COMMIT. It is as if the two UPDATEs never happened! © Ellis Cohen 2002 -2005 18

Explicit Rollback in Java { } Connection conn = …; conn. set. Auto. Commit( false ); Statement stmt = conn. create. Statement(); String sqlstr = …; stmt. execute. Update( sqlstr ); … if (…) conn. commit(); else conn. rollback(); © Ellis Cohen 2002 -2005 19

Rollback Past Commit Rollback rolls the state back to the beginning of the transaction. Why not allow some form of rollback that goes back to some earlier point? © Ellis Cohen 2002 -2005 20

Commit Semantics & Compensating Transactions Because commits are durable! When a transaction commits, the user or application is notified that the commit succeeded and can't be undone, and may take other actions outside the databased on that – Display output to a user – Send a message to another process – Launch nuclear missile Some systems allow a compensating transaction to be associated with a transaction when it commits. The compensating transaction can be executed to "undo" the effects of the associated committed transaction (possibly within some time limit) – Output a retraction – Send a compensating message – Destroy the nuclear missile © Ellis Cohen 2002 -2005 21

Nested Transactions & Savepoints © Ellis Cohen 2002 -2005 22

Nested Transactions Transaction can nest start transaction modify Only the outermost transaction can commit and persist data Nested transaction can control the degree of rollback Nested transactions in SQL are implemented using SAVEPOINTs © Ellis Cohen 2002 -2005 23

Savepoints SAVEPOINT <name> Explicitly start new named nested transaction ROLLBACK to SAVEPOINT <name> Rolls back to state at start of named nested transaction RELEASE SAVEPOINT <name> Releases savepoint and associated transaction [not supported by Oracle] (Setting a savepoint with the same name as an existing savepoint releases the existing one) COMMIT Releases all savepoints within outermost transaction & commits start transaction set savepoint a b set savepoint c rollback to b © Ellis Cohen 2002 -2005 commit 24

Using Savepoints Set savepoint to try something that is quick but doesn’t always work e. g. access to some remote database that is not always available On failure, back up to the savepoint (undoing any changes to the DB you have made) and try slower but more reliable technique © Ellis Cohen 2002 -2005 25

Alternative Path in PL/SQL BEGIN Do. Useful. Setup( … ); BEGIN SET SAVEPOINT Retry. Point; Do. Quick. Unreliable. Updates(…); EXCEPTION WHEN OTHERS THEN ROLLBACK TO Retry. Point; Do. Slow. Reliable. Updates(…); END; © Ellis Cohen 2002 -2005 26

Alternative Path in Java Connection conn = …; Statement stmt = conn. create. Statement(); Do. Useful. Setup(…); try { Savepoint sp. Retry = conn. set. Savepoint( "Retry. Point" ); Do. Quick. Unreliable. Updates(…); } catch( Exception e ) { conn. rollback( sp. Retry ); Do. Slow. Reliable. Updates(…); } © Ellis Cohen 2002 -2005 27

Statement-Level Transactions Every SQL statement executes within a nested transaction A statement can fail – E. g. due to violation of an integrity constraint, e. g. check( enddate > startdate) Result of statement failure: – The statement is rolled back. If an update statement would update 100 records, but updating the 11 th records causes failure of an integrity constraint, the 10 previously updated records are rolled back to their old state – In embedded SQL, it then raises an exception, which can eventually cause the outermost transaction to abort if not caught Result of statement success – Statement-level transaction is released © Ellis Cohen 2002 -2005 28

Autonomous Nested Transactions When a transaction fails, all modifications made during that transaction are undone. That may not be what you want! – Suppose you want to add an audit record (to the Emps. Audit table) every time someone tries to update the Emps table. – You want to add that audit record even if the operation which updates Emps is ultimately rolled back. Solution: Add the audit record inside an autonomous nested transaction. – Autonomous transactions can durably commit inside of a parent transaction – If the parent transaction is aborted after the nested autonomous transaction commits, modifications made inside the autonomous transaction will NOT be undone. © Ellis Cohen 2002 -2005 29

Transactions, Failures & Recovery © Ellis Cohen 2002 -2005 30

Types of Failures Transaction Failure Potentially Violate Transaction aborts for some reasons (e. g. uncaught exception ) System Failure Atomicity Processor / system crash Main memory lost, disk ok Media Failure & Catastrophes Disk or Controller error / Head crash User/Program Errors & Sabotage Durability Loss or corruption of data Consistency © Ellis Cohen 2002 -2005 31

Failures & Recovery Atomicity-Related Failures Return all data changed by a transaction to its state at the beginning of the transaction Durability-Related Failures Depends on keeping a backup of the data Recover the state of the data from the backup Consistency-Related Failures Recover affected data (as for a durability failure) Deal with cascading effects of committed transactions that modified or depended upon incorrect data Very difficult to deal with; won't deal with these in general © Ellis Cohen 2002 -2005 32

Shadow Copying Primitive Recovery Mechanism to Ensure Atomicity & (limited) Durability Assumes 1 transaction at a time Initially there's the Main DB, and db_ptr (also on disk) holds its disk address Then, when a transaction starts A copy of the Main DB is made: Current DB Copy The transaction is executed using Current DB Copy In effect, Main DB becomes a "shadow copy" How is a ROLLBACK handled? db_ptr Main DB Current DB Copy © Ellis Cohen 2002 -2005 33

Failure and Shadow Copying On Commit 1) Force cached pages out to Current DB Copy Crash before (2): As if the transaction never started 2) Change db_ptr to point to Current DB Copy Crash after (2): Transaction state is completely updated 3) Discard the old Main DB db_ptr Main DB Current DB Copy on disk A single atomic operation (changing the db_ptr) moves the system from one consistent state to another © Ellis Cohen 2002 -2005 34

Shadow Copy in Practice - Takes too much space to make a copy of the entire DB - Too slow to make an entire copy of the DB for each transaction Perhaps we will find the ideas of shadow copying useful later on … © Ellis Cohen 2002 -2005 35

Server Page Caching © Ellis Cohen 2002 -2005 36

Disk Structure same size as memory page © Ellis Cohen 2002 -2005 37

Disk Block Organization Divide database into disk blocks (which correspond to memory pages) A block is 1 or more contiguous disk sectors Generally, either A block holds 1 or more complete rows (i. e. tuples), usually from the same table No row straddles a block Block has contiguous rows, or a row directory which keeps track of the offsets of the rows in the block Or (for long rows) A row spans 1 or more blocks (internal chaining) No block holds pieces of 2 or more rows Really large fields (LOB's) are stored separately. © Ellis Cohen 2002 -2005 38

Addressing Tuples 3049625973 Identifies a specific database block Identifies a slot in the block's row directory Every tuple in a database is addressed by a ROWID, which indicates where it may be found © Ellis Cohen 2002 -2005 39

Migration & Forward Chaining An update may increase the size of a tuple so much that it can no longer fit in the same block, so we have to move it to another block. But we want the tuple to still be identified by its ROWID, which refers to the old block The data for the row in the old block holds a forwarding id -- the id of the ROWID for the row in the new block © Ellis Cohen 2002 -2005 40

Block Access & Update To read any row in a block, the block is read into core memory (if not already there) [may also prefetch adjacent blocks] To insert/delete/update a row in a block 1) the block is read into core memory (if needed) 2) the page is modified 3) the page is eventually written back to disk Core Memory (pages) 2 3 Disk Memory (blocks) 1 © Ellis Cohen 2002 -2005 41

Blocks & Pages Frequently, the DB block size (the smallest unit of data transfer between the DB disk memory and core memory) is chosen to be the same size as a virtual memory page We will use the terms page and block interchangeably. © Ellis Cohen 2002 -2005 42

Server Page Caching After a read or update, the page may be cached (i. e. retained) in the DB server's memory. If the page is still in memory next time it is needed, there is no need to read it from disk When the cache is full, room is made for a new page by replacing some other page Most metadata tables are always in the cache Disk Memory Cache Core Memory © Ellis Cohen 2002 -2005 43

Memory & Disk Specs 130 G Disk 512 bytes/sector, 256 sectors/track 65 K tracks/head, 16 heads/disk (8 platters) 1 M tracks/disk, 256 M sectors/disk 10 ms max seek time, 1 ms track-to-track 4 ms avg latency Sustainable data transfer rate: 65 Mbps (4 K bits per sector 60 s / sector) Average time to check 2 K bytes from disk seek + latency + transfer + core check times 0 -10 ms + 4 ms +. 25 ms + 1 s = 4. 25 -14. 25 ms Disk/Core ratio = ~10 ms/1 s = 10, 000: 1 © Ellis Cohen 2002 -2005 44

Page Caching & Virtual Memory • OS allocates DB a fixed (perhaps changeable) # of pages of disk and memory which the DB manages can unnecessarily constrain memory management • Persistent DB state stored in ordinary files, and the page cache is in virtual memory causes duplication of effort if VM page is backed to disk • OS and DB storage management are integrated OS (e. g. Mach) has a file mapping API which can be used by the database © Ellis Cohen 2002 -2005 45

Dirty and Active Pages Dirty Page that has been modified, and has not been written back to disk (since it was modified). A clean page either • Has not been modified since it was read • Has not been modified since it was last written back to disk Active Page that has been accessed by a transaction that has not yet completed (i. e. committed or aborted) © Ellis Cohen 2002 -2005 46

Page States Inactive 1 Clean Dirty Active 2 Same contents as on disk Every transaction that used it has completed Same contents as on disk Some transaction that used it is active 4 3 Page has been modified, but not written back to disk Every transaction that used it has completed Page has been modified, but not written back to disk Some transaction that used it is active Consider the page that has been least recently used. Which of these states could it be in? (Consider the states in the order indicated) © Ellis Cohen 2002 -2005 47

LRU Page States Inactive Clean Dirty Active The transactions which used this page finished a long time ago Any modifications were written out A transaction using this page started a long time ago, but has not yet finished Any modifications were written out The transactions which used this page finished a long time ago Modifications not written out A transaction using this page started a long time ago, but has not yet finished Modifications not written out Why are Dirty Inactive pages a problem for Durability? © Ellis Cohen 2002 -2005 48

Forcing What happens to a dirty page when the transaction which modified it commits? FORCE: It is written back to disk. Necessary for durability unless some other mechanism is available. Effect: No dirty inactive pages NO-FORCE: The page is not written back on commit. Avoids overhead at commit time. If the system crashes after a transaction commits, and the page is not written back, how is durability ensured? © Ellis Cohen 2002 -2005 49

The Replacement Problem What if a page needs to be loaded into memory, but cache memory is full. We need to replace some page with the new page. Which page should we replace? © Ellis Cohen 2002 -2005 50

Replacement Algorithms LRU: Choose the page which has been used least recently. Based on the (often true) notion that pages used most recently will most likely be used again in the near future. Clock Algorithm: Approximates LRU, but is more efficient. Cycle through the pages in order. Choose the next page in order that was not used since that page was considered in the previous cycle. Also, first replace pages read in during full table scans (in fact, if the table is large, throw out earlier pages read when scanning later pages) © Ellis Cohen 2002 -2005 51

Cost of Writing Dirty Pages Suppose the page chosen for replacement is dirty – we need to first write the dirty page back to disk (which impacts performance) – before a newly read page can replace it in the cache Why is this so? Is there a way to improve the performance? © Ellis Cohen 2002 -2005 52

Pre-Write LRU Dirty Pages Use a separate Cleaner Process to find dirty pages which have not been used recently and write them back to disk – The disk scheduler doesn’t need to write them back immediately, but when it is most efficient to do so The dirty page is not immediately replaced, it just becomes clean (instead of dirty). – This allow the replacement algorithm to always find a clean page (not used recently) to replace, without needing to wait for it to be written back But what should the Replacement Algorithm or the Cleaner Process do when it considers an active dirty page? © Ellis Cohen 2002 -2005 53

Stealing STEAL: May choose a dirty active page to clean/replace. – What is the danger if you do NOT write out the page? (What if the transaction that modified the page commits? ) – What is the danger if you DO write out the page? (What if the transaction that modified the page aborted? ) NO-STEAL: Skip over dirty active pages – If there are no clean pages, forces some transaction to abort. – If few clean pages, transactions may thrash (continually reread pages which have recently been replaced) You can always choose a dirty inactive page; just write it out first © Ellis Cohen 2002 -2005 54

Effect of Processor Crash No Modified Pages on Disk Active Transaction No FORCE Committed Transaction As if the transaction never happened Durability Failure Some Modified Pages on Disk All Modified Pages on Disk Atomicity Failure STEAL Atomicity and Durability Failure Transaction saved successfully! Problem even if you FORCE & don't STEAL: Crash in the middle of commit while forcing pages to disk; only some modified pages may be on disk Is there a recovery mechanism based on shadow copying which can solve this problem? © Ellis Cohen 2002 -2005 55

Using Shadow Copies At commit time, we first use shadow copying for all the dirty pages We change the database so it points to those pages instead of the original pages (how do we do that atomically? ) Assuming we can make that work, can we allow page stealing? © Ellis Cohen 2002 -2005 56

Ensuring Atomicity and Durability with Shadow Paging © Ellis Cohen 2002 -2005 57

Page Tables main page table ptr A B C D E F G A Database can be organized using a page table. The table maps the LOGICAL block # (which is used in ROWIDs) to the PHYSICAL blocks # (where the block actually lives on the disk) © Ellis Cohen 2002 -2005 58

Commit-Time Shadow Paging commit-time page table ptr for transaction T main page table ptr A B C D E F G B' D' F' At COMMIT time of transaction T 1. A commit-time copy of the page table is made 2. T's dirty pages (B, D, F) are forced to disk (but DO NOT overwrite the originals), and the commit-time page table copy is changed to point to the new modified copies 3. The main page table ptr is switched to point to the committime page table. THIS IS WHEN THE COMMIT HAPPENS! 4. The old copies of B, D, F and the old page table are freed © Ellis Cohen 2002 -2005 59

Shadow Paging Issues 1. Can we support stealing with shadow paging? 2. The page table is too big to copy on every transaction. How can we improve performance? 3. When a tuple is updated, what pages are changed? © Ellis Cohen 2002 -2005 60

Stolen Page Map T's stolen page map B' If T is using a dirty page that needs to be replaced or cleaned Write it to disk, and note it in T's private stolen page map If T needs to access that page again, look for it the stolen page map before looking in the main page map When T commits, use T's stolen page map to help build T's commit-time page table copy © Ellis Cohen 2002 -2005 B F F' Two transactions are unable to modify (different rows on) the same page. Requires page-level locking (discussed in the next lecture) 61

Multi-Level Page Tables main page table ptr PT 0 PT 1 PT 2 PT 3 P 100 P 101 commit-time page table ptr for transaction T PT 1 … P 199 PT 2 P 400 PT 3 PT 4 P 400 … PT 99 P 9900 P 9901 … P 9999 PT 0 P 401 … P 499 P 400 P 401' P 401 … P 499' PT 4 … PT 99 P 499 P 9999 © Ellis Cohen 2002 -2005 62

Auxiliary Affected Pages What pages are affected when a tuple is updated? • The page containing the original tuple • If the update makes the tuple so large that there is no room for it in the old page, it is moved to a new page (a forwarded page) – If so, the corresponding page of the page table is affected as well. • If any of the updated fields are indexed, then the corresponding index entry for the tuple will have to be moved (e. g. deleted from its old position in the B+ tree, and inserted at the new position). – Both the page of the old entry and the page of the new entry will be affected – Adding a new entry to an index page may cause that page to be split, which will then affect the corresponding page of the page table – Removing an entry from an index page may cause that page to be combined with an adjacent page, which also affects the corresponding page of the page table. • Pages containing the portions of the page table hierarchy used to reference those pages! © Ellis Cohen 2002 -2005 63

Shadow Paging Characteristics Ensures Atomicity & Durability Requires Forcing (dirty pages must be written back at commit time) to ensure durability Allows Stealing (dirty pages can be written back before transaction commits, though not overwritten) Assumes if one transaction modifies a page, no other transaction can read or modify it (i. e. page-level locking) Mechanism Uses a page table (on disk) which keeps a list of all pages in the DB Keeps a shadow copy of each page written Does shadow copying of the page table Main result: At commit time, moves the system instantly from one consistent state to another © Ellis Cohen 2002 -2005 64

Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? – Assume this doesn't happen. – In the next lecture, we will talk about concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? This is a real problem with shadow paging. Either – Allow one transaction at a time to use a page (using page locks), or – Don't actually make the changes to the page until just before commit (using intention lists) © Ellis Cohen 2002 -2005 65

Problems with Shadow Paging • Commit Bottleneck Only one transaction can commit at a time, if we want the page table to be correct (how might this be fixed? ) • Limits on Concurrency Can't have different transactions modify independent parts of pages (could be addressed by deferred modification and intentions lists) • Cost of Shadowing Overhead of allocating and freeing shadow copies • Data Fragmentation For read efficiency, you want logically adjacent data to be kept physically adjacent (e. g. using extents) For write efficiency, this implies in-place updating, not shadow copying (could possibly be addressed by ongoing defragmentation [+ sorting] in the background) © Ellis Cohen 2002 -2005 66

Overview of Logging (the Alternative to Shadow Paging) Main features – Uses a log to support recovery (the log itself may span multiple pages) – No shadowing; Uses in-place updating – Can track modifications at the row (rather than the page) level – No Page Tables, but still depends on Server Page Caching Three approaches Undo-Only Logging (Backward Recovery) Allows stealing, but still requires force on commit Redo-Only Logging (Forward Recovery) Avoids force on commit, but no stealing Combined (Undo/Redo) Logging Avoids force on commit, and allows stealing © Ellis Cohen 2002 -2005 67

Ensuring Atomicity and Durability with Undo Logging © Ellis Cohen 2002 -2005 68

Backward Recovery with Undo Logs Mechanism On every modification made to any tuple in the database, append an Undo Log entry to an Undo Log. On Transaction Abort: use the Undo Log to undo all modifications made by the aborted transaction, in backwards order. Crash Recovery: Abort all uncommitted transactions Characteristics Requires Forcing (dirty pages must be written back at commit time) no way to redo on crash Allows Stealing (dirty pages can be written back before transaction commits) because undo-able All the advantages of shadow paging, with none of the disadvantages © Ellis Cohen 2002 -2005 69

Describing Modifications Suppose the Emps table contains (ROWID) ------3479000 3479001 3479002 3479003 3479004 3479005 3479006 3479007 3479008 3479009 3479010 3479011 3479012 3479013 EMPNO ENAME JOB ----- ---–-- -------7369 SMITH CLERK 7499 ALLEN SALESMAN 7521 WARD SALESMAN 7566 JONES DEPTMGR 7654 MARTIN SALESMAN 7698 BLAKE DEPTMGR 7782 CLARK DEPTMGR 7788 SCOTT ANALYST 7839 KING PRESIDENT 7844 TURNER SALESMAN 7876 ADAMS CLERK 7900 JAMES CLERK 7902 FORD ANALYST 7934 MILLER CLERK MGR ----7902 7698 7839 7566 7698 7788 7698 7566 7782 HIREDATE -----17 -DEC-80 20 -FEB-81 22 -FEB-81 02 -APR-81 28 -SEP-81 01 -MAY-81 09 -JUN-81 19 -APR-87 17 -NOV-81 08 -SEP-81 23 -MAY-87 03 -DEC-81 23 -JAN-82 SAL COMM DEPTNO ------800 20 1600 30 1250 500 30 2975 20 1250 1400 30 2850 30 2450 10 3000 20 5000 10 1500 0 30 1100 20 950 30 3000 20 1300 10 Transaction T 3 executes UPDATE Emps SET sal = sal + 100 WHERE deptno = 10 What changes were made to which tuples? © Ellis Cohen 2002 -2005 70

Tuple Modifications The following changes were made: Tuple 3479006: sal 2450 2550 Tuple 3479008: sal 5000 5100 Tuple 3479013: sal 1300 1400 Suppose: The operation was executed, The pages containing these tuples were modified (in the server page cache) Those pages were written out (due to STEALing) Then Transaction T 3 was ABORTed What is the minimum information we would need to know about the affected tuples to undo the effects of the operation? © Ellis Cohen 2002 -2005 71

Tuple Before State We need to know that: Tuple 3479006: sal was 2450 Tuple 3479008: sal was 5000 Tuple 3479013: sal was 1300 For each tuple that was updated, we need to know what the value was for each modified field before the operation This is the information that is written into the undo log. Many systems write the contents of the entire tuple before the operation – this is called the before image What do we need to know to undo a DELETE or INSERT? © Ellis Cohen 2002 -2005 72

Undoing INSERT & DELETE To undo an INSERT We just need to record the ROWID of the tuple, so we can delete it To undo a DELETE We need to record the ROWID of the tuple plus the entire contents of the tuple, so we can re-insert it! What do we need to record when we do other operations: e. g. CREATE TABLE or DROP TABLE? © Ellis Cohen 2002 -2005 73

Logging System Operations In an RDB, all system state (e. g. which tables are created, what their fields are, etc. ) is stored in Metadata tables. Any system operation (e. g. CREATE TABLE, DROP TABLE) is implementing by modifying the metadata tables. We just log those modifications, just as we log modification to tuples in user tables! © Ellis Cohen 2002 -2005 74

Separate vs Integrated Logging Separate Logging Some systems use a separate undo log for every transaction or for every thread May affect performance if different logs are on different tracks of the same disk BUT: Very easy to abort a single transaction. Just walk backwards through that transaction's log. Every log entry is for the transaction being aborted. Integrated Logging In an integrated log, all log entries are appended to a single log, which interleaves entries from multiple transactions Each entry must identify the associated transaction. To undo a transaction, it is necessary to locate the log entries for that transaction. Typically, each entry points to the previous entries for the same transaction, and there is an entry which identifies the START of the transaction. © Ellis Cohen 2002 -2005 75

Modification Entries for an Integrated Undo Log Executed by transaction T 3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23 Transaction, Operation, ROWID, Before State T 3 Insert 3049625973 T 3 Delete 3049218695 Before state 67 'Marketing' T 3 Update 3049218696 Before state 23 'Sales' An UNDO log efficiently stores information needed to restore modified pages to their old state. Just like keeping shadow pages, but more efficient! © Ellis Cohen 2002 -2005 76

Implementing Abort • Traverse the integrated log (starting at the end and going backwards) to find all the entries for that transaction NOTE: this is more efficient if the entries for each transaction are linked together • For each such modification log entry, restore the before state. NOTE: If the page/block the entry refers to was stolen, it will first need to be read back into the cache. Why not find the start of the transaction & undo going forwards? Logs are APPEND-ONLY. This makes them much more efficient to implement. So, Abort does NOT delete the undo entries after using them to implement an abort. What if a different transaction has modified a different tuple on the same page as a change which is undone? © Ellis Cohen 2002 -2005 77

Pages Changed by Multiple Transactions What if the same tuple is changed by two concurrent transactions? What if a tuple modified by the aborted transaction was read by another transaction? – Assume this doesn't happen. – There are separate concurrency control mechanisms which prevent this What if two different tuples on the same page are changed by concurrent transactions? – This is NOT a problem for logging. – Log entries pinpoint a specific tuple on a page, which can be undone leaving other tuples on the same page modified. © Ellis Cohen 2002 -2005 78

Auxiliary Affected Pages Modifying a tuple on a page may cause modification to many other pages – forwarded pages (oversized updates) – index pages – table directory pages (i. e. which pages hold data for a table) Two approaches – Explicit: Add entries to the log for each of these modified pages as well. After all, these represent change that will have to be undone as well. – Implicit: Do not add entries to the log for changes other than to the tuple itself. Changes to other affected pages can be done automatically as part of undoing the change to the tuple. © Ellis Cohen 2002 -2005 79

Physiological Logging Our undo log uses "physiological" log entries • They physically indicate the block of the tuple that was modified (the block # of the ROWID) • They logically provide information needed to restore the tuple to the state prior to the modification To undo an INSERT, you only need the fact that it was an Insert along with its logical position in the block (the slot # of the ROWID), because you will undo the INSERT by freeing the contents of that slot. To undo a DELETE, you need to know all the values of the deleted tuple as well © Ellis Cohen 2002 -2005 80

Write Ahead Logging (WAL) Suppose there is a crash – Before the commit of a transaction is complete – After a page modified by the transaction has been written out (at commit time or due to stealing) Use the undo log to ensure atomicity: undo the changes made to the page But only if the undo log is already on the disk! Write Ahead Logging Before writing out a page, force out the undo log (or at least the parts of the undo log which have entries that refer to that page, implicitly or explicitly). © Ellis Cohen 2002 -2005 81

Transaction Entries for an Integrated Undo Log T# Start appended to the log when transaction T# starts (if a transaction' s entries are linked together, this is not needed; START is implied by an entry with a NULL backwards link) T# Commit. Complete appended to the log after all pages modified by transaction T# have been forced out. T# Abort. Complete appended to the log after all pages which have been undone for transaction T# have been forced out. Log Forcing A COMMIT appends Commit. Complete to the log (after all its modified pages have been written out), and then forces the log out. That's when the COMMIT is actually complete. How are these transactional entries used along with the modification entries to recover from a crash? © Ellis Cohen 2002 -2005 82

Backward Recovery • Traverse the entire log (starting at the end and going backwards) • Skip over a modification entry if – its transaction's Commit. Complete entry has been encountered (all its modified pages have been forced out; it doesn't need to be undone) – its transaction's Abort. Complete entry has been encountered (all pages it modified have already been undone and forced out; they don't need to be undone again) • Otherwise, perform the undo action for the modification entry Why does the entire log have to be traversed? What could you do to avoid that? If a crash occurs in the midst of a transaction, some modifications will be undone that were never persisted. Why is that true? Is that a problem? © Ellis Cohen 2002 -2005 83

Checkpoint Entries When a crash occurs All transactions which have not completed (forced out a Commit. Complete or Abort. Complete entry) must be undone. But a transaction might have started a long, long time ago, made a modification, but not made any other modifications since then. We have to look through the entire log to find entries for these transactions. Solution: Regularly add Checkpoint entries Add a Checkpoint entry to the log at regular intervals with a list of all the active transactions During crash recovery, stop traversing the log when a Checkpoint entry is found where all the active transactions listed have completed (i. e. their Commit. Complete or Abort. Complete entries have already been encountered). How do Start entries allow even earlier stopping? © Ellis Cohen 2002 -2005 84

Undoing Un-persisted Changes Suppose Transaction T 3 executes UPDATE Emps SET sal = sal + 100 WHERE deptno = 10 And the following sequence of events occurs 1. The operation updates tuples with ROWIDs 3479006, 3479008, 3479013 (in the server page cache) 2. UNDO entries for the operation are written to the log 3. The log is forced out 4. The page containing tuple 3479008 is written out 5. The system crashes 6. When the system recovers, it will go through the log and execute the UNDO entries for 3479006, 3479008, 3479013, even though the changes for 3479006 and 3479013 were never persisted. 7. Undo just restores the BEFORE state. If the change being undone was never persisted, at worst this has no effect. (Implicit changes must be handled a little more carefully) If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem? © Ellis Cohen 2002 -2005 85

Idempotence If a crash occurs in the midst of aborting a transaction or recovering from a previous crash, some actions that have already been undone will be undone again. Why is that true? Is that a problem? After undoing some of the actions, the pages of some of the restored tuples could be written back (due to STEALing, as usual). Re-undoing these is not a problem, because, at worst, we are re-restoring the BEFORE state. So UNDO of physiological logs is idempotent. (Doing it additional times has no effect) © Ellis Cohen 2002 -2005 86

Redo Logging © Ellis Cohen 2002 -2005 87

Forward Recovery with Redo Logs Mechanism On every modification made to any tuple in the database, append an Redo Log entry to a Redo Log. On Transaction Abort: Discard pages dirtied by the transaction from the server page cache; Use the Redo Log to redo other modifications made to those pages, Crash Recovery: Use the Redo Log to redo modifications to pages of committed transactions that were not forced to disk. Characteristics Forcing Not Required (dirty pages need not be written back at commit time) because redo-able No Stealing (dirty pages CANNOT be written back before transaction commits) since no way to undo on abort © Ellis Cohen 2002 -2005 88

Redo Log Modification Entries Executed by transaction T 3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23 Transaction, Operation, ROWID, Before State T 3 Insert 3049625973 After state 30 'Accounting' T 3 Delete 3049218695 T 3 Update 3049218696 After state 23 'Gift' These are physiological log entries © Ellis Cohen 2002 -2005 89

Implementing Abort Invalidate all pages modified by the transaction Starting at the beginning of the integrated log, and traversing forward: Find all log entries for uncommitted transactions that affect the invalidated pages and redo them (as well as implicit changes to auxiliary affected pages) There are ways to speed this up, but still, this can be slow © Ellis Cohen 2002 -2005 90

Transaction Entries for Redo Logs The only transaction log entry needed for a REDO log is Commit T# Commit appended to the log when the a request is made to commit the transaction Log Forcing A COMMIT appends a Commit entry to the log when a commit request is made, and then forces the log out. That's when the COMMIT is actually complete. © Ellis Cohen 2002 -2005 91

Forward Recovery • [Analysis Phase] Traverse the log backwards to find all committed transactions (easier if all Commit entries are linked together) • [Redo Phase] Then traverse the entire log (starting at the beginning and going forwards) – Redo every modification entry of a committed transaction, bringing the necessary block/page into the cache if it is not already there. • This may redo changes which have already been persisted. Not a problem, since redoing a change that was already made cannot hurt. – Redoing an entry makes the modification to the cached page. Since there is no forcing, these will eventually be written to disk just as during regular operation. It is really only necessary to redo modifications made to a page after the page was last persisted. How can this be arranged? © Ellis Cohen 2002 -2005 92

Log Sequence Numbers (LSN's) The entries in the log can be numbered (1, 2, … ). These are called log sequence numbers or LSN's. Every time a page is modified, the LSN of the corresponding log entry is placed in the page, and is written out to disk along with the page. A redo log entry only needs to be redone if its LSN is greater than that of the page it is on. © Ellis Cohen 2002 -2005 93

Unwritten Dirty Pages are never forced out – After a commit, a dirty page can be written out – However, another transaction could start reading it (or might already be reading it), which would prevent it from being written out until that transaction completed. – Using LRU or clock replacement, a dirty page that is continually used might never be written out (We could prevent new transactions from using longtime dirty pages) We have no way of knowing how far back in the log is the first modification made – by a committed transaction – to a page that was not saved, especially if there are no explicit log entries for auxiliary affected pages. That's why we have to start redoing from the very beginning of the log. We'd like to find a way to avoid that © Ellis Cohen 2002 -2005 94

Use Fuzzy Checkpointing At regular intervals, just write a "fuzzy" Checkpoint entry, which includes – a link to the previous checkpoint entry – a list of inactive dirty pages along with the transaction that dirtied each one of them – a list of transactions which have committed since the previous checkpoint Explain crash recovery based on this checkpoint information © Ellis Cohen 2002 -2005 95

Fuzzy Checkpoint Recovery • Traverse backwards through the log to the last checkpoint, keeping track of transactions with Commit entries. • Traverse backward through the checkpoints, adding to the list of committed transactions as you go. • Stop traversing when you get to a checkpoint which has no page/ transaction pairs that match any in the last checkpoint. That's the most recent point at which we know that all active dirty pages were eventually saved. • Start redoing from that point forwards. © Ellis Cohen 2002 -2005 96

Undo/Redo Logging © Ellis Cohen 2002 -2005 97

Undo/Redo Logging Characteristics Forcing Not Required (dirty pages need not be written back at commit time) because redo-able Allows Stealing (dirty pages can be written back before transaction commits) because undo-able Mechanism On every modification made to any tuple in the database, append an Undo/Redo Log entry to an Undo/Redo Log On Transaction Abort: use the Log to undo all modifications made by the aborted transaction, in backwards order Crash Recovery: First Redo all changes to ensure durability, then Undo changes made by uncommitted transactions to ensure atomicity (Aries) © Ellis Cohen 2002 -2005 98

Undo/Redo Log Modification Entries Executed by transaction T 3: INSERT INTO Depts VALUES( 30, 'Accounting') DELETE Depts WHERE deptno = 67 UPDATE Depts SET dname = 'Gift' WHERE deptno = 23 T 3 Insert 3049625973 30 'Accounting' After state T 3 Delete 3049218695 67 'Marketing' Before state T 3 Update 3049218696 23 'Sales' Before state 23 'Gift' After state These are physiological log entries © Ellis Cohen 2002 -2005 99

Logical Log Entries Logical Log Entry: based on OPERATIONS, not tuples Redo Logical Entry: logs the actual SQL statement Undo Logical Entry: logs a compensating SQL statment Undo/Redo Logical Entry: logs both If the SQL statement is INSERT INTO Depts VALUES ( 30, 'Accounting' ) the compensating SQL statement is DELETE FROM Depts WHERE deptno = 30 Logical Log Entries often used for backup, replication, recovery from inconsistency. Can be used cautiously for undo/redo, since SQL statements not generally idempotent. © Ellis Cohen 2002 -2005 100

Ensuring Longer-Term Durability © Ellis Cohen 2002 -2005 101

Storage Stability • Volatile storage Main memory • Semi-stable storage Ordinary disk memory • Stable storage Storage that survives failure – Redundant RAID levels (e. g. Mirroring, Parity) – Relative to degree of failure or catastrophe © Ellis Cohen 2002 -2005 102

Approaches to Ensuring Durability Stable Storage Redundant RAID Levels Non-Local Replication Distributed Replicated Data Archiving Regular (Fuzzy) Backup may be used with local redundant log Remote Logging Send log records to be maintained on a remote machine © Ellis Cohen 2002 -2005 103

Remote Logging Issues Frequency of Sending Changes Continuously At Regular Intervals At Commit Format of Changes Operations (logical redo log entries) Values or Deltas (physiological redo log entries) Commit – Just Communicate Commit (1 -safe) – Jointly Commit (2 -safe) Both are special cases of data replication © Ellis Cohen 2002 -2005 104

Recovery with Remote Backup 1. Use the backup to restore the primary disk (or a hot spare) 2. The backup machine takes over as the primary machine (at least until the primary disk is restored) © Ellis Cohen 2002 -2005 105

Handling Consistency Failure © Ellis Cohen 2002 -2005 106

Enforcing Consistency How do database applications enforce consistency? • Constant Monitoring – Using constraints, assertion, triggers or application code – Prevent/abort operation that lead to inconsistent states, try to correct the problem, or immediately notify the DBA • Interval-Based – At (regular) intervals, check that the system is in a consistent state. If not, correct it, or notify the DBA. • Ignore – Hope that nothing bad happens. If it does, scramble … © Ellis Cohen 2002 -2005 107

Result of Consistency Failures (due to User Error or Sabotage) T 1 It can be quite a while before an erroneous change is discovered T 2 tbl 1 tbl 2 tbl 3 tbl 4 Erroneous change committed Erroneous change discovered Erroneous changes which are discovered later can propagate errors widely © Ellis Cohen 2002 -2005 108

Why is Consistency Failure Recovery Hard? • Need to rollback state from T 2 to T 1 undoing all changes – Use the log to rollback the system to just before the error – Must compensate for external side-effects -- e. g. send report, launch missile • Need to roll forward and redo committed transactions, other than erroneous changes – Can't use physiological log entries, because old/new values may no longer match restored values (from tbl 2 and then propagated elsewhere) – Could use logical log entries, which logs operations done (with parameters and perhaps with system values -- e. g. time) © Ellis Cohen 2002 -2005 109

Operation Levels Using an operation log to roll forwards implies that the DB operations executed would be the same, even if the state were different. UPDATE … COMMIT An application or a user operation contains multiple DB operations (within multiple transactions) and uses the current state to decide which operations to execute. A replayed application might be in a completely different state (since T 1 was not executed) and execute a completely different sequence of DB operations. Rolling forward from T 1 really requires a log of the higher level user operations or applications executed (and even those might differ if the state were different). © Ellis Cohen 2002 -2005 110