Concurrency Control in Distributed Databases By Rishikesh Mandvikar
Concurrency Control in Distributed Databases. By : Rishikesh Mandvikar rmandvik[at]engr. smu. edu May 1, 2004
Topics n Serializability Theory ¨ Centralized Databases ¨ Distributed Databases n Lock Based Concurrency Control Algorithms ¨ Centralized (2 PL, S 2 PL) ¨ Distributed (C 2 PL, PC 2 PL, D 2 PL) n Optimistic Concurrency Control 2
Serializability Theory [13] 3
Serializability Theory [13] 4
Serializability Theory extended to Distributed Database [14] n Fragmentation ¨ Horizontal ¨ Vertical ¨ Hybrid n Replication ¨ Synchronous Replication n ROWA Protocol n Voting ¨ Asynchronous Replication 5
Classification of CC Algorithms [14] 6
Locking based CC Algorithms n Centralized ¨ 2 PL (Relaxed S 2 PL) ¨ S 2 PL n Distributed ¨ C 2 PL ¨ PC 2 PL ¨ D 2 PL 7
2 Phase locking (2 PL) [13] Rules: Growing phase: ¨ “A txn that has to read/write a data object first has to request a read/write lock on it. ” Shrinking phase: ¨ “A txn cant request additional locks once it releases a lock. ” 8
Lock Graph for 2 PL 9
Strict 2 Phase Locking (S 2 PL) [13] Rules: Growing phase: ¨ “A txn that has to read/write a data object first has to request a read/write lock on it. ” Non - Shrinking phase: ¨ “Txn releases all locks only when it completes. ” 10
Lock Graph for S 2 PL 11
2 PL, S 2 PL [13] 12
2 PL, S 2 PL n Differences ¨ 2 PL n Cascading aborts n Conflict serializable schedules (not all) n High concurrency ¨ S 2 PL n No cascading aborts n Serializable schedules n Low concurrency 13
Centralized 2 PL 14
Centralized 2 PL [14] n Cons ¨Failure of primary site ¨Bottleneck situation ¨Communication links 15
Primary Copy 2 PL [14] n Lock on primary copy necessary n Lock management at the primary-copy sites only n Pros ¨ Reduces load at central site n Cons ¨ Deadlock handling is partially centralized 16
Distributed 2 PL [14] 17
Distributed 2 PL [14] n Pros ¨ Lock management independency n Cons ¨ Complex deadlock handling required ¨ Communication cost 18
Optimistic Concurrency Control [13][14] ¨Txns assumed to have no conflicts ¨Private workspace area ¨Validation of txns before write phase 19
Optimistic Concurrency Control [13][14] n Txn phases: ¨ Read n and Compute read from database and write into private workspace ¨ Validate Timestamps assigned over here n Check for conflict with concurrent txns n ¨ Write n Copy into database if validation successful 20
Optimistic Concurrency Control [13][14] For Ti and Tj where TS(Ti) < TS(Tj) n Validation Criteria ¨ All phases of Ti execute before Tj ¨ Ti ends before write phase of Tj and Ti doesn’t modify data read by Tj ¨ Ti finishes its read phase before Tj finishes its read phase and they both don’t read/write any common data 21
Optimistic Concurrency Control [13][14] n Validation ¨ For validating Tj w. r. t committed txn Ti where TS(Ti) < TS(Tj) Maintain a list of read/write object list for Tj n Other cant commit while Tj is validated n Once Validated, write phase allowed to finish n Bottleneck situation n 22
Optimistic Concurrency Control [13][14] n Advantages ¨ Increased concurrency with a good “mix” of txns. ¨ Better than Lock based systems n Disadvantages ¨ Bottleneck situation ¨ Maintaining read/write list for every txn ¨ Copying the private space to the database ¨ Long txns 23
Optimistic Concurrency Control [13][14] n Disadvantages ¨ Long txns Read/write list would be very long n Chance of Restart is proportional to the square of its size n [9] 24
Research n Optimistic CC algorithm ¨ IBM’s IMS FASTPATH (Centralized DBMS) ¨ OCC in Distributed DBMS 25
Conclusion Serializability Theory n Lock Based Systems n Optimistic CC algorithms n n Timestamp Ordering 26
Questions? ? 27
- Slides: 27