10 8 Distributed Database Transparency Features Allow end

10. 8 Distributed Database Transparency Features • Allow end user to feel like database’s only user • Features include: – Distribution transparency • Partition, Replication, Location – Transaction transparency • Maintain ACID properties – Failure transparency – Performance transparency • No performance degradation – Heterogeneity transparency Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10

10 10. 9 Distribution Transparency • Allows management of a physically dispersed database as though it were a centralized database • Three levels of distribution transparency are recognized: – Fragmentation transparency • Supported by Distributed Data Catalog (DDC) • Distributed global scheme is used by local TPs to translate user requests into subqueries (remote) that will be processed by different DPs – Location transparency – Local mapping transparency Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 A Summary of Transparency Features IF THE SQL Statement Requires: Why Not Considered: No Yes? Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 3

10 Fragment Locations Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 4

To list all the employees with date-of-birth prior 10 to January 1, 1940 1. The DB supports Fragment Transparency SELECT * FROM EMPLOYEE WHERE EMP_DOB < ’ 01 -JAN-1940’; 2. The DB supports Location Transparency SELECT * FROM E 1 WHERE EMP_DOB < ’ 01 -JAN-1940’ UNION SELECT * FROM E 2 WHERE EMP_DOB < ’ 01 -JAN-1940’; UNION SELECT * FROM E 3 WHERE EMP_DOB < ’ 01 -JAN-1940’; 3. The DB supports Local Mapping Transparency SELECT * FROM E 1 NODE NY WHERE EMP_DOB < ’ 01 -JAN-1940’ UNION SELECT * FROM E 2 NODE ATL WHERE EMP_DOB < ’ 01 -JAN-1940’; UNION SELECT * FROM E 3 NODE MIA WHERE EMP_DOB < ’ 01 -JAN-1940’; Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 10. 10 Transaction Transparency • Ensures database transactions will maintain distributed database’s integrity and consistency • Transaction transparency ensures that the transaction will be completed only if all database sites involved in the transaction complete their part of the transaction Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Distributed Requests and Distributed Transactions 10 • Distributed transaction – Can update or request data from several different remote sites on a network • Remote request – Lets a single SQL statement access data to be processed by a single remote database processor • Remote transaction – Composed of several requests, Accesses data at a single remote site Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Distributed Requests and Distributed Transactions (continued) • Distributed transaction – Allows a transaction to reference several different (local or remote) DP sites • Distributed request – Lets a single SQL statement reference data located at several different local or remote DP sites Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

A Remote Request Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10 9

A Remote Transaction Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10 10

A Distributed Transaction Each request can access only one remote site a time Database Systems: Design, Implementation, & Management, 6 Edition, Rob at & Coronel th 10 11

A Distributed Request Each request can access data from several&local or remote sites at a time Database Systems: Design, Implementation, Management, 6 Edition, Rob & Coronel th 10 12

Another Distributed Request Full fragmentation transparency support is provided by a DBMS that supports distributed requests Database Systems: Design, Implementation, & Management, 6 Edition, Rob & Coronel th 10 13

10 Distributed Concurrency Control • Multi-site, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than are single-site systems Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

The Effect of a Premature COMMIT Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10 15

10 Two-Phase Commit Protocol • Distributed databases make it possible for a transaction to access data at several sites • Final COMMIT must not be issued until all sites have committed their parts of the transaction • Two-phase commit protocol requires each individual DP’s transaction log entry be written before the database fragment is actually updated Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Two-Phase Commit Protocol l Phase 1: Preparation 1. The coordinator sends a PREPARE TO COMMIT message to all subordinates 2. The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgement (YES or NO) message to the coordinator 3. The coordinator makes sure all nodes are ready to commit (acknowledged with YES), otherwise it aborts the transaction l If all nodes are YES (Prepared to COMMIT), the transaction goes to Phase 2. Otherwise, the coordinator broadcasts an ABORT message to all subordinates Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Two-Phase Commit Protocol 10 l Phase 2: The Final COMMIT 1. The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies 2. Each subordinate receives the COMMIT message, then updates the database using the DO protocol 3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator l If any subordinate did not commit, the coordinator sends an ABORT message, thereby forcing them to UNDO all changes Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 10. 11 Performance Transparency and Query Optimization • Objective of query optimization routine is to minimize total cost associated with the execution of a request • Costs associated with a request are a function of the: – Access time (I/O) cost – Communication cost – CPU time cost Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Performance Transparency and Query Optimization 10 • Must provide distribution transparency as well as replica transparency • Replica transparency: – DDBMS’s ability to hide the existence of multiple copies of data from the user • Query optimization algorithms are based on two principles: – The selection of the optimum execution order – The selection of sites to be accessed to minimize the communication costs Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Performance Transparency and Query Optimization • Query optimization techniques: – Manual or automatic – Static or dynamic • Static query optimization takes place at compilation time • Dynamic query optimization takes place at execution time – Statistically (dynamic or manual) based or rulebased algorithms Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 10. 12 Distributed Database Design • Data fragmentation: – How to partition the database into fragments • Data replication: – Which fragments to replicate • Data allocation: – Where to locate those fragments and replicas Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Data Fragmentation • Breaks single object into two or more segments or fragments • Each fragment can be stored at any site over a computer network • Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Data Fragmentation Strategies • Horizontal fragmentation: – Division of a relation into subsets (fragments) of tuples (rows) • Vertical fragmentation: – Division of a relation into attribute (column) subsets • Mixed fragmentation: – Combination of horizontal and vertical strategies Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 A Sample CUSTOMER Table Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 25

10 Horizontal Fragmentation of the CUSTOMER Table by State Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 26

Table Fragments in Three Locations Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10 27

10 Vertically Fragmented Table Contents Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 28

Mixed Fragmentation of the CUSTOMER Table Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 10 29

10 Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 30

10 Data Replication • Storage of data copies at multiple sites served by a computer network • Fragment copies can be stored at several sites to serve specific information requirements – Can enhance data availability and response time – Can help to reduce communication and total query costs Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Data Replication 10 mutual consistency rule: all copies of data fragments be identical Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel 32

10 The Replica Overhead Imposed on DDBMS • The processes that the DDBMS must perform to use the database replication – If the DB is fragmented, the DDBMS must decompose a query into sub-queries to access the appropriate fragment – If the DB is replicated, the DDBMS must decide which copy to access – The TP sends a data request to each selected DP for execution – The DP receives and executes each request and sends the data back to TP – The TP assembles the DP responses Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Replication Scenarios 10 • Fully replicated database: – Stores multiple copies of each database fragment at multiple sites – Can be impractical due to amount of overhead • Partially replicated database: – Stores multiple copies of some database fragments at multiple sites – Most DDBMSs are able to handle the partially replicated database well • Unreplicated database: – Stores each database fragment at a single site – No duplicate database fragments • Factors influence the decision to use data replication – DB size, Usage frequency, Costs Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

Data Allocation 10 • Deciding where to locate data • Allocation strategies: – Centralized data allocation • Entire database is stored at one site – Partitioned data allocation • Database is divided into several disjointed parts (fragments) and stored at several sites – Replicated data allocation • Copies of one or more database fragments are stored at several sites • Data distribution over a computer network is achieved through data partition, data replication, or a combination of both Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 10. 13 Client/Server vs. DDBMS • Client/Server architecture refers to the way in which computers interact to form a system • Features a user of resources, or a client, and a provider of resources, or a server • Can be used to implement a DBMS in which the client is the TP and the server is the DP Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Client/Server Advantages • Less expensive than alternate minicomputer or mainframe solutions • Allow end user to use microcomputer’s GUI, thereby improving functionality and simplicity • More people with PC skills than with mainframe skills in the job market • PC is well established in the workplace • Numerous data analysis and query tools exist to facilitate interaction with DBMSs available in the PC market • Considerable cost advantage to offloading applications development from the mainframe to powerful PCs Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Client/Server Disadvantages • Creates a more complex environment, in which different platforms (LANs, operating systems, and so on) are often difficult to manage • An increase in the number of users and processing sites often paves the way for security problems • Possible to spread data access to a much wider circle of users increases demand for people with broad knowledge of computers and software increases burden of training and cost of maintaining the environment Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 10. 14 C. J. Date’s Twelve Commandments for Distributed Databases 1. Local site independence 2. Central site independence 3. Failure independence 4. Location transparency 5. Fragmentation transparency 6. Replication transparency 7. Distributed query processing 8. Distributed transaction processing 9. Hardware independence 10. Operating system independence 11. Network independence 12. Database independence Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Summary • Distributed database stores logically related data in two or more physically independent sites connected via a computer network • Database is divided into fragments • Distributed databases require distributed processing • Main components of a DDBMS are the transaction processor and the data processor Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel

10 Summary (continued) • Current database systems can be classified by extent to which they support processing and data distribution • DDBMS characteristics are best described as a set of transparencies • A transaction is formed by one or more database requests • A database can be replicated over several different sites on a computer network • Client/server architecture refers to the way in which two computers interact over a computer network to form a system Database Systems: Design, Implementation, & Management, 6 th Edition, Rob & Coronel
- Slides: 41