Distributed databases
What is a Distributed Database System?
- A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
- A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to users.
- Distributed database system (DDBS) = DDB + D-DBMS

What is not a DDBS?
- A timesharing computer system.
- A loosely or tightly coupled multiprocessor system.
- A database system that resides at one of the nodes of a network of computers; this is a centralized database on a network node.
[Figure: Distributed DBMS]
Distributed Processing
- A centralized database that can be accessed over a computer network.

Parallel DBMS
- A DBMS running across multiple processors and disks, designed to execute operations in parallel whenever possible to improve performance.
- Based on the premise that single-processor systems can no longer meet requirements for cost-effective scalability, reliability, and performance.
- Parallel DBMSs link multiple smaller machines to achieve the same throughput as a single larger machine, with greater scalability and reliability.

Parallel DBMS
- Main architectures for parallel DBMSs are:
  a. Shared memory.
  b. Shared disk.
  c. Shared nothing.
[Figure: Parallel DBMS]
DDBMS Advantages
- Data are located near the "greatest demand" site
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of a single-point failure
- Processor independence

DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing the data environment
- Increased training cost
Types of DDBMS
- Homogeneous DDBMS
- Heterogeneous DDBMS

Homogeneous DDBMS
- All sites use the same DBMS product.
- Much easier to design and manage.
- Approach provides incremental growth and allows increased performance.

Heterogeneous DDBMS
- Sites may run different DBMS products, with possibly different underlying data models.
- Occurs when sites have implemented their own databases and integration is considered later.
- Translations required to allow for:
  - Different hardware.
  - Different DBMS products.
  - Different hardware and different DBMS products.
- Typical solution is to use gateways.
Distributed Database Design
- Three key issues:
  - Fragmentation
  - Allocation
  - Replication

Distributed Database Design
- Fragmentation
  - Relation may be divided into a number of subrelations, which are then distributed.
- Allocation
  - Each fragment is stored at the site with "optimal" distribution.
- Replication
  - Copy of a fragment may be maintained at several sites.

Fragmentation
- Definition and allocation of fragments carried out strategically to achieve:
  - Locality of reference
  - Improved reliability and availability
  - Improved performance
  - Balanced storage capacities and costs
  - Minimal communication costs
- Involves analyzing the most important applications, based on quantitative/qualitative information.
Data Allocation
- Four alternative strategies regarding placement of data:
  - Centralized
  - Partitioned (or fragmented)
  - Complete replication
  - Selective replication

Data Allocation
- Centralized
  - Consists of a single database and DBMS stored at one site, with users distributed across the network.
- Partitioned
  - Database partitioned into disjoint fragments, each fragment assigned to one site.

Data Allocation
- Complete replication
  - Consists of maintaining a complete copy of the database at each site.
- Selective replication
  - Combination of partitioning, replication, and centralization.
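The four placement strategies can be compared with a small sketch. This is an illustration only, not how any particular DDBMS allocates data: the function name `allocate`, the fragment and site names, and the round-robin assignment of partitioned fragments are all assumptions made for the example.

```python
def allocate(fragments, sites, strategy, replicated=()):
    """Return {site: set of fragments stored there} for one strategy.

    Assumptions for this sketch: the first site acts as the central site,
    and non-replicated fragments are assigned round-robin.
    """
    placement = {site: set() for site in sites}
    if strategy == "centralized":
        # Everything lives at a single site; other sites hold no data.
        placement[sites[0]].update(fragments)
    elif strategy == "partitioned":
        # Disjoint fragments, each assigned to exactly one site.
        for i, frag in enumerate(fragments):
            placement[sites[i % len(sites)]].add(frag)
    elif strategy == "complete_replication":
        # Every site keeps a complete copy.
        for site in sites:
            placement[site].update(fragments)
    elif strategy == "selective_replication":
        # Chosen fragments are copied everywhere; the rest are partitioned.
        for i, frag in enumerate(fragments):
            if frag in replicated:
                for site in sites:
                    placement[site].add(frag)
            else:
                placement[sites[i % len(sites)]].add(frag)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return placement


FRAGS = ["F1", "F2", "F3"]   # hypothetical fragment names
SITES = ["s1", "s2"]         # hypothetical site names
for name in ("centralized", "partitioned", "complete_replication"):
    print(name, allocate(FRAGS, SITES, name))
print("selective_replication",
      allocate(FRAGS, SITES, "selective_replication", replicated=("F1",)))
```

In practice the choice of site for each fragment would come from the application analysis described earlier, not from a fixed round-robin rule.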
Why Fragment?
- Usage
  - Applications work with views rather than entire relations.
- Efficiency
  - Data is stored close to where it is most frequently used.
  - Data that is not needed by local applications is not stored.

Why Fragment?
- Parallelism
  - With fragments as the unit of distribution, a transaction can be divided into several subqueries that operate on fragments.
- Security
  - Data not required by local applications is not stored and so is not available to unauthorized users.
- Disadvantages
  - Performance
  - Integrity
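The parallelism point can be sketched as follows: one aggregate query is split into a subquery per fragment, the subqueries run concurrently, and the partial results are merged. The `SALES` data, the site names, and the use of threads to stand in for remote sites are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of a SALES relation, one per site.
FRAGMENTS = {
    "site_a": [{"region": "North", "amount": 120}, {"region": "North", "amount": 80}],
    "site_b": [{"region": "South", "amount": 200}],
    "site_c": [{"region": "East", "amount": 50}, {"region": "East", "amount": 25}],
}


def subquery_total(rows):
    """Subquery executed against one fragment: a local SUM(amount)."""
    return sum(row["amount"] for row in rows)


def total_sales(fragments):
    """Run one subquery per fragment in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(subquery_total, fragments.values()))
    return sum(partials)


print(total_sales(FRAGMENTS))
```

The merge step works here because SUM can be computed from partial sums; other operations may need a different way of combining per-fragment results.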
Data Fragmentation
- Breaks a single object into two or more segments or fragments
- Each fragment can be stored at any site over a computer network
- Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor (TP) to process user requests

Data Fragmentation Strategies
- Horizontal fragmentation:
  - Division of a relation into subsets (fragments) of tuples (rows)
- Vertical fragmentation:
  - Division of a relation into attribute (column) subsets
- Mixed fragmentation:
  - Combination of horizontal and vertical strategies
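As a rough sketch of the catalog lookup described above, the DDC can be pictured as a map from fragment names to the site that stores each fragment and the condition that defines it. The structure of `DDC`, the fragment names, and the function `sites_for_request` are hypothetical and only meant to show the idea.

```python
# Hypothetical distributed data catalog (DDC): fragment name -> metadata.
DDC = {
    "EMP_NORTH": {"relation": "EMPLOYEE", "site": "site_a",
                  "predicate": ("branch", "North")},
    "EMP_SOUTH": {"relation": "EMPLOYEE", "site": "site_b",
                  "predicate": ("branch", "South")},
}


def sites_for_request(catalog, relation, attribute=None, value=None):
    """Return {site: [fragment names]} that a request on `relation` must visit.

    If the request filters on `attribute == value`, fragments whose defining
    predicate on that attribute names a different value are skipped.
    """
    plan = {}
    for name, meta in catalog.items():
        if meta["relation"] != relation:
            continue
        frag_attr, frag_value = meta["predicate"]
        if attribute == frag_attr and value != frag_value:
            continue  # this fragment cannot contain matching rows
        plan.setdefault(meta["site"], []).append(name)
    return plan


print(sites_for_request(DDC, "EMPLOYEE"))
print(sites_for_request(DDC, "EMPLOYEE", "branch", "North"))
```

With this information the user's request does not need to name sites or fragments, which is what makes the distribution transparent.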
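The three strategies can be illustrated on a tiny in-memory relation. This is a minimal sketch: the `EMPLOYEE` relation, its attributes, and the helper names are invented for the example, and keeping the key `emp_id` in every vertical fragment is an assumption made so that the original rows can be rebuilt by a join.

```python
# Hypothetical EMPLOYEE relation; each tuple (row) is a dict keyed by attribute.
EMPLOYEE = [
    {"emp_id": 1, "name": "Ana", "branch": "North", "salary": 5200},
    {"emp_id": 2, "name": "Ben", "branch": "South", "salary": 4800},
    {"emp_id": 3, "name": "Cho", "branch": "North", "salary": 6100},
]


def horizontal_fragment(relation, predicate):
    """Keep the subset of tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, attributes, key="emp_id"):
    """Keep a subset of attributes (columns), always including the key."""
    cols = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in cols} for row in relation]


def join_on_key(left, right, key="emp_id"):
    """Rebuild rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: one fragment per branch.
north = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] == "North")
south = horizontal_fragment(EMPLOYEE, lambda r: r["branch"] == "South")

# Vertical: general attributes separated from payroll data.
public = vertical_fragment(EMPLOYEE, ["name", "branch"])
payroll = vertical_fragment(EMPLOYEE, ["salary"])

# Mixed: a horizontal fragment that is then fragmented vertically.
north_payroll = vertical_fragment(north, ["salary"])

# Union rebuilds the relation from horizontal fragments; join from vertical ones.
rebuilt = sorted(north + south, key=lambda r: r["emp_id"])
print(rebuilt == EMPLOYEE, join_on_key(public, payroll) == EMPLOYEE)
```

Horizontal fragments are recombined with a union and vertical fragments with a join on the shared key, which is why the key is repeated in each vertical fragment here.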
[Figure: Horizontal and Vertical Fragmentation]
[Figure: Mixed Fragmentation]
Data Replication

Replication Scenarios
- Fully replicated database:
  - Stores multiple copies of each database fragment at multiple sites
  - Can be impractical due to the amount of overhead
- Partially replicated database:
  - Stores multiple copies of some database fragments at multiple sites
  - Most DDBMSs are able to handle the partially replicated database well
- Unreplicated database:
  - Stores each database fragment at a single site
  - No duplicate database fragments
- Database size, usage frequency, and costs (performance, overhead, management) influence the decision to replicate
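The three scenarios differ only in how many sites hold a copy of each fragment, which a short sketch can make concrete. The function name `replication_scenario` and the fragment and site names are assumptions for illustration.

```python
def replication_scenario(placement):
    """Classify a placement given as {fragment: set of sites holding a copy}."""
    copies = [len(sites) for sites in placement.values()]
    if any(n < 1 for n in copies):
        raise ValueError("every fragment must be stored at one or more sites")
    if all(n == 1 for n in copies):
        return "unreplicated"            # each fragment at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"        # multiple copies of each fragment
    return "partially replicated"        # multiple copies of some fragments


# Hypothetical placements of three fragments over sites s1..s3.
print(replication_scenario({"F1": {"s1"}, "F2": {"s2"}, "F3": {"s3"}}))
print(replication_scenario({"F1": {"s1", "s2"}, "F2": {"s2"}, "F3": {"s3"}}))
print(replication_scenario({"F1": {"s1", "s2"}, "F2": {"s2", "s3"}, "F3": {"s1", "s3"}}))
```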
Queries?