Outline
- Introduction
  - What is a distributed DBMS
  - Problems
  - Current state-of-affairs
- Background
- Distributed DBMS Architecture
- Distributed Database Design
- Semantic Data Control
- Distributed Query Processing
- Distributed Transaction Management
- Parallel Database Systems
- Distributed Object DBMS
- Database Interoperability
- Current Issues

Distributed DBMS © 2001 M. Tamer Özsu & Patrick Valduriez
File Systems
[Figure: program 1 + data description 1 → File 1; program 2 + data description 2 → File 2; program 3 + data description 3 → File 3]
Database Management
[Figure: application programs 1, 2 and 3 (with data semantics) access a single database through a DBMS that provides description, manipulation and control]
DB Clients, Servers, and Environments
- DB-Server: a collection of programs that execute all DBMS functions
- DB-Client: any application program that needs to connect to a DB-Server
- DB Environment (DBE): one or more DBs along with any software providing at least the minimum set of required data operations and management
DBE Architectural Concepts
- Service: a logical collection of related functionality (example: Query Service)
- Site: a logical location in an architectural diagram or deployment diagram
- Component and Subsystem (COS)
  - Component: a deployable bundle of software that provides a reasonably cohesive set of functionality
  - Subsystem: a collection of one or more components that work together toward a common goal
DBE Architectures: Required Services
- Data Read Service (Drd-S)
- Security Service (Sec-S)
- Semantic Integrity Service (Semi-S)
DBE Architectures: Basic Services
- Data Read Service (Drd-S)
- Security Service (Sec-S)
- Semantic Integrity Service (Semi-S)
- Data Write Service (Dwr-S)
DBE Architectures: Expected Services
- Data Read Service (Drd-S)
- Security Service (Sec-S)
- Semantic Integrity Service (Semi-S)
- Data Write Service (Dwr-S)
- Query Request Service (Qreq-S)
- Query Optimization Service
- Execution Optimization Service
DBE Architectures: Expected Subsystems
- Data Read Service (Drd-S)
- Security Service (Sec-S)
- Semantic Integrity Service (Semi-S)
- Data Write Service (Dwr-S)
- Query Request Service (Qreq-S)
- Query Optimization Service
- Execution Optimization Service
- User Interface
DBE Architectures: Typical DBMS Services
- Drd-S, Sec-S, Semi-S, Dwr-S, Qreq-S
- Query Optimization Service
- Execution Optimization Service
- User Interface
- Transaction Management (Trans-S)
- Locking Service (Lock-S)
- Timestamping Service (Times-S)
- Deadlock Handling Service
- Fallback and Recovery Service
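The slide only names the Locking Service (Lock-S). As an illustration of what such a service might track, here is a minimal single-site lock table, assuming the conventional shared (S) and exclusive (X) lock modes. The class and method names are hypothetical, and there is no wait queue: a conflicting request simply returns False and the caller decides whether to wait or abort.

```python
from collections import defaultdict

# Lock modes (assumed): "S" = shared (read), "X" = exclusive (write).
# Only S is compatible with S.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}


class LockTable:
    """Hypothetical minimal Lock-S: one lock table, no queuing."""

    def __init__(self):
        # data item -> {transaction id: lock mode held}
        self.holders = defaultdict(dict)

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False on a conflict."""
        held = self.holders[item]
        for other, other_mode in held.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False  # conflict: caller must wait or abort
        # A transaction already holding X keeps X; otherwise record the mode
        # (this also upgrades S to X when no other holder conflicts).
        if held.get(txn) != "X":
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Release every lock held by txn (e.g. when it commits or aborts)."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Two readers can share an item, a writer is refused until both release, and then it is granted.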
Motivation
[Figure: Database Technology (integration) + Computer Networks (distribution) → Distributed Database Systems]
integration ≠ centralization
DBMS Schema Architecture
[Figure]
DDBMS Schema Architecture
[Figure]
Top Down DDBMS Software Architecture
[Figure]
Bottom Up DDBMS Software Architecture
[Figure]
Generic DDBMS Architecture
[Figure]
Distributed Computing
- A concept in search of a definition and a name.
- A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
Distributed Computing
Synonymous terms:
- distributed function
- distributed data processing
- multiprocessors/multicomputers
- satellite processing
- backend processing
- dedicated/special purpose computers
- timeshared systems
- functionally modular systems
What is distributed …
- Processing logic
- Functions
- Data
- Control
What is a Distributed Database System?
- A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
- A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
- Distributed database system (DDBS) = DDB + D-DBMS
What is not a DDBS?
- A timesharing computer system
- A loosely or tightly coupled multiprocessor system
- A database system that resides at one of the nodes of a network of computers: this is a centralized database on a network node
Centralized DBMS on a Network
[Figure: Sites 1 to 5 connected by a communication network]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network]
Implicit Assumptions
- Data stored at a number of sites
  - each site logically consists of a single processor
- Processors at different sites are interconnected by a computer network
  - no multiprocessors
    - parallel database systems
- Distributed database is a database, not a collection of files
  - data logically related as exhibited in the users' access patterns
    - relational data model
- D-DBMS is a full-fledged DBMS
  - not a remote file system, not a TP system
Shared-Memory Architecture
[Figure: processors P1 to Pn share a common memory M and disk D]
Examples: symmetric multiprocessors (Sequent, Encore) and some mainframes (IBM 3090, Bull's DPS 8)
Shared-Disk Architecture
[Figure: processors P1 to Pn, each with private memory M1 to Mn, share a common disk D]
Examples: DEC's VAXcluster, IBM's IMS/VS Data Sharing
Shared-Nothing Architecture
[Figure: processors P1 to Pn, each with private memory M1 to Mn and private disk D1 to Dn]
Examples: Teradata's DBC, Tandem, Intel's Paragon, NCR's 3600 and 3700
Applications
- Manufacturing, especially multi-plant manufacturing
- Military command and control
- EFT (electronic funds transfer)
- Corporate MIS
- Airlines
- Hotel chains
- Any organization that has a decentralized organization structure
Distributed DBMS Promises
- Transparent management of distributed, fragmented, and replicated data
- Improved reliability/availability through distributed transactions
- Improved performance
- Easier and more economical system expansion
Transparency
- Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
- The fundamental issue is to provide data independence in the distributed environment
  - Network (distribution) transparency
  - Replication transparency
  - Fragmentation transparency
    - horizontal fragmentation: selection
    - vertical fragmentation: projection
    - hybrid
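To make "horizontal fragmentation: selection" and "vertical fragmentation: projection" concrete, the sketch below fragments the PROJ relation from the example database both ways and checks that each split can be undone (union for horizontal fragments, join on the key for vertical fragments). The helper names and the BUDGET > 200000 split predicate are illustrative choices, not part of the slides; hybrid fragmentation (a selection followed by a projection, or vice versa) is not shown.

```python
# PROJ rows from the example database: (PNO, PNAME, BUDGET)
PROJ = [
    ("P1", "Instrumentation", 150000),
    ("P2", "Database Develop.", 135000),
    ("P3", "CAD/CAM", 250000),
    ("P4", "Maintenance", 310000),
]


def horizontal(rows, predicate):
    """Horizontal fragment = selection: keep whole rows satisfying predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, columns):
    """Vertical fragment = projection onto the given column positions.
    Column 0 (the key PNO) is always kept so fragments can be re-joined."""
    cols = [0] + [c for c in columns if c != 0]
    return [tuple(r[c] for c in cols) for r in rows]


# Horizontal split on BUDGET; reconstruction is the union of the fragments.
proj1 = horizontal(PROJ, lambda r: r[2] > 200000)
proj2 = horizontal(PROJ, lambda r: r[2] <= 200000)
assert sorted(proj1 + proj2) == sorted(PROJ)

# Vertical split into (PNO, PNAME) and (PNO, BUDGET); reconstruction is a join on PNO.
names = vertical(PROJ, [1])
budgets = dict(vertical(PROJ, [2]))
rejoined = [(pno, pname, budgets[pno]) for pno, pname in names]
assert rejoined == PROJ
```

Fragmentation transparency means a user query is written against PROJ, never against proj1, proj2, names, or budgets.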
Example
EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

PAY
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000
Transparent Access
SELECT ENAME, SAL
FROM   EMP, ASG, PAY
WHERE  DUR > 12
AND    EMP.ENO = ASG.ENO
AND    PAY.TITLE = EMP.TITLE
[Figure: Tokyo, Paris, Boston, Montreal and New York connected by a communication network, storing:]
- Paris: Paris projects, Paris employees, Paris assignments, Boston employees
- Boston: Boston projects, Boston employees, Boston assignments
- New York: Boston projects, New York employees, New York projects, New York assignments
- Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
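The point of transparent access is that the user writes the query against the global relations only. The sketch below, a stand-in under a stated simplification, loads the example EMP, ASG and PAY data into one in-memory SQLite database (Python's standard `sqlite3` module) and runs the query unchanged; in a D-DBMS the same text would instead be mapped onto fragments stored in Paris, Boston, New York and Montreal.

```python
import sqlite3

# Single-site stand-in for the global schema (a simplification: no real distribution).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
""")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
    ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
    ("E8", "P3", "Manager", 40),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000), ("Programmer", 24000),
])

# The query names only global relations, never sites or fragments.
rows = conn.execute("""
    SELECT ENAME, SAL
    FROM   EMP, ASG, PAY
    WHERE  DUR > 12
    AND    EMP.ENO = ASG.ENO
    AND    PAY.TITLE = EMP.TITLE
""").fetchall()
```

Since there is no DISTINCT, R. Davis appears twice (two assignments longer than 12); E1's only assignment has DUR = 12 and is excluded.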
Distributed Database - User View
[Figure: users see a single distributed database]
Distributed DBMS - Reality
[Figure: several sites, each running DBMS software that serves user queries and user applications, connected by a communication subsystem]
Potentially Improved Performance
- Proximity of data to its points of use
  - Requires some support for fragmentation and replication
- Parallelism in execution
  - Inter-query parallelism
  - Intra-query parallelism
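Intra-query parallelism splits one query across fragments; inter-query parallelism would instead run different queries at the same time. A minimal sketch of the former, assuming ASG is horizontally fragmented over three hypothetical sites (the site names and the assignment of rows to sites are illustrative), with Python threads standing in for sites: each site evaluates the selection DUR > 12 on its own fragment and the partial results are unioned.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of ASG (ENO, PNO, RESP, DUR), one per site.
FRAGMENTS = {
    "site_a": [("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24)],
    "site_b": [("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18)],
    "site_c": [("E7", "P5", "Engineer", 23), ("E8", "P3", "Manager", 40)],
}


def scan(fragment, min_dur):
    """Local selection DUR > min_dur, run independently on one fragment."""
    return [row for row in fragment if row[3] > min_dur]


def parallel_select(fragments, min_dur):
    """One query, its selection executed on all fragments concurrently."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        futures = [pool.submit(scan, frag, min_dur) for frag in fragments.values()]
        result = []
        for f in futures:
            result.extend(f.result())  # union of the partial results
    return result


rows = parallel_select(FRAGMENTS, 12)
```

Collecting futures in submission order keeps the output deterministic; a real system would overlap the union with the scans.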
Parallelism Requirements
- Have as much of the data required by each application at the site where the application executes
  - Full replication
- How about updates?
  - Updates to replicated data require implementation of distributed concurrency control and commit protocols
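The slide does not name a specific commit protocol; the best-known one is two-phase commit (2PC). Below is a minimal in-memory sketch of 2PC applied to one replicated item, with hypothetical class and function names. It ignores what makes the real protocol hard (forced log writes, timeouts, and recovery after coordinator or site failure); it only shows the rule that an update is installed on every replica or on none.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    """One site holding a copy of a replicated data item (hypothetical)."""
    site: str
    will_vote_yes: bool = True        # simulates the local outcome of phase 1
    value: int = 0                    # committed value of the item
    pending: Optional[int] = None     # update staged during phase 1
    log: List[str] = field(default_factory=list)

    def prepare(self, new_value: int) -> bool:
        """Phase 1: stage the update and vote."""
        if not self.will_vote_yes:
            self.log.append("vote-abort")
            return False
        self.pending = new_value
        self.log.append("ready")
        return True

    def commit(self) -> None:
        """Phase 2 (global commit): install the staged value."""
        self.value, self.pending = self.pending, None
        self.log.append("commit")

    def abort(self) -> None:
        """Phase 2 (global abort): discard any staged value."""
        self.pending = None
        self.log.append("abort")


def two_phase_commit(replicas: List[Replica], new_value: int) -> str:
    """Coordinator: update every replica only if all of them vote yes."""
    votes = [r.prepare(new_value) for r in replicas]  # phase 1: collect votes
    if all(votes):
        for r in replicas:                             # phase 2: global commit
            r.commit()
        return "commit"
    for r in replicas:                                 # phase 2: global abort
        r.abort()
    return "abort"
```

For example, with replicas at Paris, Boston and Montreal, `two_phase_commit(sites, 42)` updates all three; if Boston then votes no, a second update is aborted everywhere and every copy keeps 42.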
System Expansion
- Issue is database scaling
- Emergence of microprocessor and workstation technologies
  - Demise of Grosch's law
  - Client-server model of computing
- Data communication cost vs telecommunication cost
Distributed DBMS Issues
- Distributed Database Design
  - how to distribute the database
  - replicated & non-replicated database distribution
  - a related problem is directory management
- Query Processing
  - convert user transactions to data manipulation instructions
  - optimization problem
    - min{cost = data transmission + local processing}
  - general formulation is NP-hard
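The objective min{cost = data transmission + local processing} can be illustrated on a single join. The sketch assumes EMP is stored at site 1 and ASG at site 2, and it uses made-up relation sizes and abstract unit costs (1 per byte shipped, 2 per tuple processed); none of these numbers come from the slides. It enumerates three candidate plans and picks the cheapest. Real optimizers face a much larger plan space, which is why the general formulation being NP-hard matters.

```python
# Hypothetical abstract cost units (not from the slides).
COST_PER_BYTE = 1    # data transmission
COST_PER_TUPLE = 2   # local processing


def cost(bytes_shipped: int, tuples_processed: int) -> int:
    """cost = data transmission + local processing"""
    return COST_PER_BYTE * bytes_shipped + COST_PER_TUPLE * tuples_processed


# Hypothetical sizes: EMP at site 1, ASG at site 2, query issued at site 3.
EMP_TUPLES, EMP_WIDTH = 400, 100   # tuples, bytes per tuple
ASG_TUPLES, ASG_WIDTH = 1000, 50

join_work = EMP_TUPLES + ASG_TUPLES  # assume one pass over each input
plans = {
    "ship EMP to site 2": cost(EMP_TUPLES * EMP_WIDTH, join_work),
    "ship ASG to site 1": cost(ASG_TUPLES * ASG_WIDTH, join_work),
    "ship both to site 3": cost(EMP_TUPLES * EMP_WIDTH + ASG_TUPLES * ASG_WIDTH,
                                join_work),
}
best = min(plans, key=plans.get)
```

Because local processing is identical across these plans, transmission decides: shipping the smaller relation (EMP, 40000 bytes) wins. Shipping the join result back to the query site is deliberately left out of the model.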
Distributed DBMS Issues
- Concurrency Control
  - synchronization of concurrent accesses
  - consistency and isolation of transactions' effects
  - deadlock management
- Reliability
  - how to make the system resilient to failures
  - atomicity and durability
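One common approach to deadlock management, assumed here since the slide does not choose one, is detection on a wait-for graph (an edge Ti → Tj means Ti waits for a lock held by Tj). In a distributed setting each site sees only its local graph, so a global deadlock can be invisible locally. The sketch below, with hypothetical function names, merges local graphs at one detector and finds a cycle with a depth-first search.

```python
def merge(*local_graphs):
    """Union of local wait-for graphs: {txn: set of txns it waits for}."""
    merged = {}
    for graph in local_graphs:
        for txn, waits in graph.items():
            merged.setdefault(txn, set()).update(waits)
    return merged


def find_deadlock(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on DFS stack / finished
    color = {t: WHITE for t in wait_for}
    stack = []

    def dfs(t):
        color[t] = GREY
        stack.append(t)
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GREY:               # back edge: u is on the stack, so a cycle
                return stack[stack.index(u):]
            if c == WHITE:
                cycle = dfs(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color[t] == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None


# Each site's local graph is acyclic, but together they form T1 -> T2 -> T1.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}
```

Here `find_deadlock(site1)` and `find_deadlock(site2)` both return None, while `find_deadlock(merge(site1, site2))` reports the cycle through T1 and T2, one of which would be chosen as the victim and aborted.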
Relationship Between Issues
[Figure relating: Directory Management, Query Processing, Distribution Design, Reliability, Concurrency Control, Deadlock Management]
Related Issues
- Operating System Support
  - operating system with proper support for database operations
  - dichotomy between general purpose processing requirements and database processing requirements
- Open Systems and Interoperability
  - Distributed Multidatabase Systems
  - More probable scenario
  - Parallel issues