
Outline
- Introduction
- Background
- Distributed DBMS Architecture
- Distributed Database Design
    - Fragmentation
    - Data Location
- Distributed Query Processing (Briefly)
- Distributed Transaction Management (Extensive)
- Building Distributed Database Systems (RAID)
- Mobile Database Systems
- Privacy, Trust, and Authentication
- Peer to Peer Systems

Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 5.1

Useful References
- W. W. Chu, "Optimal File Allocation in a Multiple Computer System," IEEE Transactions on Computers, pp. 885-889, October 1969.

Allocation Alternatives
- Non-replicated
    - partitioned: each fragment resides at only one site
- Replicated
    - fully replicated: each fragment at each site
    - partially replicated: each fragment at some of the sites
- Rule of thumb: if (read-only queries) / (update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems
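The three alternatives above can be told apart by counting how many sites hold each fragment. The sketch below is illustrative only: the function name `classify_allocation` and the 0/1 matrix layout (rows are fragments, columns are sites) are assumptions, not part of the slides.

```python
def classify_allocation(x):
    """Classify a 0/1 allocation matrix (hypothetical layout: x[i][j] == 1
    means fragment i is stored at site j)."""
    n_sites = len(x[0])
    copies = [sum(row) for row in x]  # number of sites holding each fragment
    if any(c == 0 for c in copies):
        raise ValueError("every fragment must be stored at one site or more")
    if all(c == 1 for c in copies):
        return "partitioned"          # each fragment at only one site
    if all(c == n_sites for c in copies):
        return "fully replicated"     # each fragment at each site
    return "partially replicated"     # some fragment at several sites


if __name__ == "__main__":
    print(classify_allocation([[1, 0, 0], [0, 1, 0]]))
    print(classify_allocation([[1, 1, 0], [0, 1, 0]]))
    print(classify_allocation([[1, 1, 1], [1, 1, 1]]))
```

The check for "partitioned" comes first, so a single-site system is reported as partitioned rather than fully replicated.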

Comparison of Replication Alternatives

                      Full replication      Partial replication   Partitioning
QUERY PROCESSING      Easy                  Same difficulty       Same difficulty
DIRECTORY MANAGEMENT  Easy or nonexistent   Same difficulty       Same difficulty
CONCURRENCY CONTROL   Moderate              Difficult             Easy
RELIABILITY           Very high             High                  Low
REALITY               Possible application  Realistic             Possible application

Information Requirements
Four categories:
- Database information
- Application information
- Communication network information
- Computer system information

Fragment Allocation
Problem Statement
Given
    F = {F1, F2, …, Fn}    fragments
    S = {S1, S2, …, Sm}    network sites
    Q = {q1, q2, …, qq}    applications
find the "optimal" distribution of F to S.

Optimality
- Minimal cost
    - communication + storage + processing (read & update)
    - cost in terms of time (usually)
- Performance
    - response time and/or throughput
- Constraints
    - per-site constraints (storage & processing)

Information Requirements
- Database information
    - selectivity of fragments
    - size of a fragment
- Application information
    - access types and numbers
    - access localities
- Computer system information
    - unit cost of storing data at a site
    - unit cost of processing at a site
- Communication network information
    - bandwidth
    - latency
    - communication overhead

Allocation
File Allocation (FAP) vs Database Allocation (DAP):
- Fragments are not individual files
    - relationships have to be maintained
- Access to databases is more complicated
    - remote file access model not applicable
    - relationship between allocation and query processing
- Cost of integrity enforcement should be considered
- Cost of concurrency control should be considered

Allocation – Information Requirements
- Database Information
    - selectivity of fragments
    - size of a fragment
- Application Information
    - number of read accesses of a query to a fragment
    - number of update accesses of a query to a fragment
    - a matrix indicating which queries update which fragments
    - a similar matrix for retrievals
    - originating site of each query
- Site Information
    - unit cost of storing data at a site
    - unit cost of processing at a site
- Network Information
    - communication cost/frame between two sites
    - frame size

Allocation Model
General Form

\[ \min(\text{Total Cost}) \]

subject to
- response time constraint
- storage constraint
- processing constraint

Decision Variable

\[ x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases} \]

Allocation Model
Total Cost

\[ \text{Total Cost} = \sum_{\text{all queries}} \text{query processing cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site} \]

Storage Cost (of fragment F_j at S_k)

\[ (\text{unit storage cost at } S_k) \times (\text{size of } F_j) \times x_{jk} \]

Query Processing Cost (for one query)

\[ \text{processing component} + \text{transmission component} \]
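As a rough illustration of how the storage term and the total cost combine, the following sketch evaluates both for a small allocation. The function names, argument layout (`x[j][k]` is 1 when fragment j is stored at site k), and all numbers are assumptions made up for the example; the per-query processing costs are simply passed in as given values.

```python
def storage_cost(unit_storage_cost, fragment_size, x):
    """Sum over all sites k and fragments j of
    (unit storage cost at site k) * (size of fragment j) * x[j][k]."""
    return sum(
        unit_storage_cost[k] * fragment_size[j] * x[j][k]
        for j in range(len(fragment_size))
        for k in range(len(unit_storage_cost))
    )


def total_cost(query_processing_costs, unit_storage_cost, fragment_size, x):
    """Total cost = sum of per-query processing costs + total storage cost."""
    return sum(query_processing_costs) + storage_cost(
        unit_storage_cost, fragment_size, x
    )


if __name__ == "__main__":
    sizes = [100, 50]        # hypothetical fragment sizes
    unit_costs = [0.5, 1.0]  # hypothetical unit storage cost per site
    x = [[1, 0],             # fragment 0 stored at site 0 only
         [1, 1]]             # fragment 1 stored at both sites
    print(storage_cost(unit_costs, sizes, x))
    print(total_cost([10, 20], unit_costs, sizes, x))
```

Replicating fragment 1 at both sites shows up directly in the storage term, which is the trade-off the model weighs against cheaper local reads.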

Allocation Model
Query Processing Cost

Processing component

\[ \text{access cost} + \text{integrity enforcement cost} + \text{concurrency control cost} \]

- Access cost

\[ \sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{ij} \times \text{local processing cost at a site} \]

- Integrity enforcement and concurrency control costs
    - can be similarly calculated
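A minimal sketch of the access cost for a single query, following the double sum above. The function name and the argument layout are assumptions: `updates[i]` and `reads[i]` are the query's access counts for fragment i, `x[i][j]` is the decision variable, and `local_processing_cost[j]` is the unit processing cost at site j.

```python
def access_cost(updates, reads, x, local_processing_cost):
    """Sum over all sites j and fragments i of
    (update accesses + read accesses) * x[i][j] * local processing cost at j."""
    return sum(
        (updates[i] + reads[i]) * x[i][j] * local_processing_cost[j]
        for j in range(len(local_processing_cost))
        for i in range(len(x))
    )


if __name__ == "__main__":
    updates = [2, 0]   # hypothetical update accesses per fragment
    reads = [3, 4]     # hypothetical read accesses per fragment
    x = [[1, 0],       # fragment 0 at site 0
         [1, 1]]       # fragment 1 at sites 0 and 1
    lpc = [1.0, 2.0]   # hypothetical local processing cost per site
    print(access_cost(updates, reads, x, lpc))
```

Note that the formula as stated charges every stored copy for both reads and updates, so it tends to overestimate the cost of reads against replicated fragments.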

Allocation Model
Query Processing Cost

Transmission component

\[ \text{cost of processing updates} + \text{cost of processing retrievals} \]

- Cost of updates

\[ \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost} \]

- Retrieval cost

\[ \sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result}) \]
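The retrieval term picks, for each fragment, the cheapest site to read it from. The sketch below makes one assumption the slide does not state explicitly: the minimum is taken only over sites that actually hold a copy of the fragment. The function name and the cost tables are likewise hypothetical.

```python
def retrieval_cost(x, command_cost, result_cost):
    """For each fragment i, take the cheapest site j that stores it (x[i][j] == 1),
    where the cost is (retrieval command cost to j) + (cost of sending the
    result for fragment i back from j). Return the sum over all fragments."""
    total = 0
    for i, row in enumerate(x):
        candidates = [
            command_cost[j] + result_cost[i][j]
            for j, held in enumerate(row)
            if held
        ]
        if not candidates:
            raise ValueError(f"fragment {i} is not stored at any site")
        total += min(candidates)
    return total


if __name__ == "__main__":
    # Assumption: the query originates at site 0, so local access costs 0.
    x = [[1, 1],            # fragment 0 at both sites
         [0, 1]]            # fragment 1 at site 1 only
    command_cost = [0, 3]   # cost of sending the retrieval command to each site
    result_cost = [[0, 4],  # cost of returning fragment 0's result from each site
                   [0, 6]]  # cost of returning fragment 1's result from each site
    print(retrieval_cost(x, command_cost, result_cost))
```

Fragment 0 is read locally at no transmission cost, while fragment 1 must be fetched from site 1, which is why replication lowers this term.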

Allocation Model
Constraints
- Response time

\[ \text{execution time of query} \le \text{max. allowable response time for that query} \]

- Storage constraint (for a site)

\[ \sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \le \text{storage capacity at that site} \]

- Processing constraint (for a site)

\[ \sum_{\text{all queries}} \text{processing load of a query at that site} \le \text{processing capacity of that site} \]
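The per-site storage constraint can be checked directly against a candidate allocation. This is a sketch under assumed names and layout (`x[i][j]` is 1 when fragment i is stored at site j); the processing constraint could be checked the same way with per-query loads in place of fragment sizes.

```python
def storage_feasible(x, fragment_size, capacity):
    """Return True if, at every site j, the total size of the fragments
    stored there does not exceed that site's storage capacity."""
    for j in range(len(capacity)):
        used = sum(fragment_size[i] * x[i][j] for i in range(len(fragment_size)))
        if used > capacity[j]:
            return False
    return True


if __name__ == "__main__":
    x = [[1, 0],
         [1, 1]]
    sizes = [100, 50]  # hypothetical fragment sizes
    print(storage_feasible(x, sizes, capacity=[150, 50]))
    print(storage_feasible(x, sizes, capacity=[100, 50]))
```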

Allocation Model
Solution Methods
- FAP is NP-complete
- DAP also NP-complete

Heuristics based on
- single commodity warehouse location (for FAP)
- knapsack problem
- branch and bound techniques
- network flow

Allocation Model
Attempts to reduce the solution space
- assume all candidate partitionings known; select the "best" partitioning
- ignore replication at first
- sliding window on fragments