Outline Introduction Background Distributed Database Design Database Integration
Outline • • • Introduction Background Distributed Database Design Database Integration Semantic Data Control Distributed Query Processing ➡ Overview ➡ Query decomposition and localization • • ➡ Distributed query optimization Multidatabase query processing Distributed Transaction Management Data Replication Parallel Database Systems Distributed Object DBMS Peer-to-Peer Data Management Web Data Management Current Issues Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/1
Query Processing in a DDBMS high level user query processor Low-level data manipulation commands for D-DBMS Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/2
Distributed Query Processing Methodology Calculus Query on Distributed Relations Query Decomposition GLOBAL SCHEMA Algebraic Query on Distributed Relations CONTROL SITE Data Localization FRAGMENT SCHEMA Fragment Query Global Optimization STATS ON FRAGMENTS Optimized Fragment Query with Communication Operations LOCAL SITES Local Optimization LOCAL SCHEMAS Optimized Local Queries Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/3
Step 1 – Query Decomposition Input : Calculus query on global relations • • Normalization ➡ manipulate query quantifiers and qualification Analysis ➡ detect and reject “incorrect” queries • • ➡ possible for only a subset of relational calculus Simplification ➡ eliminate redundant predicates Restructuring ➡ calculus query algebraic query ➡ more than one translation is possible ➡ use transformation rules Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/4
Normalization • Lexical and syntactic analysis ➡ check validity (similar to compilers) ➡ check for attributes and relations ➡ type checking on the qualification • Put into normal form ➡ Conjunctive normal form (p 11 p 12 … p 1 n) … (pm 1 pm 2 … pmn) ➡ Disjunctive normal form (p 11 p 12 … p 1 n) … (pm 1 pm 2 … pmn) ➡ OR's mapped into union ➡ AND's mapped into join or selection Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/5
Analysis • • Refute incorrect queries Type incorrect ➡ If any of its attribute or relation names are not defined in the global schema • ➡ If operations are applied to attributes of the wrong type Semantically incorrect ➡ Components do not contribute in any way to the generation of the result ➡ Only a subset of relational calculus queries can be tested for correctness ✦ Those that do not contain disjunction and negation ➡ To detect connection graph (query graph) ✦ join graph ✦ Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/6
Analysis – Example SELECT FROM WHERE AND AND ENAME, RESP EMP, ASG, PROJ EMP. ENO = ASG. ENO ASG. PNO = PROJ. PNO PNAME = "CAD/CAM" DUR ≥ 36 TITLE = "Programmer" Query graph Join graph DUR≥ 36 ASG EMP. ENO=ASG. ENO TITLE = “Programmer” EMP ENAME Distributed DBMS RESP RESULT ASG. PNO=PROJ. PNO EMP. ENO=ASG. ENO PROJ EMP ASG. PNO=PROJ. PNO PROJ PNAME=“CAD/CAM” © M. T. Özsu & P. Valduriez Ch. 7/7
Analysis If the query graph is not connected, the query may be wrong or use Cartesian product SELECT FROM WHERE AND AND ENAME, RESP EMP, ASG, PROJ EMP. ENO = ASG. ENO PNAME = "CAD/CAM" DUR > 36 TITLE = "Programmer" ASG EMP ENAME Distributed DBMS RESP RESULT © M. T. Özsu & P. Valduriez PROJ PNAME=“CAD/CAM” Ch. 7/8
Simplification • Why simplify? ➡ Remember the example • How? Use transformation rules ➡ Elimination of redundancy ✦ idempotency rules p 1 ¬( p 1) false p 1 (p 1∨p 2) p 1 false p 1 … ➡ Application of transitivity ➡ Use of integrity rules Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/9
Simplification – Example SELECT FROM WHERE OR AND TITLE EMP. ENAME = "J. Doe" (NOT(EMP. TITLE = "Programmer") (EMP. TITLE = "Programmer" EMP. TITLE = "Elect. Eng. ") NOT(EMP. TITLE = "Elect. Eng. ")) SELECT FROM WHERE Distributed DBMS TITLE EMP. ENAME = "J. Doe" © M. T. Özsu & P. Valduriez Ch. 7/10
Restructuring • • • Convert relational calculus to relational algebra Make use of query trees Example ENAME Project σDUR=12 OR DUR=24 Find the names of employees other than σ Select PNAME=“CAD/CAM” J. Doe who worked on the CAD/CAM project for either 1 or 2 years. SELECT ENAME σENAME≠“J. DOE” FROM EMP, ASG, PROJ WHERE EMP. ENO = ASG. ENO ⋈PNO AND ASG. PNO = PROJ. PNO AND ENAME≠ "J. Doe" ⋈ENO Join AND PNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24) PROJ ASG EMP Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/11
Restructuring –Transformation Rules • Commutativity of binary operations ➡R×S S×R ➡ R ⋈S S ⋈R ➡R S S R • Associativity of binary operations ➡ ( R × S) × T R × (S × T) ➡ (R ⋈S) ⋈T R ⋈ (S ⋈T) • Idempotence of unary operations ➡ A’(R)) A’(R) ➡ p 1(A 1) ( p 2(A 2)(R)) p 1(A 1) p 2(A 2)(R) where R[A] and A' A, A" A and A' A" • Commuting selection with projection Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/12
Restructuring – Transformation Rules • Commuting selection with binary operations ➡ p(A)(R × S) ( p(A) (R)) × S ➡ p(A )(R ⋈(A , B )S) ( p(A ) (R)) ⋈(A , B )S i j k ➡ p(A )(R T) p(A ) (R) p(A ) (T) i i i where Ai belongs to R and T • Commuting projection with binary operations ➡ C(R × S) A’(R) × B’(S) ➡ C(R ⋈(A , B )S) A’(R) ⋈(A , B ) B’(S) j k ➡ C(R S) C(R) C(S) where R[A] and S[B]; C = A' B' where A' A, B' B Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/13
Example ENAME Recall the previous example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years. SELECT ENAME FROM DUR=12 DUR=24 PNAME=“CAD/CAM” Select ENAME≠“J. DOE” PROJ, ASG, EMP WHERE ASG. ENO=EMP. ENO ⋈PNO AND ASG. PNO=PROJ. PNO AND ENAME ≠ "J. Doe" AND PROJ. PNAME="CAD/CAM" AND (DUR=12 OR DUR=24) Distributed DBMS Project ⋈ENO PROJ © M. T. Özsu & P. Valduriez ASG Join EMP Ch. 7/14
Equivalent Query ENAME PNAME=“CAD/CAM” (DUR=12 DUR=24) ENAME≠“J. Doe” ⋈PNO, ENO × EMP Distributed DBMS PROJ © M. T. Özsu & P. Valduriez ASG Ch. 7/15
Restructuring ENAME ⋈PNO PNO, ENAME ⋈ENO PNAME = "CAD/CAM" PROJ Distributed DBMS PNO, ENO DUR =12 DUR=24 ASG © M. T. Özsu & P. Valduriez PNO, ENAME ≠ "J. Doe" EMP Ch. 7/16
Step 2 – Data Localization Input: Algebraic query on distributed relations • • Determine which fragments are involved Localization program ➡ substitute for each global query its localized query ✦ A localized query is a relational algebra query whose operands are the fragments of relations instead of the relations themselves ✦ We call these operands that are fragments of relations “localization programs” ✓ ✦ Union for horizontal fragmentation; Join for vertical fragmentation Replication is not taken into account in this chapter ➡ Optimize ✦ For each type of fragmentation, use reduction techniques to generate simpler queries ✦ To do so, use appropriate heuristics Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/17
Example ENAME Assume ➡ EMP is fragmented into EMP 3 as follows: EMP 1, EMP 2, PNAME=“CAD/CAM” ✦ EMP 1= ENO≤“E 3”(EMP) ✦ EMP 2= “E 3”<ENO≤“E 6”(EMP) ✦ EMP 3= ENO≥“E 6”(EMP) ➡ ASG fragmented into as follows: DUR=12 DUR=24 ENAME≠“J. DOE” ✦ ASG 1= ENO≤“E 3”(ASG) ✦ ASG 2= ENO>“E 3”(ASG) ⋈ENO PROJ Replace EMP by (EMP 1 EMP 2 EMP 3) and ASG by (ASG 1 ASG 2) in any query Distributed DBMS ⋈PNO ASG 1 and ASG 2 © M. T. Özsu & P. Valduriez EMP 1 EMP 2 EMP 3 ASG 1 ASG 2 Ch. 7/18
Reduction for PHF • Reduction with selection ➡ Relation R and FR={R 1, R 2, …, Rw} where Rj= p (R) j pi(Rj)= if x in R: ¬(pi(x) pj(x)) ➡ Example SELECT * FROM EMP WHERE ENO="E 5" ENO=“E 5” EMP 1 Distributed DBMS EMP 2 EMP 3 © M. T. Özsu & P. Valduriez EMP 2 Ch. 7/19
Reduction for PHF • Reduction with join ➡ Possible if fragmentation is done on join attribute, i. e. , the selection attribute used for the fragmentation is the same as the join attribute ➡ Algorithm ✦ Distribute joins over unions (R 1 R 2)⋈S (R 1⋈S) (R 2⋈S) ✦ Eliminate useless joins as follows: Given Ri = pi(R) and Rj = pj(R) Ri ⋈Rj = if x in Ri, y in Rj: ¬(pi(x) pj(y)) That is, qualifications of the joined fragments are in contradiction Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/20
Reduction for PHF • ⋈ENO Assume EMP is fragmented as before and ➡ ASG 1: ENO ≤ "E 3"(ASG) ➡ ASG 2: ENO > "E 3"(ASG) • Consider the query EMP 1 EMP 2 EMP 3 SELECT * FROM EMP, ASG WHERE EMP. ENO=ASG. ENO • • Distribute join over unions Apply the reduction rule ⋈ENO EMP 1 Distributed DBMS © M. T. Özsu & P. Valduriez ASG 1 ⋈ENO ASG 1 EMP 2 ASG 2 ⋈ENO ASG 2 EMP 3 ASG 2 Ch. 7/21
Provides Parallellism ⋈ENO EMP 1 Distributed DBMS ASG 1 EMP 2 ⋈ENO ASG 2 EMP 3 © M. T. Özsu & P. Valduriez ASG 1 EMP 3 ⋈ENO ASG 2 Ch. 7/22
Eliminates Unnecessary Work ⋈ENO EMP 1 Distributed DBMS ⋈ENO ASG 1 EMP 2 ASG 2 EMP 3 © M. T. Özsu & P. Valduriez ⋈ENO ASG 2 Ch. 7/23
Reduction for VF • Find useless (not empty) intermediate relations Relation R defined over attributes A = {A 1, . . . , An} vertically fragmented as Ri = A'(R) where A' A: D, K(Ri) is useless if the set of projection attributes D is not in A' Example: EMP 1= ENO, ENAME (EMP); EMP 2= ENO, TITLE (EMP) SELECT ENAME FROM EMP ENAME ⋈ENO EMP 1 Distributed DBMS EMP 2 © M. T. Özsu & P. Valduriez EMP 1 Ch. 7/24
Reduction for DHF • Rule : ➡ Distribute joins over unions ➡ Apply the join reduction for horizontal fragmentation (using the qualification of the primary fragments!) • Example ASG 1: ASG ⋉ENO EMP 1 ASG 2: ASG ⋉ENO EMP 2 EMP 1: TITLE=“Programmer” (EMP) • EMP 2: TITLE≠“Programmer” (EMP) Query SELECT FROM WHERE AND Distributed DBMS * EMP, ASG. ENO = EMP. ENO EMP. TITLE = "Mech. Eng. " © M. T. Özsu & P. Valduriez Ch. 7/25
Reduction for DHF ⋈ENO Generic query TITLE=“Mech. Eng. ” ASG 1 ASG 2 Selections first EMP 1 ⋈ENO TITLE=“Mech. Eng. ” ASG 1 Distributed DBMS EMP 2 ASG 2 © M. T. Özsu & P. Valduriez EMP 2 Ch. 7/26
Reduction for DHF Joins over unions ⋈ENO TITLE=“Mech. Eng. ” ASG 1 EMP 2 TITLE=“Mech. Eng. ” ASG 2 EMP 2 Elimination of the empty intermediate relations ⋈ENO (left sub-tree) TITLE=“Mech. Eng. ” ASG 2 Distributed DBMS EMP 2 © M. T. Özsu & P. Valduriez Ch. 7/27
Reduction for Hybrid Fragmentation • Combine the rules already specified: ➡ Remove empty relations generated by contradicting selections on horizontal fragments; ➡ Remove useless relations generated by projections on vertical fragments; ➡ Distribute joins over unions in order to isolate and remove useless joins. Distributed DBMS © M. T. Özsu & P. Valduriez Ch. 7/28
Reduction for HF Example ENAME Consider the following hybrid fragmentation: ENAME EMP 1= ENO≤"E 4" ( ENO, ENAME (EMP)) ENO=“E 5” EMP 2= ENO>"E 4" ( ENO, ENAME (EMP)) ⋈ENO EMP 3= ENO, TITLE (EMP) and the query SELECT ENAME FROM EMP WHERE ENO="E 5" Distributed DBMS ENO=“E 5” EMP 2 EMP 1 EMP 2 EMP 3 © M. T. Özsu & P. Valduriez Ch. 7/29
- Slides: 29