CS 347: Distributed Databases and Transaction Processing
Notes 03: Query Processing — Hector Garcia-Molina
Query Processing
• Decomposition
• Localization
• Optimization
Decomposition (same as in a centralized system)
• Normalization
• Eliminating redundancy
• Algebraic rewriting
Normalization
• Convert from a general language to a “standard” form (e.g., relational algebra)
Example:
  Select A, C
  From R, S
  Where (R.B=1 and S.D=2) or (R.C>3 and S.D=2)

In conjunctive normal form:
  πA,C σ(R.B=1 ∨ R.C>3) ∧ S.D=2 (R × S)
Also: detect invalid expressions, e.g.:
  Select * From R Where R.A = 3
  — R does not have an “A” attribute
Eliminate redundancy, e.g., in conditions:
  (S.A=1) ∧ (S.A>5)  ≡  False
  (S.A<10) ∧ (S.A<5) ≡  S.A<5
E.g.: common sub-expressions — rewrite the query tree so the shared sub-expression over R and S (σcond(R ⋈ S)) is computed once and reused in the union with T. [tree diagram]
Algebraic rewriting, e.g., push conditions down:
  σcond(R ⋈ S)  →  σcond3( σcond1(R) ⋈ σcond2(S) )
[tree diagram]
• After decomposition:
  – one or more algebraic query trees on relations
• Localization:
  – replace relations by the corresponding fragments
Localization steps:
(1) Start with the query
(2) Replace relations by fragments
(3) Push ∪ and ⋈ up, σ and π down (use CS 245 rules)
(4) Simplify — eliminate unnecessary operations
Notation for fragments:
  [R: cond] — a fragment of R whose tuples all satisfy cond
Example A
(1) σE=3(R)
(2) σE=3( [R1: E < 10] ∪ [R2: E ≥ 10] )
(3) σE=3 [R1: E < 10]  ∪  σE=3 [R2: E ≥ 10]
    — the second branch yields Ø (no tuple with E ≥ 10 satisfies E = 3)
(4) σE=3 [R1: E < 10]
Rule 1:
  σC1 [R: C2]  →  [R: C1 ∧ C2]
  [R: False]   →  Ø
In Example A:
  σE=3 [R2: E ≥ 10]  →  [R2: E=3 ∧ E ≥ 10]  →  [R2: False]  →  Ø
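Rule 1 can be sketched in a few lines. This is a minimal illustration, assuming fragment conditions are simple predicates on the fragmentation attribute E (the fragment names and conditions come from Example A; the function name is illustrative):

```python
# A minimal sketch of localization Rule 1 for an equality selection.
# Fragment conditions are given as Python predicates on the
# fragmentation attribute E.

fragments = {
    "R1": lambda e: e < 10,    # [R1: E < 10]
    "R2": lambda e: e >= 10,   # [R2: E >= 10]
}

def localize_equality(fragments, value):
    """sigma_{E=value}(R1 u R2 u ...): keep only fragments whose
    condition is consistent with E = value; the rest reduce to
    [R: False] = empty and are dropped."""
    return [name for name, cond in fragments.items() if cond(value)]

print(localize_equality(fragments, 3))   # ['R1']
```

The same idea generalizes to range selections by testing interval intersection instead of a single value.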
Example B
(1) R ⋈A S   (A = the common attribute)
(2) ([R1: A<5] ∪ [R2: 5≤A≤10] ∪ [R3: A>10]) ⋈A ([S1: A<5] ∪ [S2: A≥5])
(3) Distribute the join over the unions — six candidate joins:
    [R1: A<5] ⋈A [S1: A<5]        [R1: A<5] ⋈A [S2: A≥5]
    [R2: 5≤A≤10] ⋈A [S1: A<5]     [R2: 5≤A≤10] ⋈A [S2: A≥5]
    [R3: A>10] ⋈A [S1: A<5]       [R3: A>10] ⋈A [S2: A≥5]
    — three of these have contradictory conditions and reduce to Ø
(4) ([R1: A<5] ⋈A [S1: A<5]) ∪ ([R2: 5≤A≤10] ⋈A [S2: A≥5]) ∪ ([R3: A>10] ⋈A [S2: A≥5])
Rule 2:
  [R: C1] ⋈A [S: C2]  →  [R ⋈A S: C1 ∧ C2 ∧ R.A = S.A]
In step 4 of Example B:
  [R1: A<5] ⋈A [S2: A≥5]
  → [R1 ⋈A S2: R1.A < 5 ∧ S2.A ≥ 5 ∧ R1.A = S2.A]
  → [R1 ⋈A S2: False]  =  Ø
Localization with derived fragmentation — Example C:
(2) (R1 ∪ R2) ⋈K (S1 ∪ S2), where
    R1: A<10                  R2: A≥10
    S1: K=R.K ∧ R.A<10        S2: K=R.K ∧ R.A≥10
    (S's fragmentation is derived from R's)
(3) ([R1] ⋈K [S1]) ∪ ([R1] ⋈K [S2]) ∪ ([R2] ⋈K [S1]) ∪ ([R2] ⋈K [S2])
(4) ([R1: A<10] ⋈K [S1: K=R.K ∧ R.A<10]) ∪ ([R2: A≥10] ⋈K [S2: K=R.K ∧ R.A≥10])
In step 4 of Example C:
  [R1: A<10] ⋈K [S2: K=R.K ∧ R.A≥10]
  → [R1 ⋈K S2: R1.A < 10 ∧ S2.K=R.K ∧ R.A ≥ 10 ∧ R1.K = S2.K]
  → [R1 ⋈K S2: False]  =  Ø   (since K is the key of R and of R1)
(4) simplified further:
    (R1 ⋈K S1) ∪ (R2 ⋈K S2)
Localization with vertical fragmentation — Example D:
(1) πA(R), with R fragmented as R1(K, A, B) and R2(K, C, D)
(2) πA( R1(K, A, B) ⋈K R2(K, C, D) )
(3) πA( πK,A(R1) ⋈K πK(R2) ) — the R2 branch is not really needed
(4) πA( R1(K, A, B) )
Rule 3
• Given a vertical fragmentation of R: Ri = πAi(R), ∪i Ai = A
• Then for any B ⊆ A:
    πB(R) = πB( ⋈ { Ri | B ∩ Ai ≠ Ø } )
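The fragment-selection part of Rule 3 can be sketched directly. A minimal illustration, assuming each vertical fragment is described by its attribute set (the key K replicated in every fragment, as in Example D; names are illustrative):

```python
# A sketch of Rule 3's fragment selection for vertical fragmentation:
# pi_B(R) only needs the fragments Ri whose attribute set Ai
# intersects B; the others can be dropped from the join.

fragments = {
    "R1": {"K", "A", "B"},   # R1 = pi_{K,A,B}(R)
    "R2": {"K", "C", "D"},   # R2 = pi_{K,C,D}(R)
}

def fragments_needed(fragments, B):
    """Return the fragments with Ai intersect B nonempty, sorted by name."""
    return sorted(name for name, attrs in fragments.items() if attrs & B)

print(fragments_needed(fragments, {"A"}))       # ['R1']  -- Example D
print(fragments_needed(fragments, {"A", "C"}))  # ['R1', 'R2']
```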
Localization with hybrid fragmentation — Example E:
  R1 = σk<5( πk,A(R) )    R2 = σk≥5( πk,A(R) )    R3 = πk,B(R)
Query: πA( σk=3(R) )
Reduced query: πA( σk=3(R1) )
Summary — Query Processing
• Decomposition
• Localization
• Optimization
  – Overview
  – Tricks for joins + other operations
  – Strategies for optimization
Optimization Process:
• Generate query plans P1, P2, P3, …, Pn
• Estimate the sizes of intermediate results
• Estimate the cost Ci of each plan ($, time, …)
• Pick the plan with minimum cost
Differences from centralized optimization:
• New strategies for some operations (semi-join, range-partitioning, sort, …)
• Many ways to assign and schedule processors
Parallel/distributed sort
Input:
(a) relation R on a single site/disk
(b) R fragmented/partitioned by the sort attribute
(c) R fragmented/partitioned by another attribute
Output:
(a) sorted R on a single site/disk, or
(b) fragments/partitions sorted, each covering a disjoint key range
    (e.g., F1: 5…10, F2: 12…20, F3: 21…50)
Basic sort
• R(K, …), sort on K
• R fragmented on K with vector k0, k1, …, kn
  [e.g., fragment 1 holds 7, 3 (keys < k0); fragment 2 holds 10, 11, 17, 14;
   fragment 3 holds 20, 27, 22]
• Algorithm: each fragment is sorted independently
• If necessary, ship the results
Same idea on different architectures:
• Shared-nothing: P1 (with its own memory M and disk F1) sorts F1;
  P2 sorts F2; the processors communicate over the network
• Shared-memory: P1 and P2 share memory M and sort F1, F2
Range-partitioning sort
• R(K, …), sort on K
• R located at one or more sites/disks, not fragmented on K
• Algorithm:
  (a) Range partition on K
  (b) Basic sort
  [Ra, Rb → range partition by vector (k0, k1) → R1, R2, R3
   → local sort → R′1, R′2, R′3 = result]
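Steps (a) and (b) can be sketched as follows. A minimal single-process simulation, assuming the partition vector (k0, k1) is already chosen (function and variable names are illustrative):

```python
# A sketch of range-partitioning sort: bisect assigns each tuple to the
# "site" whose key range contains it (step a); each site then runs a
# local sort (step b). Concatenating the results yields a global sort.
import bisect

def range_partition_sort(site_inputs, vector):
    nsites = len(vector) + 1
    partitions = [[] for _ in range(nsites)]
    for tuples in site_inputs:                      # step (a)
        for key in tuples:
            partitions[bisect.bisect_right(vector, key)].append(key)
    return [sorted(p) for p in partitions]          # step (b)

Ra, Rb = [20, 7, 14, 3], [27, 11, 17, 22, 10]
R1, R2, R3 = range_partition_sort([Ra, Rb], vector=[10, 20])
print(R1, R2, R3)   # concatenation R1+R2+R3 is fully sorted
```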
• Selecting a good partition vector
  [diagram: sites Ra, Rb, Rc, each holding an assortment of key values]
Example
• Each site sends to the coordinator:
  – Min sort key
  – Max sort key
  – Number of tuples
• The coordinator computes the vector and distributes it to the sites
  (it also decides the number of sites for the local sorts)
• Sample scenario — the coordinator receives:
    SA: Min=5, Max=10, #tuples=10
    SB: Min=7, Max=17, #tuples=10
  Expected tuples (assuming keys uniform within each site's range):
  SA contributes 2 tuples per key unit on [5, 10]; SB contributes 1 per
  unit on [7, 17]. Where should k0 go?
  [assuming we want to sort at 2 sites]
  Choose k0 so that the expected number of tuples with key < k0 is half
  the total (10 of 20):
    2(k0 − 5) + 1·(k0 − 7) = 10
    3·k0 − 17 = 10
    k0 = 9
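The coordinator's computation generalizes to any number of sites: build the piecewise-linear expected-count function and solve for the split point. A sketch under the same uniformity assumption (names are illustrative):

```python
# A sketch of the coordinator's vector computation. expected_below(k)
# is the expected number of tuples with key < k, assuming keys are
# uniform within each site's [min, max]; we solve for the k0 that hits
# a target count by bisection on this monotone function.

def expected_below(k, stats):
    total = 0.0
    for lo, hi, n in stats:                  # (min, max, #tuples) per site
        density = n / (hi - lo)
        total += density * min(max(k - lo, 0), hi - lo)
    return total

def pick_k0(stats, target):
    lo = min(s[0] for s in stats)
    hi = max(s[1] for s in stats)
    for _ in range(60):                      # bisection
        mid = (lo + hi) / 2
        if expected_below(mid, stats) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

stats = [(5, 10, 10), (7, 17, 10)]           # SA and SB from the scenario
print(round(pick_k0(stats, target=10), 3))   # 9.0 -- matches 2(k0-5)+(k0-7)=10
```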
Variations — send more info to the coordinator:
• A local partition vector, e.g., SA: boundaries 5, 6, 8, 10 with
  3 tuples in each sub-range
• A histogram of the local key distribution (e.g., over 5, 6, 7, 8, 9, 10)
More than one round, e.g.:
(1) Sites send range and # of tuples
(2) Coordinator returns a “preliminary” vector V0
(3) Sites tell the coordinator how many tuples fall in each V0 range
(4) Coordinator computes the final vector Vf
Can you come up with a distributed algorithm (no coordinator)?
Parallel external sort-merge
• Same as range-partition sort, except sort first:
  Ra, Rb → local sort → Ra′, Rb′ → range partition (k0, k1)
  → merge, in order, at R1, R2, R3 → result
• Note: a merging network can be used if available (e.g., Teradata)
Parallel/distributed join
Input:  relations R, S — may or may not be partitioned
Output: R ⋈ S — result at one or more sites
Partitioned join (equi-join)
• Partition R and S with the same function f(A) on the join attribute:
  Ra, Rb → R1, R2, R3 and Sa, Sb, Sc → S1, S2, S3
• Join the matching fragments locally: Ri ⋈ Si; the union is the result
Notes:
• The same partition function f is used for both R and S (applied to the
  join attribute)
• f can be range or hash partitioning
• The local join can be of any type (use any CS 245 optimization)
• Various scheduling options, e.g.:
  (a) partition R; partition S; join
  (b) partition R; build local hash tables for R; partition S and join
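The scheme above can be sketched as a single-process simulation. This hedged sketch uses hash partitioning and a local hash join per partition (relation contents and function names are illustrative):

```python
# A sketch of a partitioned equi-join. Both relations use the same
# partition function f on the join attribute A, so matching tuples
# always land in the same partition; each partition is then joined
# locally with a build/probe hash join.

def partitioned_join(R, S, nparts):
    f = lambda a: hash(a) % nparts          # same f for both R and S
    Rp = [[] for _ in range(nparts)]
    Sp = [[] for _ in range(nparts)]
    for a, b in R:
        Rp[f(a)].append((a, b))
    for a, c in S:
        Sp[f(a)].append((a, c))
    result = []
    for Ri, Si in zip(Rp, Sp):              # "local" join per partition
        table = {}
        for a, b in Ri:                     # build on Ri
            table.setdefault(a, []).append(b)
        for a, c in Si:                     # probe with Si
            for b in table.get(a, []):
                result.append((a, b, c))
    return result

R = [(2, "a"), (10, "b"), (25, "c"), (30, "d")]
S = [(3, "x"), (10, "y"), (15, "z"), (25, "w")]
print(sorted(partitioned_join(R, S, nparts=3)))  # [(10, 'b', 'y'), (25, 'c', 'w')]
```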
More notes:
• We already know why a partitioned join works:
  (R1 ∪ R2 ∪ R3) ⋈ (S1 ∪ S2 ∪ S3) = (R1 ⋈ S1) ∪ (R2 ⋈ S2) ∪ (R3 ⋈ S3)
• It is useful to give this type of join a name, because we may want to
  partition data precisely to make partition-join possible (especially in
  a parallel DB system)
Even more notes:
• Selecting a good partition function f is very important:
  – number of fragments
  – hash function
  – partition vector
• Good partition vector
  – Goal: |Ri| + |Si| roughly the same for all i
  – Can use a coordinator to select it
Asymmetric fragment-and-replicate join
• Partition R with f into R1, R2, R3; replicate S in full at every site
• Local joins Ri ⋈ S; the union is the result
Notes:
• Can use any partition function f for R (even round robin)
• Can do any join — not just equi-join, e.g.: R ⋈R.A < S.B S
General fragment-and-replicate join
• Partition R into n fragments (here f1 gives 3) and S into m fragments
  (here 2), in similar fashion
• Replicate fragments so that every one of the n×m pairings is computed
  at some site: R1⋈S1, R1⋈S2, R2⋈S1, R2⋈S2, R3⋈S1, R3⋈S2
• The union of all pairings is the result
Notes:
• The asymmetric F+R join is a special case of general F+R
• Asymmetric F+R may be good if S is small
• Works for non-equi-joins
Semi-join
• Goal: reduce communication traffic
• R ⋈A S  ≡  (R ⋉A S) ⋈A S
          ≡  R ⋈A (S ⋉A R)
          ≡  (R ⋉A S) ⋈A (S ⋉A R)
  where R ⋉A S = R ⋈A πA(S)   (the semi-join)
Example:
  R:  A   B        S:  A   C
      2   a            3   x
      10  b            10  y
      25  c            15  z
      30  d            25  w
                       32  x

  πA(R) = [2, 10, 25, 30]
  S ⋉A R:  A   C
           10  y
           25  w
  Answer:  R ⋈ S = R ⋈ (S ⋉A R)
Computing the transmitted data in the example:
• With semi-join R ⋈ (S ⋉A R):  T = 4·|A| + 2·|A+C| + (result)
  — ship the 4 values of πA(R), then the 2 matching S tuples
• With join R ⋈ S:  T = 4·|A+B| + (result) — ship all 4 R tuples
• The semi-join is better if, say, |B| is large
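Plugging illustrative attribute widths into the two formulas makes the comparison concrete (the byte sizes below are assumptions, not from the slides):

```python
# A sketch of the transmission-cost comparison for the example:
# 4 R tuples, 2 S tuples match the semi-join. Attribute widths in
# bytes are assumed, illustrative values.

size_A, size_B, size_C = 4, 100, 4

# Semi-join R |X| (S |x| R): ship pi_A(R), then the matching S tuples
semijoin_cost = 4 * size_A + 2 * (size_A + size_C)

# Plain join: ship all of R to S's site
join_cost = 4 * (size_A + size_B)

print(semijoin_cost, join_cost)  # 32 416 -- semi-join wins when |B| is large
```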
In general — say R is the smaller relation:
  (R ⋉A S) ⋈A S  is better than  R ⋈A S
  if  size(πA(S)) + size(R ⋉A S) < size(R)
• Similar comparisons apply for the other semi-join variants
• Remember: we are only taking transmission cost into account
• Trick: encode πA(S) (or πA(R)) as a bit vector, one bit per possible
  key in S:  001101000010100
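The bit-vector trick can be sketched directly. A minimal illustration, assuming keys fall in a small known range so each possible key maps to one bit (with a hash function instead of the identity map this becomes a Bloom-filter-style vector, which can admit false positives):

```python
# A sketch of the bit-vector encoding of pi_A(R): ship one integer
# whose set bits mark R's keys, instead of the key list itself.

def make_bitvector(keys, nbits):
    v = 0
    for k in keys:
        assert 0 <= k < nbits
        v |= 1 << k          # one bit per possible key
    return v

R_keys = [2, 10, 25, 30]                               # pi_A(R), from the example
S = [(3, "x"), (10, "y"), (15, "z"), (25, "w"), (32, "x")]

bv = make_bitvector(R_keys, nbits=64)                  # shipped instead of pi_A(R)
S_semijoin = [(a, c) for a, c in S if (bv >> a) & 1]   # S |x| R at S's site
print(S_semijoin)                                      # [(10, 'y'), (25, 'w')]
```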
Three-way joins with semi-joins
• Goal: R ⋈ S ⋈ T
• Option 1: R′ ⋈ S′ ⋈ T, where R′ = R ⋉ S and S′ = S ⋉ T
• Option 2: R″ ⋈ S′ ⋈ T, where R″ = R ⋉ S′ and S′ = S ⋉ T
Many options! The number of semi-join options is exponential in the
number of relations in the join.
Privacy-preserving join
• Site 1 has R(A, B); Site 2 has S(A, C)
• Want to compute R ⋈ S
• Site 1 should NOT discover any S info not in the join
• Site 2 should NOT discover any R info not in the join
Semi-join does not work
• If Site 1 sends πA(R) to Site 2, Site 2 learns all keys of R!
  [Site 1: R with keys a1, a2, a3, a4; Site 2: S with keys a1, a3, a5, a7;
   πA(R) = (a1, a2, a3, a4)]
Fix: send hashed keys
• Site 1 hashes each value of A before sending:
  πA(R) → (h(a1), h(a2), h(a3), h(a4))
• Site 2 hashes (with the same function) its own A values to see which
  tuples match: it finds h(a1), h(a3), i.e., tuples (a1, c1), (a3, c3)
What is the problem? Dictionary attack!
• Site 2 takes all possible keys a1, a2, a3, … and checks whether
  h(a1), h(a2), h(a3), … match what Site 1 sent, recovering R's keys
Adversary model: honest but curious
• A dictionary attack is possible (the cheating is internal and can't be caught)
• Sending incorrect keys is not possible (the cheater could be caught)
One solution (Agrawal et al.)
• Use a commutative encryption function:
  – Ei(x) = x encrypted using site i's private key
  – E1(E2(x)) = E2(E1(x))
• Applying both encryptions, in either order, yields the same value
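One concrete way to get commutative encryption is Pohlig–Hellman-style exponentiation, Ei(x) = x^ei mod p. The sketch below demonstrates the intersection protocol with that choice; the prime, exponents, and protocol details are illustrative assumptions (and far too small to be secure), not the parameters from the paper:

```python
# A sketch of commutative-encryption intersection, assuming
# E_i(x) = x^e_i mod p with p prime and each e_i coprime to p-1.
# Commutativity: (x^e1)^e2 = (x^e2)^e1 (mod p). Toy parameters only!

p = 1_000_003                 # prime; p-1 = 2 * 3 * 166667
e1, e2 = 65537, 101           # sites' private exponents, coprime to p-1

E1 = lambda x: pow(x, e1, p)
E2 = lambda x: pow(x, e2, p)

R_keys = [2, 10, 25, 30]      # site 1's keys
S_keys = [3, 10, 15, 25, 32]  # site 2's keys

# Commutativity check: E1(E2(x)) == E2(E1(x)) for all x
assert all(E1(E2(x)) == E2(E1(x)) for x in R_keys + S_keys)

# Site 1 ships E1(R_keys); site 2 double-encrypts them (site 1 tracks
# which of its keys each value came from) and ships E2(S_keys) back.
double_R = {E2(E1(x)): x for x in R_keys}
double_S = {E1(E2(y)) for y in S_keys}   # site 1 applies E1 to E2(S_keys)

common = sorted(x for enc, x in double_R.items() if enc in double_S)
print(common)  # [10, 25] -- the join keys, without exposing the others
```

Because each site keeps its exponent private, neither side can mount the dictionary attack: guessing a key does not let you compute the other site's encryption of it.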
Solution:
• Site 1 sends its keys encrypted: E1(a1), E1(a2), E1(a3), E1(a4);
  Site 2 sends E2(a1), E2(a3), E2(a5), E2(a7)
• Each site applies its own encryption to the values it received,
  producing double-encrypted key sets; since E1(E2(x)) = E2(E1(x)),
  the two sets agree exactly on the common keys
• Intersecting them identifies a1 and a3, yielding the join tuples
  (a1, b1), (a3, b3)
Why does this solution work?
Other privacy-preserving operations?
• Inequality join: R ⋈R.A > S.A S
• Similarity join: R ⋈sim(R.A, S.A) < e S
Other parallel operations
• Duplicate elimination
  – Sort first (in parallel), then eliminate duplicates in the result
  – Or partition tuples (range or hash) and eliminate duplicates locally
• Aggregates
  – Partition by the grouping attributes; compute each aggregate locally
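The partition-then-aggregate idea can be sketched in a few lines. A minimal single-process simulation of sum(sal) group by dept over two partitions (relation contents and names are illustrative):

```python
# A sketch of parallel aggregation: each site computes local partial
# sums per dept first, then only the (dept, partial sum) pairs are
# "shipped" and merged -- less data than repartitioning raw tuples.
from collections import Counter

def local_partial_sums(tuples):
    sums = Counter()
    for dept, sal in tuples:
        sums[dept] += sal
    return sums

Ra = [("toys", 10), ("sales", 20), ("toys", 5)]
Rb = [("sales", 15), ("toys", 30)]

merged = local_partial_sums(Ra) + local_partial_sums(Rb)  # Counter addition
print(dict(merged))  # {'toys': 45, 'sales': 35}
```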
Example: sum(sal) group by dept, with R partitioned into Ra, Rb
• Option 1: repartition the raw tuples by dept, then sum at each site
• Option 2: compute partial sums locally at Ra and Rb first, then
  repartition the partial sums by dept and add them — less data shipped!
Preview: MapReduce
[diagram: records data A1, data B1, data C1, data B2, data C2, data A3
 routed to sites by key A / B / C]
Enhancements for aggregates
• Perform the aggregate during partitioning to reduce the data transmitted
• Does not work for all aggregate functions… which ones?
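A quick experiment shows which aggregates survive pre-aggregation: sum, count, min, and max merge correctly from partial results (and avg does, if we ship (sum, count) pairs), while median does not. The partition contents below are illustrative:

```python
# A sketch of which aggregates decompose over partitions.
import statistics

part1, part2 = [1, 2, 9], [3, 4, 5]

# sum and max decompose: combining partials gives the global answer
assert sum([sum(part1), sum(part2)]) == sum(part1 + part2)
assert max(max(part1), max(part2)) == max(part1 + part2)

# avg decomposes only if we ship (sum, count), not local averages
s, n = sum(part1) + sum(part2), len(part1) + len(part2)
assert s / n == statistics.mean(part1 + part2)

# median does NOT decompose: median of local medians != global median
local = statistics.median([statistics.median(part1), statistics.median(part2)])
print(local, statistics.median(part1 + part2))  # 3.0 vs 3.5
```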
Selection
• Range or hash partition
• Straightforward — but what about indexes?
Indexing
• Can think of the partition vector as the root of a distributed index:
  [vector (k0, k1) at the root; local indexes at Site 1, Site 2, Site 3]
• Index on a non-partition attribute:
  [vector (k0, k1) at the index sites; entries point to the tuple sites]
Notes:
• If the index is not too big, it may be better to keep it whole and make
  copies…
• If updates are frequent, the update work can be partitioned…
  (Question: how do we handle splits of B-tree pages?)
• Extensible or linear hashing
  [diagram: hash function f directs tuples to buckets R1, R2, R3;
   R4 ← added by a split]
• How do we adapt these schemes?
• Where do we store the directory, the set of participants…?
• Which scheme is better for a distributed environment?
• Can we design a hashing scheme with no global knowledge (P2P)?
Summary: Query processing
• Decomposition and localization
• Optimization
  – Overview
  – Tricks for joins, sort, …
  – Tricks for inter-operation parallelism
  – Strategies for optimization
Inter-operation parallelism
• Pipelined
• Independent
Pipelined parallelism — e.g., computing σc(R) ⋈ S:
Site 1 scans R and ships the tuples matching c to Site 2, which probes S
and produces join results as the tuples stream in
Independent parallelism — e.g., (R ⋈ S) ⋈ (T ⋈ V):
(1) temp1 ← R ⋈ S; temp2 ← T ⋈ V (at Site 1 and Site 2, in parallel)
(2) result ← temp1 ⋈ temp2
• Pipelining cannot be used in all cases — e.g., in a hash join, the full
  stream of R tuples must arrive to build the hash table before the
  stream of S tuples can probe it
Summary
As we consider query plans for optimization, we must consider various tricks:
- for individual operations
- for scheduling operations