On Random Sampling over Joins
Surajit Chaudhuri (Microsoft Research), Rajeev Motwani (Stanford University), Vivek Narasayya (Microsoft Research)
Compiled by: Arjun Dasgupta
CONTENTS
• The difficulty of join sampling
• Semantics and algorithms of sampling
• Two previous sampling strategies
• New strategies for join sampling
• Experimental results
$\mathrm{SAMPLE}(R_1 \bowtie R_2, f) \neq \mathrm{SAMPLE}(R_1, f) \bowtie \mathrm{SAMPLE}(R_2, f)$
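The inequality above can be checked with a small simulation. The sketch below is only illustrative: the relations, the helper names `hash_join` and `coin_flip_sample`, and the choice of independent coin flips as the sampling semantics are all assumptions, not taken from the slides. Under these assumptions each join tuple survives "sample, then join" only if both of its source tuples survive, so the result has about $f^2 \cdot |J|$ tuples instead of $f \cdot |J|$.

```python
import random
from collections import defaultdict

def hash_join(r1, r2):
    """Equi-join two lists of (key, value) pairs on the key."""
    index = defaultdict(list)
    for a, c in r2:
        index[a].append(c)
    return [(a, b, c) for a, b in r1 for c in index.get(a, ())]

def coin_flip_sample(rel, f, rng):
    """Keep each tuple independently with probability f."""
    return [t for t in rel if rng.random() < f]

rng = random.Random(0)
n, f = 10_000, 0.1

# Hypothetical data: every key appears exactly once in each relation,
# so the full join has exactly n tuples.
r1 = [(i, f"b{i}") for i in range(n)]
r2 = [(i, f"c{i}") for i in range(n)]

sample_of_join = coin_flip_sample(hash_join(r1, r2), f, rng)
join_of_samples = hash_join(coin_flip_sample(r1, f, rng),
                            coin_flip_sample(r2, f, rng))

# Expected sizes are about f*n and f*f*n respectively.
print(len(sample_of_join), len(join_of_samples))
```

With skewed join-attribute frequencies the join of the samples can also be empty or heavily correlated, which is the situation the example on the next slide addresses.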
STRATEGY USED
• Obtain $\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$ from nonuniform samples of $R_1$ and $R_2$.
The Difficulty of Join Sampling Example: • Suppose that we have the relations
TECHNIQUES FOR SAMPLING
• Black-Box U1 (unweighted)
• Black-Box U2 (unweighted)
• Black-Box WR1 (weighted)
• Black-Box WR2 (weighted)
Black-Box U2: Given relation R with n tuples, generate an unweighted WR (with-replacement) sample of size r.
1. Set $N \leftarrow 0$.
2. Initialize reservoir array A[1..r] with r dummy values.
3. While tuples are streaming by do begin
(a) get next tuple t;
(b) set $N \leftarrow N + 1$;
(c) for j = 1 to r, set A[j] to t with probability $1/N$
end.
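The reservoir procedure above can be sketched in Python as follows. The function name `black_box_u2` and the optional `rng` parameter are illustrative choices, not part of the slides; the counter `n_seen` plays the role of the running tuple count.

```python
import random

def black_box_u2(stream, r, rng=random):
    """One-pass unweighted with-replacement sample of size r.

    Each of the r reservoir slots is an independent size-1 reservoir:
    the N-th tuple replaces the slot's content with probability 1/N.
    """
    n_seen = 0
    reservoir = [None] * r          # dummy values until the first tuple
    for t in stream:
        n_seen += 1
        for j in range(r):
            # random() is in [0, 1), so the first tuple always fills the slot.
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir

sample = black_box_u2(range(1000), 5, random.Random(1))
print(sample)
```

Because the slots are filled independently, the same tuple may appear more than once, which is what with-replacement semantics allows.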
Black-Box WR2: Given relation R with n tuples, generate a weighted WR sample of size r.
1. Set $W \leftarrow 0$.
2. Initialize reservoir array A[1..r] with r dummy values.
3. While tuples are streaming by do begin
(a) get next tuple t with weight w(t);
(b) set $W \leftarrow W + w(t)$;
(c) for j = 1 to r, set A[j] to t with probability $w(t)/W$
end.
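A minimal Python sketch of the weighted variant follows. The name `black_box_wr2`, the `weight` callback, and the decision to skip non-positive weights are assumptions made for illustration.

```python
import random

def black_box_wr2(stream, weight, r, rng=random):
    """One-pass weighted with-replacement sample of size r.

    A tuple t with weight w(t) replaces each slot with probability
    w(t) / W, where W is the total weight seen so far (including t).
    """
    total = 0.0
    reservoir = [None] * r          # dummy values until the first tuple
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue                # assumption: zero-weight tuples are never sampled
        total += w
        for j in range(r):
            if rng.random() < w / total:
                reservoir[j] = t
    return reservoir

# Hypothetical weights: "b" should appear about three times as often as "c".
weights = {"a": 0, "b": 3, "c": 1}
sample = black_box_wr2(["a", "b", "c"], weights.get, 8, random.Random(2))
print(sample)
```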
The Classification of the Problem:
• Case A: No information is available for either $R_1$ or $R_2$.
• Case B: No information is available for $R_1$, but indexes and/or statistics are available for $R_2$.
• Case C: Indexes/statistics are available for both $R_1$ and $R_2$.
Previous Sampling Strategies
Strategy Naive-Sample:
1. Compute the join $J = R_1 \bowtie R_2$.
2. As the tuples of $J$ stream by, use Black-Box U1 or U2 to produce a WR sample of size $r$ from $J$.
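As a rough sketch of Naive-Sample, the code below streams a hash join through a U2-style reservoir. The tuple layout (join attribute in position 0) and the names `join_stream` and `naive_sample` are assumptions for illustration only.

```python
import random
from collections import defaultdict

def join_stream(r1, r2):
    """Yield the tuples of the equi-join on position 0, one at a time."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    for t1 in r1:
        for t2 in index.get(t1[0], ()):
            yield t1 + t2[1:]

def naive_sample(r1, r2, r, rng=random):
    """Compute the full join, then reservoir-sample r tuples (WR) from it."""
    n_seen = 0
    reservoir = [None] * r
    for t in join_stream(r1, r2):
        n_seen += 1
        for j in range(r):
            if rng.random() < 1.0 / n_seen:
                reservoir[j] = t
    return reservoir

r1 = [(1, "x"), (2, "y")]
r2 = [(2, "p"), (2, "q"), (3, "z")]
print(naive_sample(r1, r2, 4, random.Random(0)))
```

The cost is dominated by producing every join tuple, even though only r of them are kept.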
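Previous Sampling Strategies
Strategy Olken-Sample (here $m_2(v)$ denotes the number of tuples in $R_2$ whose join attribute $A$ has value $v$):
1. Let $M$ be an upper bound on $m_2(v)$ for all values $v$.
2. repeat
(a) Sample a tuple $t_1 \in R_1$ uniformly at random.
(b) Sample a random tuple $t_2 \in R_2$ from among all tuples $t \in R_2$ that have $t.A = t_1.A$.
(c) Output $t_1 \bowtie t_2$ with probability $m_2(t_2.A)/M$, and with the remaining probability reject the sample.
until r tuples have been produced.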
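The accept/reject loop can be sketched as below, assuming both relations fit in memory as lists of tuples with the join attribute in position 0. The function name `olken_sample`, the in-memory dictionary standing in for an index on the second relation, and the empty-join guard are illustrative assumptions.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, r, rng=random):
    """Rejection sampling: r uniform WR samples from the join of r1 and r2."""
    index = defaultdict(list)               # stand-in for an index on R2
    for t2 in r2:
        index[t2[0]].append(t2)
    if not any(t1[0] in index for t1 in r1):
        raise ValueError("join is empty; nothing to sample")
    m_max = max(len(group) for group in index.values())  # upper bound M

    out = []
    while len(out) < r:
        t1 = rng.choice(r1)                  # (a) uniform tuple from R1
        matches = index.get(t1[0])
        if not matches:
            continue                         # m2 = 0, so always rejected
        t2 = rng.choice(matches)             # (b) uniform matching tuple
        if rng.random() < len(matches) / m_max:   # (c) accept w.p. m2/M
            out.append(t1 + t2[1:])
    return out

r1 = [(1, "x"), (2, "y"), (3, "z")]
r2 = [(1, "a"), (2, "p"), (2, "q"), (2, "s")]
print(olken_sample(r1, r2, 5, random.Random(4)))
```

The rejection step is what makes the output uniform: without it, tuples from low-frequency join values would be overrepresented.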
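New Strategies for Join Sampling
Strategy Stream-Sample (here $m_2(v)$ denotes the number of tuples in $R_2$ whose join attribute $A$ has value $v$):
1. Use Black-Box WR1 or WR2 to produce a WR sample $S_1 \subseteq R_1$ of size r, where the weight $w(t)$ for a tuple $t \in R_1$ is set to $m_2(t.A)$.
2. While tuples of $S_1$ are streaming by do begin
(a) get next tuple $t_1$ and let $v = t_1.A$;
(b) sample a random tuple $t_2 \in R_2$ from among all tuples $t \in R_2$ that have $t.A = v$;
(c) output $t_1 \bowtie t_2$.
end.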
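A minimal sketch of Stream-Sample follows, again assuming in-memory lists with the join attribute in position 0. Python's `random.choices` is used here as a stand-in for the weighted black box; the name `stream_sample` and the empty-join guard are assumptions.

```python
import random
from collections import defaultdict

def stream_sample(r1, r2, r, rng=random):
    """Weighted sample of R1, then one random matching R2 tuple for each."""
    index = defaultdict(list)               # stand-in for an index on R2
    for t2 in r2:
        index[t2[0]].append(t2)

    # Step 1: weight of t in R1 is m2(t.A), its number of join partners.
    weights = [len(index.get(t1[0], ())) for t1 in r1]
    if sum(weights) == 0:
        raise ValueError("join is empty; nothing to sample")
    s1 = rng.choices(r1, weights=weights, k=r)   # weighted WR sample

    # Step 2: every sampled t1 has at least one partner, so nothing is rejected.
    out = []
    for t1 in s1:
        t2 = rng.choice(index[t1[0]])
        out.append(t1 + t2[1:])
    return out

r1 = [(1, "x"), (2, "y"), (3, "z")]
r2 = [(1, "a"), (2, "p"), (2, "q"), (2, "s")]
print(stream_sample(r1, r2, 5, random.Random(5)))
```

A join tuple is produced with probability proportional to $m_2(v) \cdot 1/m_2(v)$, which is the same for every join tuple, so the output is uniform over the join.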
New Strategies for Join Sampling
• Strategy Stream-Sample is more efficient than Olken-Sample:
1. No information about $R_1$ is required (Case B).
2. No tuple is rejected after computing the join.
3. Only one iteration is needed for each output tuple.
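New Strategies for Join Sampling
Strategy Group-Sample (here $m_2(v)$ denotes the number of tuples in $R_2$ whose join attribute $A$ has value $v$):
1. Use Black-Box WR1 or WR2 to produce a WR sample $S_1 \subseteq R_1$ of size r, where the weight $w(t)$ for a tuple $t \in R_1$ is set to $m_2(t.A)$.
2. Let $S_1$ consist of the tuples $s_1, \ldots, s_r$. Produce $S_2 = S_1 \bowtie R_2$, whose tuples are grouped by the tuples of $S_1$ that generated them.
3. Use r invocations of Black-Box U1 or U2 to sample r tuples, one from each group.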
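The three steps above might be sketched as follows. This version uses only frequency statistics on the second relation (a `Counter`) plus one sequential pass over it, rather than an index; that reading, the name `group_sample`, and the use of `random.choices` as the weighted black box are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def group_sample(r1, r2, r, rng=random):
    """Weighted sample of R1, join it with R2, pick one tuple per group."""
    m2 = Counter(t2[0] for t2 in r2)        # frequency statistics on R2

    # Step 1: weighted WR sample S1 of R1 with weight m2(t.A).
    weights = [m2[t1[0]] for t1 in r1]
    if sum(weights) == 0:
        raise ValueError("join is empty; nothing to sample")
    s1 = rng.choices(r1, weights=weights, k=r)

    # Step 2: group i holds the join tuples generated by s1[i].
    positions = defaultdict(list)
    for i, s in enumerate(s1):
        positions[s[0]].append(i)

    # Step 3: one pass over R2 with a size-1 reservoir per group.
    picked = [None] * r
    seen = [0] * r
    for t2 in r2:
        for i in positions.get(t2[0], ()):
            seen[i] += 1
            if rng.random() < 1.0 / seen[i]:
                picked[i] = s1[i] + t2[1:]
    return picked

r1 = [(1, "x"), (2, "y"), (3, "z")]
r2 = [(1, "a"), (2, "p"), (2, "q"), (2, "s")]
print(group_sample(r1, r2, 5, random.Random(6)))
```

Every tuple in the weighted sample has at least one join partner, so each group is non-empty and each slot of the result is filled.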
New Strategy for Join Sampling • Strategy Frequency-Partition-Sample
Experimental Results:
Summary
• The difficulty of join sampling: example.
• The classification of the problem: 3 cases.
• Previous strategies: Naive-Sample, Olken-Sample.
• New strategies: Stream-Sample, Group-Sample, Frequency-Partition-Sample.
• Conclusion: The new strategies are better than the earlier techniques.