Solution to the “all-pairs” function

Assume for simplicity that the keys in the leaf nodes reside in an array.

Naïve Approach: (~ nested loops join)

To find all the key pairs within distance ε, we compare the 1st key with the keys that follow it (2nd, 3rd, …) until we reach a key that is more than ε away. Then we move to the 2nd key and compare it with the 3rd, 4th, … again until the ε distance is exceeded, and we do the same for every key until we reach the last key in the array.

This approach is inefficient because it repeats comparisons whose outcome is already known. For example, if the 5th key is the last key within ε distance of the 1st key, then when we move to the 2nd key we do not need to compare it with the 3rd, 4th and 5th keys. They are guaranteed to be within ε distance of the 2nd key: they are within ε distance of the 1st key, and because the keys in a B+tree are sorted, the 2nd key lies between the 1st key and each of them.
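The naïve approach can be sketched as follows. This is only an illustration under stated assumptions: the keys sit in a sorted Python list, `dist` is a hypothetical parameter standing in for the assignment's distance function, and the distance never shrinks as keys get further apart in sorted order. The comparison counter is not part of the approach; it is there only to make the cost visible.

```python
def all_pairs_naive(keys, eps, dist):
    """Return (pairs, comparisons) for all key pairs within eps.

    keys -- sorted list of keys (assumed layout of the leaf-level keys)
    dist -- hypothetical distance function, assumed monotone in key order
    """
    pairs = []
    comparisons = 0
    n = len(keys)
    for i in range(n):
        # Restart the scan right after keys[i] every time (the wasteful part).
        for j in range(i + 1, n):
            comparisons += 1
            if dist(keys[i], keys[j]) > eps:
                break  # every later key is even further away
            pairs.append((keys[i], keys[j]))
    return pairs, comparisons


# Illustrative data only: integer keys with absolute difference as distance.
pairs, comparisons = all_pairs_naive([1, 2, 3, 4, 5, 10, 11], 4,
                                     lambda a, b: abs(a - b))
print(len(pairs), comparisons)  # prints "11 16"
```

On this small input, 16 comparisons are spent to find 11 pairs; several of them re-check keys that were already known to be within range.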

Expected (in terms of efficiency) solution

Sliding Window Approach: (~ merge sort join)

k1 k2 k3 k4 k5 k6 k7 k8 k9 …   (k2 to k5 are ≤ ε from k1; k6 is > ε from k1)

Compare k1 with k2, k3, k4, k5, k6. Assume strdist(k1, k6) > ε. Then print pairs (k1, k2), (k1, k3), (k1, k4), and (k1, k5). Move to k2.

k1 k2 k3 k4 k5 k6 k7 k8 k9 …   (k3 to k5 are ≤ ε from k2; k6 onward is still unknown)

Print pairs (k2, k3), (k2, k4), (k2, k5) directly (points were deducted for unnecessary comparisons of k2 with k3, k4 and k5). Continue comparing k2 with k6, k7, … until the ε distance is exceeded. Do the same for k3 and for all the remaining keys. That is, print pairs (k3, k4), (k3, k5), …, (k3, kx) without doing comparisons if kx is the last key within ε distance of k2, etc.

Here is a better illustration of the Sliding Window Approach:

[k1 k2 k3 k4 k5] k6 k7 k8 …   (window of width ε starting at k1)

Print pairs (k1, k2), (k1, k3), (k1, k4), and (k1, k5).

k1 [k2 k3 k4 k5 k6 k7] k8 …   (window of width ε starting at k2)

Move to k2 (slide the window). Print pairs (k2, k3), (k2, k4), and (k2, k5) without comparisons. We still have to compare k2 with k6 and k7 (and with k8, to see where to stop). Move to k3 and so on.
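The sliding window can be sketched with two indices over the key array. This is a minimal sketch, not the reference solution: it assumes the keys are in a sorted Python list, that `dist` (a hypothetical parameter standing in for strdist) never shrinks as keys get further apart in sorted order, and the names `start` and `end` are chosen here for illustration. The comparison counter only serves to show the saving.

```python
def all_pairs_sliding(keys, eps, dist):
    """Return (pairs, comparisons) using a sliding window over sorted keys.

    dist -- hypothetical distance function, assumed monotone in key order
    """
    pairs = []
    comparisons = 0
    n = len(keys)
    end = 0  # one past the last key known to be within eps of keys[start]
    for start in range(n):
        if end <= start:
            end = start + 1  # window is empty, begin right after start
        # Only keys at or beyond the old window edge need a real comparison.
        while end < n:
            comparisons += 1
            if dist(keys[start], keys[end]) > eps:
                break
            end += 1
        # Everything in keys[start+1:end] is within eps: emit directly.
        for j in range(start + 1, end):
            pairs.append((keys[start], keys[j]))
    return pairs, comparisons


# Illustrative data only: integer keys with absolute difference as distance.
pairs, comparisons = all_pairs_sliding([1, 2, 3, 4, 5, 10, 11], 4,
                                       lambda a, b: abs(a - b))
print(len(pairs), comparisons)  # prints "11 10"
```

Because `end` never moves backwards, each key is compared at most a constant number of times per window edge, so the number of comparisons grows linearly with the number of keys, while the output itself may still be larger.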

Notes
• Typical setting: All students used the naïve approach, doing unnecessary comparisons.
• The sliding window approach can be implemented in recursive solutions by keeping track of an “end-pointer” as well as a start-pointer.
• Keys-in-range was implemented efficiently by most students by searching for the first key and then scanning until the second key is reached.
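The keys-in-range strategy from the last note can be sketched as a search followed by a scan. This is a rough sketch under assumptions not stated in the slides: the leaf keys are modelled as a sorted Python list, both bounds are treated as inclusive, and the binary search from the standard `bisect` module stands in for the B+tree descent to the first key.

```python
from bisect import bisect_left


def keys_in_range(keys, low, high):
    """Return all keys k with low <= k <= high from a sorted list.

    Inclusive bounds are an assumption made for this sketch.
    """
    result = []
    # Step 1: locate the first key that is not smaller than low.
    i = bisect_left(keys, low)
    # Step 2: scan forward until a key beyond high is reached.
    while i < len(keys) and keys[i] <= high:
        result.append(keys[i])
        i += 1
    return result


print(keys_in_range([1, 3, 5, 7, 9, 11], 4, 9))  # prints "[5, 7, 9]"
```

The scan touches only the keys that are returned plus one key past the upper bound, which is what makes this approach efficient.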