Lecture 4: Divide and Conquer for the Nearest Neighbor Problem
Shang-Hua Teng

Merge-Sort(A, p, r)
A procedure that sorts the elements in the sub-array A[p..r] using divide and conquer.
• Merge-Sort(A, p, r)
  – if p >= r, do nothing
  – if p < r then
    • q ← ⌊(p + r)/2⌋
    • Merge-Sort(A, p, q)
    • Merge-Sort(A, q+1, r)
    • Merge(A, p, q, r)
• Start by calling Merge-Sort(A, 1, n)

A = Merge-Array(L, R)
Assume L[1:s] and R[1:t] are two sorted arrays of elements. Merge-Array(L, R) forms a single sorted array A[1:s+t] of all elements in L and R.
• A = Merge-Array(L, R)
  – i ← 1; j ← 1
  – L[s+1] ← ∞; R[t+1] ← ∞
  – for k ← 1 to s + t
    • do if L[i] <= R[j]
      – then A[k] ← L[i]; i ← i + 1
      – else A[k] ← R[j]; j ← j + 1

Complexity of Merge-Array
• At each iteration, we perform 1 comparison, 1 assignment (copying one element to A), and 2 increments (to k and to i or j).
• So the number of operations per iteration is 4.
• Thus, Merge-Array takes at most 4(s+t) time.
• This is linear in the size of the input.
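The linear-time merge analysed above can be sketched in Python as follows. This is a minimal illustration, not the slides' exact pseudocode: it uses 0-based Python lists instead of 1-based arrays, and the name `merge_array` is chosen here for the example.

```python
def merge_array(left, right):
    """Merge two sorted lists into one sorted list in O(s + t) time."""
    merged = []
    i = j = 0
    # One comparison and one copy per iteration, as in the analysis.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two inputs still has elements; copy them over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

Each element of the two inputs is copied exactly once, which is where the linear bound comes from.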

Merge(A, p, q, r)
Assume A[p..q] and A[q+1..r] are two sorted sub-arrays. Merge(A, p, q, r) forms a single sorted array A[p..r].
• Merge(A, p, q, r)
  – copy A[p..q] into L and A[q+1..r] into R
  – A[p..r] ← Merge-Array(L, R)

Divide and Conquer
• Divide the problem into a number of sub-problems (similar to the original problem but smaller).
• Conquer the sub-problems by solving them recursively (if a sub-problem is small enough, just solve it in a straightforward manner).
• Combine the solutions to the sub-problems into the solution for the original problem.

Merge Sort
• Divide: split the n-element sequence to be sorted into two subsequences of n/2 elements each.
• Conquer: sort the two subsequences recursively using merge sort.
• Combine: merge the two sorted subsequences to produce the sorted answer.
• Note: during the recursion, if the subsequence has only one element, then do nothing.
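The three steps can be put together in a short Python sketch. It assumes 0-based inclusive indices p and r, and takes the midpoint as q = (p + r) // 2; the helper `merge` below is one possible in-place combine step written for this example.

```python
def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive) in place."""
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is exhausted, or if left's head is smaller.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p, r):
    """Sort a[p..r] (inclusive) by divide and conquer."""
    if p >= r:               # zero or one element: nothing to do
        return
    q = (p + r) // 2         # divide
    merge_sort(a, p, q)      # conquer the left half
    merge_sort(a, q + 1, r)  # conquer the right half
    merge(a, p, q, r)        # combine
```

To sort a whole list `data`, call `merge_sort(data, 0, len(data) - 1)`.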

Algorithm Design Paradigm I • Solve smaller problems, and use solutions to the smaller problems to solve larger ones – Divide and Conquer • Correctness: mathematical induction

Running Time of Merge-Sort
• Running time is measured as a function of the input size, that is, the number of elements in the array A.
• The divide-and-conquer scheme yields a clean recurrence.
• Let T(n) be the running time of Merge-Sort for sorting an array of n elements.
• For simplicity, assume n is a power of 2, that is, there exists k such that n = 2^k.

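Recurrence of T(n)
• T(1) = 1
• for n > 1, the two recursive calls cost 2T(n/2) and the merge costs at most 4n, so we have
  T(n) = 1                 if n = 1
  T(n) = 2T(n/2) + 4n      if n > 1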

Solution of Recurrence of T(n)
• T(n) = 4n log n + n = O(n log n), where log is base 2
• Picture Proof by Recursion Tree
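The closed form can be checked numerically. The sketch below assumes the recurrence T(1) = 1, T(n) = 2T(n/2) + 4n, where the 4n term is the merge cost of 4(s+t) with s + t = n. In the recursion tree, each of the log n levels above the leaves costs 4n in total, and the n leaves cost 1 each, giving 4n log n + n. The function names are chosen for this example.

```python
def t(n):
    """T(1) = 1 and T(n) = 2*T(n/2) + 4n, for n a power of two."""
    return 1 if n == 1 else 2 * t(n // 2) + 4 * n


def closed_form(n):
    """4 n log2(n) + n, for n a power of two."""
    k = n.bit_length() - 1   # k = log2(n) when n = 2**k
    return 4 * n * k + n


# Compare the recurrence with the closed form for n = 1, 2, 4, ..., 1024.
for k in range(11):
    assert t(2 ** k) == closed_form(2 ** k)
```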

Two Dimensional Divide and Conquer
• Can we extend the divide and conquer idea to 2 dimensions?
• We will consider a slightly simpler problem (handout #33, Chapter 33.4)

Closest Pair Problems
• Input:
  – A set of points P = {p1, …, pn} in two dimensions
• Output:
  – The pair of points pi, pj that minimizes the Euclidean distance between them.
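As a baseline for the input and output just described, here is a brute-force sketch that examines every pair of points, which takes quadratic time. It assumes points are given as (x, y) tuples; the function name is chosen for this example.

```python
from itertools import combinations
from math import dist


def closest_pair_brute_force(points):
    """Return the pair of points with minimum Euclidean distance.

    Examines all n(n-1)/2 pairs, so it runs in O(n^2) time.
    """
    return min(combinations(points, 2), key=lambda pair: dist(*pair))
```

For example, `closest_pair_brute_force([(0, 0), (5, 5), (1, 1), (9, 0)])` selects the pair (0, 0) and (1, 1).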

Closest Pair Problem

Divide and Conquer
• An O(n^2) time algorithm is easy
• Assumptions:
  – No two points have the same x-coordinate
  – No two points have the same y-coordinate
• How do we solve this problem in 1 dimension?
  – Sort the numbers and walk from left to right to find the minimum gap
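The one-dimensional solution can be sketched in a few lines of Python. After sorting, the closest pair must be adjacent, so a single left-to-right walk suffices. The function name is chosen for this example, and at least two input numbers are assumed.

```python
def min_gap_1d(values):
    """Smallest gap between any two numbers: sort, then scan neighbours.

    Sorting costs O(n log n); the scan costs O(n).
    """
    ordered = sorted(values)
    return min(b - a for a, b in zip(ordered, ordered[1:]))
```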

Divide and Conquer
• Divide and conquer has a chance to do better than O(n^2).
• Assume that we can find the median in O(n) time!
• We can first sort the points by their x-coordinates

Divide and Conquer for the Closest Pair Problem
• Divide: split P by the x-median into a left half L and a right half R

Conquer
• Recursively solve L and R, giving the closest-pair distance d1 in L and d2 in R

Combination I
• Take the smaller of d1 and d2: d = min(d1, d2)

Combination II
• Is there a point in L and a point in R whose distance is smaller than d, where d = min(d1, d2)?

Combination II
• If the answer is “no”, then we are done!
• If the answer is “yes”, then the closest such pair forms the closest pair for the entire set
• Why?
• How do we determine this?

Combination II
• Is there a point in L and a point in R whose distance is smaller than d?
• We need only consider the points in the narrow band within distance d of the dividing line
• The band can be found in O(n) time

Combination II
• Denote the set of points in the narrow band by S, and let Sy be the list of the points of S sorted by y-coordinate.

Combination II
• There exist a point in L and a point in R whose distance is less than d if and only if there exist two points in S whose distance is less than d.
• If S is the whole thing, did we gain anything?
• If s and t in S have the property that ||s-t|| < d, then s and t are within 30 positions of each other in the sorted list Sy.

Combination II
• Divide the narrow band into small boxes (figure): there is at most one point in each box
• So only a constant number of points of S can lie close to any given point of S
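The check over the band can be sketched as a scan of Sy in which each point is compared only with a bounded number of its successors. The sketch assumes Sy is a list of (x, y) tuples already sorted by y-coordinate; the window of 30 follows the slides, and the function and parameter names are chosen for this example.

```python
from math import dist


def closest_in_strip(strip_by_y, d, window=30):
    """Return the smallest distance below d between points in the strip.

    strip_by_y: points of S as (x, y) tuples, sorted by y-coordinate.
    Each point is compared with at most `window` successors, so the
    scan is linear in the size of S. Returns d if no pair is closer.
    """
    best = d
    for i, p in enumerate(strip_by_y):
        for q in strip_by_y[i + 1:i + 1 + window]:
            best = min(best, dist(p, q))
    return best
```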

Closest-Pair
• Closest-Pair(P)
  – Preprocessing
    • Construct Px and Py as lists sorted by x- and y-coordinates
  – Divide
    • Construct L, Lx, Ly and R, Rx, Ry
  – Conquer
    • Let d1 = Closest-Pair(L, Lx, Ly)
    • Let d2 = Closest-Pair(R, Rx, Ry)
  – Combination
    • Let d = min(d1, d2)
    • Construct S and Sy
    • For each point in Sy, check each of its next 30 points down the list
    • If the distance is less than d, update d to this smaller distance
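The whole procedure can be sketched in Python as below. This is one possible rendering under stated assumptions, not the slides' exact algorithm: points are (x, y) tuples with distinct x-coordinates (as the lecture assumes), the function returns only the distance, sub-problems of at most 3 points are solved by brute force, and the split uses the middle element of the x-sorted list. All function names are chosen for this example.

```python
from math import dist, inf


def closest_pair_distance(points):
    """Distance of the closest pair; assumes distinct x-coordinates."""
    px = sorted(points)                      # preprocessing: sort by x
    py = sorted(points, key=lambda p: p[1])  # preprocessing: sort by y
    return _closest(px, py)


def _closest(px, py):
    n = len(px)
    if n <= 3:  # small enough: solve directly
        return min((dist(p, q) for i, p in enumerate(px) for q in px[i + 1:]),
                   default=inf)
    # Divide by the x-median, keeping both sorted orders in O(n) time.
    mid = n // 2
    x_split = px[mid][0]
    lx, rx = px[:mid], px[mid:]
    ly = [p for p in py if p[0] < x_split]
    ry = [p for p in py if p[0] >= x_split]
    # Conquer.
    d = min(_closest(lx, ly), _closest(rx, ry))
    # Combine: scan the band S in y-order, checking the next 30 points.
    sy = [p for p in py if abs(p[0] - x_split) < d]
    for i, p in enumerate(sy):
        for q in sy[i + 1:i + 31]:
            d = min(d, dist(p, q))
    return d
```

Filtering Py rather than re-sorting is what keeps the divide step linear, so each level of the recursion does O(n) work.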

Complexity Analysis
• Preprocessing takes O(n lg n) time
• Divide takes O(n) time
• Conquer takes 2T(n/2) time
• Combination takes O(n) time
• So the recursion satisfies T(n) = 2T(n/2) + O(n), and the algorithm takes O(n lg n) time in total