Selection: Medians and Order Statistics (Chap. 9)

• The ith order statistic of n elements S = {a1, a2, …, an} is the ith smallest element.
• Finding it is also called the selection problem.
• Minimum and maximum
• Median, lower median, upper median
• Selection in expected/average linear time
• Selection in worst-case linear time

O(n lg n) Algorithm
• Suppose n elements are sorted by an O(n lg n) algorithm, e.g., MERGE-SORT:
  – Minimum: the first element
  – Maximum: the last element
  – The ith order statistic: the ith element
  – Median:
    • If n is odd, the ((n+1)/2)th element.
    • If n is even, the ⌊(n+1)/2⌋th element is the lower median and the ⌈(n+1)/2⌉th element is the upper median.
• After sorting, each selection takes O(1), so the total is O(n lg n).
• Can we do better?
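The sort-then-index approach can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides: the function names are hypothetical, and Python's built-in `sorted` stands in for MERGE-SORT (both are O(n lg n)).

```python
def order_statistic_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) by sorting first."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    ordered = sorted(values)   # O(n lg n); stands in for MERGE-SORT
    return ordered[i - 1]      # O(1) lookup once sorted


def medians_by_sorting(values):
    """Return (lower median, upper median); the two coincide when n is odd."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    lower = ordered[(n + 1) // 2 - 1]   # floor((n+1)/2)-th element
    upper = ordered[(n + 2) // 2 - 1]   # ceil((n+1)/2)-th element
    return lower, upper
```

For example, `medians_by_sorting([7, 2, 9, 4])` sorts the input to `[2, 4, 7, 9]` and picks the 2nd and 3rd elements as lower and upper median.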

Selection in Expected Linear Time O(n)
• Select the ith element.
• A divide-and-conquer algorithm: RANDOMIZED-SELECT
• Similar to quicksort: partition the input array recursively.
• Unlike quicksort, which works on both sides of the partition, work on just one side.
  – This is called prune-and-search: prune one side and search only the other.
• (Please review or read quicksort in Chapter 7.)

RANDOMIZED-SELECT(A, p, r, i)
  if p = r
      then return A[p]
  q ← RANDOMIZED-PARTITION(A, p, r)   // afterwards A[p..q-1] ≤ A[q] ≤ A[q+1..r]
  k ← q - p + 1
  if i = k
      then return A[q]
  else if i < k
      then return RANDOMIZED-SELECT(A, p, q-1, i)
  else return RANDOMIZED-SELECT(A, q+1, r, i-k)
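A runnable Python rendering of the pseudocode above may help. It is a sketch under some assumptions: indices are 0-based and inclusive, i is still 1-indexed, and `randomized_partition` is written here as a Lomuto-style partition with a random pivot, since the slides do not spell out RANDOMIZED-PARTITION.

```python
import random


def randomized_partition(a, p, r):
    """Partition a[p..r] in place around a random pivot; return the pivot's index."""
    pivot_index = random.randint(p, r)        # inclusive on both ends
    a[pivot_index], a[r] = a[r], a[pivot_index]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i):
    """Return the i-th smallest (1-indexed) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                             # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i)
    return randomized_select(a, q + 1, r, i - k)
```

A call such as `randomized_select(list(data), 0, len(data) - 1, i)` passes a copy, because the function reorders its input while searching.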

Analysis of RANDOMIZED-SELECT
• Worst-case running time is Θ(n²). Why? The algorithm may be unlucky and always partition into A[q], an empty side, and a side holding all remaining elements.
  – Partitioning m elements takes Θ(m) time, for m = n, n-1, …, 2.
  – The total is Θ(n) + Θ(n-1) + … + Θ(2) = Θ(n(n+1)/2 - 1) = Θ(n²).
• Because the pivot is chosen at random, no particular input elicits the worst-case behavior.
• On average it is good: using probabilistic analysis with random variables, it can be proven that the expected running time is O(n) (see page 187).
• Can we do better and achieve O(n) in the worst case?
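The worst-case sum can be written out as a short worked equation. The second line is only an illustration, assuming every partition happens to halve the subproblem; it gives intuition for why a linear bound is plausible and is not the expected-time proof.

```latex
% Worst case: subproblem sizes n, n-1, ..., 2
\sum_{m=2}^{n} m \;=\; \frac{n(n+1)}{2} - 1 \;=\; \Theta(n^2)

% Illustrative lucky case (assumption: each partition halves the subproblem)
n + \frac{n}{2} + \frac{n}{4} + \cdots \;\le\; 2n \;=\; O(n)
```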

Selection in Worst-Case Linear Time O(n)
• Select the ith smallest element of S = {a1, a2, …, an}.
• Use the so-called prune-and-search technique:
  – Let x ∈ S, and partition S into three subsets:
  – S1 = {aj | aj < x}, S2 = {aj | aj = x}, S3 = {aj | aj > x}
  – If |S1| ≥ i, search for the ith smallest element in S1 recursively (prune S2 and S3 away).
  – Else if |S1| + |S2| ≥ i, return x (the ith smallest element).
  – Else search for the (i - (|S1| + |S2|))th smallest element in S3 recursively (prune S1 and S2 away).
• The question is how to select x such that S1 and S3 are nearly equal in size.
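The three-way prune-and-search scheme can be sketched independently of how x is chosen. In this illustrative Python version the pivot rule is a parameter; the default (take the first element) is an arbitrary placeholder and gives no balance guarantee. The names `prune_and_search` and `choose_pivot` are hypothetical.

```python
def prune_and_search(s, i, choose_pivot=lambda items: items[0]):
    """Return the i-th smallest (1-indexed) element of the list s."""
    if not 1 <= i <= len(s):
        raise ValueError("i must satisfy 1 <= i <= len(s)")
    x = choose_pivot(s)                  # must return an element of s
    s1 = [a for a in s if a < x]
    s2 = [a for a in s if a == x]        # non-empty, so both recursive calls shrink
    s3 = [a for a in s if a > x]
    if len(s1) >= i:
        return prune_and_search(s1, i, choose_pivot)       # prune S2 and S3
    if len(s1) + len(s2) >= i:
        return x
    return prune_and_search(s3, i - len(s1) - len(s2), choose_pivot)  # prune S1, S2
```

With a poor pivot rule, one side can stay nearly as large as the input on every call, which is exactly the imbalance the choice of x is meant to avoid.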

The Way to Select x
• Divide the elements into ⌈n/5⌉ groups of 5 elements each (the last group may have fewer).
• Find the median of each group.
• Find the median x of the medians.
• At least (3n/10) - 6 elements are < x.
• At least (3n/10) - 6 elements are > x, because each of ⌈(1/2)⌈n/5⌉⌉ - 2 groups contributes 3 elements that are > x.
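The pivot rule and its guarantee can be checked on a concrete input. The sketch below is an assumption-laden illustration: it sorts the list of medians to find their median (the algorithm itself finds it recursively), it takes the lower median throughout, and it assumes distinct elements when counting. The function names are hypothetical.

```python
def median_of_medians_pivot(s):
    """Pick x as the median of the medians of groups of (at most) 5 elements."""
    groups = [sorted(s[j:j + 5]) for j in range(0, len(s), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]   # lower median of each group
    medians.sort()          # sorting stands in for the recursive SELECT call
    return medians[(len(medians) - 1) // 2]


def split_counts(s):
    """Return (number of elements < x, number of elements > x) for the chosen x."""
    x = median_of_medians_pivot(s)
    return sum(a < x for a in s), sum(a > x for a in s)
```

For a list of n distinct elements, both counts returned by `split_counts` should be at least (3n/10) - 6, so neither side of the partition can exceed (7n/10) + 6 elements.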

SELECT (ith Element in n Elements)
1. Divide the n elements into ⌈n/5⌉ groups of 5 elements.
2. Find the median of each group.
3. Use SELECT recursively to find the median x of the ⌈n/5⌉ medians from step 2.
4. Partition the n elements around x into S1, S2, and S3.
5. If |S1| ≥ i, search for the ith smallest element in S1 recursively.
   Else if |S1| + |S2| ≥ i, return x (the ith smallest element).
   Else search for the (i - (|S1| + |S2|))th smallest element in S3 recursively.
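The five steps translate fairly directly into Python. This is a sketch rather than a tuned implementation: it builds new lists instead of partitioning in place, and the base case (sort directly when at most 5 elements remain) is an assumption chosen for simplicity, not the threshold used in the analysis.

```python
def select(s, i):
    """Return the i-th smallest (1-indexed) element of the list s."""
    if not 1 <= i <= len(s):
        raise ValueError("i must satisfy 1 <= i <= len(s)")
    if len(s) <= 5:                      # assumed base case: sort tiny inputs
        return sorted(s)[i - 1]
    # Steps 1-2: groups of 5 and the (lower) median of each group
    groups = [s[j:j + 5] for j in range(0, len(s), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the median x of the medians
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x
    s1 = [a for a in s if a < x]
    s2 = [a for a in s if a == x]
    s3 = [a for a in s if a > x]
    # Step 5: recurse into the one side that contains the answer
    if len(s1) >= i:
        return select(s1, i)
    if len(s1) + len(s2) >= i:
        return x
    return select(s3, i - len(s1) - len(s2))
```

Because x is always an element of s, S2 is never empty, so every recursive call works on a strictly smaller list.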

Analysis of SELECT (cont.)
• Steps 1, 2, and 4 take O(n).
• Step 3 takes T(⌈n/5⌉).
• Let us see step 5:
  – At least half of the medians from step 2 are ≥ x, so at least ⌈(1/2)⌈n/5⌉⌉ - 2 groups contribute 3 elements that are > x, i.e., 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ (3n/10) - 6.
  – Similarly, the number of elements < x is also at least (3n/10) - 6.
  – Thus |S1| is at most (7n/10) + 6, and similarly for |S3|.
  – Thus SELECT in step 5 is called recursively on at most (7n/10) + 6 elements.
• The recurrence is:
  – T(n) = O(1) if n < some threshold (e.g., 140)
  – T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n) if n ≥ the threshold (e.g., 140)

Solve the Recurrence by Substitution
• Suppose T(n) ≤ cn for some constant c, where the O(n) term is at most an for a constant a.
• T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
       ≤ cn/5 + c + 7cn/10 + 6c + an
       = 9cn/10 + 7c + an
       = cn + (-cn/10 + 7c + an)
• This is at most cn if -cn/10 + 7c + an ≤ 0, i.e., c ≥ 10a(n/(n-70)) when n > 70.
• So choose n ≥ 140; then n/(n-70) ≤ 2, and c ≥ 20a suffices.
• Note: the threshold need not be 140; any integer > 70 is OK.
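The bound T(n) ≤ 20an can also be sanity-checked numerically. The sketch below evaluates the recurrence exactly under stated assumptions: a = 1, the threshold is 140, the base-case cost for n < 140 is taken to be a·n (any constant bound would do), and 7n/10 is rounded down. These choices are illustrative, not taken from the slides.

```python
from functools import lru_cache

A = 1              # assumed constant hidden in the O(n) term
THRESHOLD = 140    # recurrence applies for n >= THRESHOLD
C = 20 * A         # constant suggested by the substitution argument


@lru_cache(maxsize=None)
def t(n):
    """Evaluate T(n) = T(ceil(n/5)) + T(floor(7n/10) + 6) + A*n exactly."""
    if n < THRESHOLD:
        return A * n                     # assumed base-case cost
    return t(-(-n // 5)) + t(7 * n // 10 + 6) + A * n


def bound_holds(limit):
    """Check T(n) <= C*n for every n in 1..limit."""
    return all(t(n) <= C * n for n in range(1, limit + 1))
```

Calling `bound_holds(5000)` checks every n up to 5000; a numeric check like this does not replace the induction, it only confirms that the chosen constants are consistent.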

Summary
• Bucket sort, counting sort, radix sort:
  – Their running times
  – Modifications
• The ith order statistic of n elements S = {a1, a2, …, an}: the ith smallest element
  – Minimum and maximum
  – Median, lower median, upper median
• Selection in expected/average linear time
  – Worst-case running time
  – Prune-and-search
• Selection in worst-case linear time
  – Why group size 5?