
Chapter 9 Medians and order statistics
Lee, Hsiu-Hui
Ack: This presentation is based on the lecture slides from Hsu, Lih-Hsing, as well as various materials from the web.
20071102 chap 09 Hsiu-Hui Lee

Order Statistics
• The ith order statistic of a set of n elements is the ith smallest element.
• Selection problem
  Input: a set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n.
  Output: the element x ∈ A that is larger than exactly i − 1 other elements of A.

Order Statistics
• Select the ith smallest of n elements (the element with rank i).
  i = 1: minimum; i = n: maximum; i = ⌊(n+1)/2⌋ or ⌈(n+1)/2⌉: lower or upper median.
• Naive algorithm: sort and index the ith element.
• Worst-case running time = O(n lg n) using merge sort or heapsort (not quicksort).
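The naive algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `select_by_sorting` is hypothetical, and Python's built-in `sorted` (Timsort, also O(n lg n) in the worst case) stands in for the merge sort or heapsort named on the slide.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) by sorting first."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    # sorted() returns a new list, so the input is left unchanged.
    return sorted(values)[i - 1]


data = [7, 2, 9, 4, 1]
n = len(data)
minimum = select_by_sorting(data, 1)                  # i = 1
maximum = select_by_sorting(data, n)                  # i = n
lower_median = select_by_sorting(data, (n + 1) // 2)  # floor((n+1)/2)
upper_median = select_by_sorting(data, (n + 2) // 2)  # ceil((n+1)/2)
```

For odd n the lower and upper medians coincide; they differ only when n is even.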

9.1 Minimum and Maximum

Simultaneous min and max
• Maintain the minimum and maximum of the elements seen so far.
• Don't compare each element to the minimum and maximum separately.
• Process elements in pairs. Compare the elements of a pair to each other. Then compare the larger element to the maximum so far, and compare the smaller element to the minimum so far.
This leads to only 3 comparisons for every 2 elements.

Simultaneous min and max
Setting up the initial values for the min and max depends on whether n is odd or even.
(a) If n is odd, set both min and max to the first element. Then process the rest of the elements in pairs.
(b) If n is even, compare the first two elements and assign the larger to max and the smaller to min. Then process the rest of the elements in pairs.
In either case, the total number of comparisons is at most 3⌊n/2⌋.
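A minimal Python sketch of the pairwise scheme follows. The function name `min_and_max` and the returned comparison counter are assumptions added for illustration; the counter is there only to make the "3 comparisons for every 2 elements" claim checkable.

```python
def min_and_max(values):
    """Return (minimum, maximum, comparisons) using pairwise processing."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element initializes both min and max.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison orders the first two elements.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    # The remaining elements always form complete pairs.
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 3  # pair vs. each other, small vs. min, large vs. max
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

With n = 7 this performs 3 · 3 = 9 comparisons, and with n = 4 it performs 1 + 3 = 4, compared with 2(n − 1) for two separate scans.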

9.2 Selection in expected linear time
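As a rough illustration of the kind of algorithm analyzed in this section, here is a sketch of randomized selection in Python. It is not taken from the slides: the name `randomized_select`, the Lomuto-style partition, and the iterative (rather than recursive) structure are all assumptions made for this example.

```python
import random


def randomized_select(values, i, rng=random):
    """Return the i-th smallest element (1-indexed), expected O(n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    items = list(values)  # work on a copy
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        # Pick a random pivot and move it to the end of the active range.
        p = rng.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        # Partition: elements smaller than the pivot go to the left.
        store = lo
        for j in range(lo, hi):
            if items[j] < pivot:
                items[store], items[j] = items[j], items[store]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        k = store - lo + 1  # rank of the pivot within items[lo..hi]
        if i == k:
            return items[store]
        if i < k:
            hi = store - 1   # continue in the lower part
        else:
            lo = store + 1   # continue in the upper part
            i -= k
```

Only one side of each partition is examined further, which is what separates selection from quicksort and leads to the expected linear bound analyzed below.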


Example [figure]

Analysis of expected time
The analysis follows that of randomized quicksort, but it's a little different.
Let T(n) = the random variable for the running time of RAND-SELECT on an input of size n, assuming random numbers are independent.
For k = 0, 1, …, n − 1, define the indicator random variable
  X_k = 1 if PARTITION generates a k : n−k−1 split,
  X_k = 0 otherwise.

Analysis (continued)
To obtain an upper bound, assume that the ith element always falls in the larger side of the partition:
  T(n) = T(max{0, n−1}) + Θ(n)   if 0 : n−1 split,
         T(max{1, n−2}) + Θ(n)   if 1 : n−2 split,
         ⋮
         T(max{n−1, 0}) + Θ(n)   if n−1 : 0 split
       = Σ_{k=0}^{n−1} X_k (T(max{k, n−k−1}) + Θ(n)),
where X_k is the indicator random variable of a k : n−k−1 split.


Taking expected values, we have
  E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + O(n).

Solve this recurrence by substitution: guess E[T(n)] ≤ cn for some constant c.

If we choose the constant c so that c/4 − a > 0, i.e., c > 4a, we can divide both sides of n(c/4 − a) ≥ c/2 by c/4 − a, giving
  n ≥ (c/2) / (c/4 − a) = 2c/(c − 4a).
Thus, if we assume that T(n) = O(1) for n < 2c/(c − 4a), we have E[T(n)] = O(n).
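The substitution step behind this condition can be written out as follows. This is a sketch under two stated assumptions: the recurrence has the form E[T(n)] ≤ (2/n) Σ E[T(k)] + an, with the sum running over k = ⌊n/2⌋, …, n − 1 and a a constant bounding the linear term, and the guess E[T(k)] ≤ ck already holds for all k < n.

```latex
\begin{align*}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + an \\
        &= \frac{2c}{n}\left( \sum_{k=1}^{n-1} k
           - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right) + an \\
        &= \frac{2c}{n}\left( \frac{(n-1)n}{2}
           - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2} \right) + an \\
        &\le \frac{2c}{n}\left( \frac{(n-1)n}{2}
           - \frac{(n/2 - 2)(n/2 - 1)}{2} \right) + an \\
        &= \frac{c}{n}\left( \frac{3n^2}{4} + \frac{n}{2} - 2 \right) + an \\
        &\le \frac{3cn}{4} + \frac{c}{2} + an \\
        &= cn - \left( \frac{cn}{4} - \frac{c}{2} - an \right).
\end{align*}
```

The last line is at most cn exactly when cn/4 − c/2 − an ≥ 0, that is, when n(c/4 − a) ≥ c/2, which is the inequality being solved for n above.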

9.3 Worst-case linear-time order statistics (according to MIT)
SELECT(i, n)
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. If i = k then return x
   else if i < k
     then recursively SELECT the ith smallest element in the lower part
     else recursively SELECT the (i−k)th smallest element in the upper part
   (Step 4 is the same as in RAND-SELECT.)
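The four steps above can be sketched in Python as follows. This is an illustrative version, not the slides' own code: it assumes all elements are distinct, builds new lists instead of partitioning in place, sorts each 5-element group to find its median, and handles inputs of at most 5 elements by sorting directly. The name `select` mirrors the pseudocode; everything else is a choice made for this example.

```python
def select(values, i):
    """Return the i-th smallest element (1-indexed) of distinct values."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(values) <= 5:
        return sorted(values)[i - 1]  # base case: constant-size input

    # Step 1: groups of 5 (the last group may be smaller) and their medians.
    groups = [values[j:j + 5] for j in range(0, len(values), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # Step 2: recursively select the median of the group medians as pivot.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x; k is the rank of x.
    lower = [v for v in values if v < x]
    upper = [v for v in values if v > x]
    k = len(lower) + 1

    # Step 4: return x or recurse into exactly one side.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

Sorting a group of at most 5 elements takes constant time, so Step 1 remains linear overall. With repeated values the `upper` branch could receive an out-of-range rank, which is why the distinctness assumption matters here.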


Choosing the pivot
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot.
[Figure: the 5-element groups, ordered from lesser to greater]


Analysis (Assume all elements are distinct.)
• At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.

Minor simplification
• For n ≥ 50, we have 3⌊n/10⌋ ≥ n/4.
• Therefore, for n ≥ 50 the recursive call to SELECT in Step 4 is executed on at most 3n/4 elements.
• Thus, the recurrence for the running time can assume that Step 4 takes time T(3n/4) in the worst case.
• For n < 50, we know that the worst-case time is T(n) = Θ(1).
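The threshold n ≥ 50 can be checked numerically. The short sketch below reads 3n/10 with a floor, as the count of elements guaranteed on one side of the pivot, and uses integer arithmetic only; the helper name `bound_holds` and the search limit of 1000 are arbitrary choices for this illustration.

```python
def bound_holds(n):
    """Check 3 * floor(n / 10) >= n / 4 without floating point."""
    # Multiplying both sides by 4 gives 12 * floor(n / 10) >= n.
    return 12 * (n // 10) >= n


# All n below 1000 for which the bound fails.
failures = [n for n in range(1, 1000) if not bound_holds(n)]
largest_failure = max(failures)
```

In this range the last failure is n = 49, where 3⌊49/10⌋ = 12 < 12.25, which is consistent with taking 50 as the cutoff for the Θ(1) base case.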

Developing the recurrence
SELECT(i, n), with the cost of each step in brackets:
1. Divide the n elements into groups of 5. Find the median of each 5-element group by rote. [Θ(n)]
2. Recursively SELECT the median x of the ⌊n/5⌋ group medians to be the pivot. [T(n/5)]
3. Partition around the pivot x. Let k = rank(x). [Θ(n)]
4. If i = k then return x
   else if i < k
     then recursively SELECT the ith smallest element in the lower part
     else recursively SELECT the (i−k)th smallest element in the upper part [T(3n/4)]
Total: T(n) = T(n/5) + T(3n/4) + Θ(n).

Solving the recurrence
T(n) = T(n/5) + T(3n/4) + Θ(n)
Substitution: guess T(n) ≤ cn. Then
  T(n) ≤ cn/5 + 3cn/4 + Θ(n)
       = 19cn/20 + Θ(n)
       = cn − (cn/20 − Θ(n))
       ≤ cn,
if c is chosen large enough to handle both the Θ(n) and the initial conditions.

9.3 Selection in worst-case linear time (according to the text)
