Mergesort, Analysis of Algorithms

John von Neumann and ENIAC (1945)

Why Does It Matter?

Run time (nanoseconds)            $1.3 N^3$      $10 N^2$      $47 N \log_2 N$   $48 N$

Time to solve a problem of size
  1,000                           1.3 seconds    10 msec       0.4 msec          0.048 msec
  10,000                          22 minutes     1 second      6 msec            0.48 msec
  100,000                         15 days        1.7 minutes   78 msec           4.8 msec
  1 million                       41 years       2.8 hours     0.94 seconds      48 msec
  10 million                      41 millennia   1.7 weeks     11 seconds        0.48 seconds

Max size problem solved in one
  second                          920            10,000        1 million         21 million
  minute                          3,600          77,000        49 million        1.3 billion
  hour                            14,000         600,000       2.4 billion       76 billion
  day                             41,000         2.9 million   50 billion        1,800 billion

N multiplied by 10,
time multiplied by                1,000          100           10+               10
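The table entries follow directly from the four cost models. Below is a minimal C sketch that evaluates them. It assumes times in nanoseconds, exactly as in the column headings. The names `lg`, `cubic_ns`, `quadratic_ns`, `nlogn_ns`, `linear_ns` and `growth_x10` are hypothetical. The base-2 logarithm uses the shift-and-square method, so the sketch needs no math library.

```c
#include <assert.h>

/* Base-2 logarithm for x >= 1 by shift-and-square (core C only, no -lm). */
static double lg(double x)
{
    assert(x >= 1.0);
    double result = 0.0;
    while (x >= 2.0) {              /* integer part: count halvings */
        x /= 2.0;
        result += 1.0;
    }
    double bit = 0.5;
    for (int i = 0; i < 48; i++) {  /* fractional part, one bit per squaring */
        x *= x;
        if (x >= 2.0) {
            x /= 2.0;
            result += bit;
        }
        bit /= 2.0;
    }
    return result;
}

/* Running-time models from the table, in nanoseconds. */
double cubic_ns(double n)     { return 1.3 * n * n * n; }
double quadratic_ns(double n) { return 10.0 * n * n; }
double nlogn_ns(double n)     { return 47.0 * n * lg(n); }
double linear_ns(double n)    { return 48.0 * n; }

/* Factor by which the time grows when N is multiplied by 10 (last row). */
double growth_x10(double (*model)(double), double n)
{
    return model(10.0 * n) / model(n);
}
```

For example, `nlogn_ns(1e6)` is about 0.94e9 ns (0.94 seconds), matching the table. `growth_x10(quadratic_ns, 1000.0)` is 100. For the $N \log N$ model the factor is a little over 10, which is the "10+" in the last row.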

Orders of Magnitude

Seconds      Equivalent
1            1 second
10           10 seconds
$10^2$       1.7 minutes
$10^3$       17 minutes
$10^4$       2.8 hours
$10^5$       1.1 days
$10^6$       1.6 weeks
$10^7$       3.8 months
$10^8$       3.1 years
$10^9$       3.1 decades
$10^{10}$    3.1 centuries
...          forever
$10^{17}$    age of universe

Meters per second   Imperial units     Example
$10^{-10}$          1.2 in / decade    Continental drift
$10^{-8}$           1 ft / year        Hair growing
$10^{-6}$           3.4 in / day       Glacier
$10^{-4}$           1.2 ft / hour      Gastro-intestinal tract
$10^{-2}$           2 ft / minute      Ant
1                   2.2 mi / hour      Human walk
$10^2$              220 mi / hour      Propeller airplane
$10^4$              370 mi / min       Space shuttle
$10^6$              620 mi / sec       Earth in galactic orbit
$10^8$              62,000 mi / sec    1/3 speed of light

Powers of 2
$2^{10}$     thousand
$2^{20}$     million
$2^{30}$     billion

Impact of Better Algorithms

Example 1: N-body simulation.
• Simulate gravitational interactions among N bodies.
  - physicists want N = # atoms in universe
• Brute force method: $N^2$ steps.
• Appel (1981): $N \log N$ steps, enables new research.

Example 2: Discrete Fourier Transform (DFT).
• Breaks down waveforms (sound) into periodic components.
  - foundation of signal processing
  - CD players, JPEG, analyzing astronomical data, etc.
• Grade school method: $N^2$ steps.
• Runge-König (1924), Cooley-Tukey (1965): FFT algorithm, $N \log N$ steps, enables new technology.

Mergesort (divide-and-conquer)
• Divide array into two halves.
• Recursively sort each half.
• Merge two halves to make sorted whole.

  A L G O R | I T H M S     divide
  A G L O R | H I M S T     sort
  A G H I L M O R S T       merge
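The three steps can be sketched in C on the same example. This is a minimal sketch, not the implementation shown later in these slides. It uses a conventional two-pointer merge instead of Sedgewick's reversed-copy merge, and it sorts the characters of a string. It assumes strings of at most 256 characters. All function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Merge sorted runs a[lo..mid] and a[mid+1..hi] through scratch space aux
   (conventional two-pointer merge). */
static void merge_runs(char a[], char aux[], int lo, int mid, int hi)
{
    for (int k = lo; k <= hi; k++)          /* copy both halves to scratch */
        aux[k] = a[k];
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; k++) {
        if (i > mid)              a[k] = aux[j++];   /* left half used up  */
        else if (j > hi)          a[k] = aux[i++];   /* right half used up */
        else if (aux[j] < aux[i]) a[k] = aux[j++];   /* right key smaller  */
        else                      a[k] = aux[i++];   /* ties favor left    */
    }
}

/* Divide, recursively sort each half, then merge. */
static void sort_range(char a[], char aux[], int lo, int hi)
{
    if (hi <= lo) return;                   /* 0 or 1 element: sorted */
    int mid = lo + (hi - lo) / 2;           /* divide */
    sort_range(a, aux, lo, mid);            /* sort left half  */
    sort_range(a, aux, mid + 1, hi);        /* sort right half */
    merge_runs(a, aux, lo, mid, hi);        /* merge */
}

/* Sort the characters of a NUL-terminated string in place. */
void mergesort_string(char *s)
{
    char aux[256];
    int n = (int) strlen(s);
    assert(n <= (int) sizeof aux);          /* sketch: fixed-size scratch */
    sort_range(s, aux, 0, n - 1);
}
```

Calling `mergesort_string` on a buffer holding "ALGORITHMS" produces "AGHILMORST". Along the way it sorts the halves "ALGOR" and "ITHMS" into "AGLOR" and "HIMST", as in the picture.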

Mergesort Analysis

How long does mergesort take?
• Bottleneck = merging (and copying).
  - merging two files of size N/2 requires N comparisons
• T(N) = comparisons to mergesort N elements.
  - to make analysis cleaner, assume N is a power of 2
  - then $T(N) = 2\,T(N/2) + N$ for $N > 1$, with $T(1) = 0$

Claim. $T(N) = N \log_2 N$.
• Note: same number of comparisons for ANY file.
  - even already sorted

We'll prove this several different ways to illustrate standard techniques.
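Before proving the claim, it can be checked numerically. This C sketch assumes the recurrence $T(N) = 2T(N/2) + N$ with $T(1) = 0$ for N a power of 2. It evaluates the recurrence directly and compares it with $N \log_2 N$. The helper names `comparisons_pow2` and `lg_pow2` are hypothetical.

```c
#include <assert.h>

/* T(N) = 2 T(N/2) + N with T(1) = 0, for N a power of 2:
   comparisons used when each merge of N items costs N comparisons. */
long comparisons_pow2(long n)
{
    assert(n >= 1 && (n & (n - 1)) == 0);   /* power of 2 only */
    if (n == 1)
        return 0;
    return 2 * comparisons_pow2(n / 2) + n;
}

/* log2 of a power of 2, by counting halvings. */
int lg_pow2(long n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}
```

For instance, `comparisons_pow2(1024)` returns 10240, which is $1024 \cdot \log_2 1024$.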

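Proof by Picture of Recursion Tree

  subproblems at each level          comparisons at that level
  $T(N)$                             $N$
  $T(N/2)$ (2 copies)                $2(N/2)$
  $T(N/4)$ (4 copies)                $4(N/4)$
  ...                                ...
  $T(N/2^k)$ ($2^k$ copies)          $2^k (N/2^k)$
  ...                                ...
  $T(2)$ ($N/2$ copies)              $(N/2)(2)$

  $\log_2 N$ levels, each contributing $N$: total $N \log_2 N$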
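Read level by level, the picture becomes a sum. Assume the recurrence $T(N) = 2T(N/2) + N$ with $T(1) = 0$ and N a power of 2. Level $k$ then has $2^k$ subproblems of size $N/2^k$, so every level costs $N$. There are $\log_2 N$ levels above the leaves:

```latex
T(N) = \sum_{k=0}^{\log_2 N - 1} 2^k \cdot \frac{N}{2^k}
     = \sum_{k=0}^{\log_2 N - 1} N
     = N \log_2 N
```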

Proof by Telescoping

Claim. $T(N) = N \log_2 N$ (when N is a power of 2).
Proof. For N > 1:
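Here is one way to write out the telescoping steps, assuming the recurrence $T(N) = 2T(N/2) + N$ with $T(1) = 0$. Divide both sides by $N$. Then apply the same identity to $T(N/2)/(N/2)$, then to $T(N/4)/(N/4)$, and so on, $\log_2 N$ times in all:

```latex
\begin{aligned}
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + 1 \\
               &= \frac{T(N/4)}{N/4} + 1 + 1 \\
               &\;\;\vdots \\
               &= \frac{T(N/N)}{N/N} + \underbrace{1 + 1 + \cdots + 1}_{\log_2 N} \\
               &= \log_2 N
\end{aligned}
```

Multiplying through by $N$ gives $T(N) = N \log_2 N$.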

Mathematical Induction

Mathematical induction.
• Powerful and general proof technique in discrete mathematics.
• To prove a theorem true for all integers $N \ge 0$:
  - Base case: prove it to be true for N = 0.
  - Induction hypothesis: assume it is true for arbitrary N.
  - Induction step: show it is true for N + 1.

Claim: $0 + 1 + 2 + 3 + \cdots + N = N(N+1)/2$ for all $N \ge 0$.
Proof: (by mathematical induction)
• Base case (N = 0).
  - $0 = 0(0+1)/2$.
• Induction hypothesis: assume $0 + 1 + 2 + \cdots + N = N(N+1)/2$.
• Induction step:
  $0 + 1 + \cdots + N + (N+1) = (0 + 1 + \cdots + N) + (N+1) = N(N+1)/2 + (N+1) = (N+1)(N+2)/2$
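The identity can also be spot-checked numerically. This C sketch compares a running sum with $N(N+1)/2$ for every $N$ up to a limit. The function name `gauss_formula_holds` is hypothetical. A finite check like this supports the claim but does not replace the inductive proof.

```c
#include <assert.h>

/* Return 1 if 0 + 1 + ... + n == n(n+1)/2 for every n in 0..limit.
   Hypothetical helper: a finite spot check, not a proof. */
int gauss_formula_holds(long long limit)
{
    assert(limit >= 0);
    long long sum = 0;                  /* running value of 0 + 1 + ... + n */
    for (long long n = 0; n <= limit; n++) {
        sum += n;
        if (sum != n * (n + 1) / 2)
            return 0;
    }
    return 1;
}
```

For example, `gauss_formula_holds(100000)` returns 1.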

Proof by Induction

Claim. $T(N) = N \log_2 N$ (when N is a power of 2).
Proof. (by induction on N)
• Base case: N = 1.
• Inductive hypothesis: $T(N) = N \log_2 N$.
• Goal: show that $T(2N) = 2N \log_2 (2N)$.
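The inductive step can be completed in one line. It assumes the recurrence $T(2N) = 2T(N) + 2N$: sort two halves of size $N$, then spend $2N$ comparisons on the merge. It also uses the base case $T(1) = 0 = 1 \cdot \log_2 1$:

```latex
\begin{aligned}
T(2N) &= 2\,T(N) + 2N \\
      &= 2N \log_2 N + 2N && \text{(inductive hypothesis)} \\
      &= 2N \left( \log_2 N + 1 \right) \\
      &= 2N \log_2 (2N)
\end{aligned}
```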

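Proof by Induction

What if N is not a power of 2?
• T(N) satisfies the following recurrence:
  $T(N) = 0$ if $N = 1$, and $T(N) = T(\lceil N/2 \rceil) + T(\lfloor N/2 \rfloor) + N$ otherwise.

Claim. $T(N) \le N \lceil \log_2 N \rceil$.
Proof. See supplemental slides.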
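To see the claim on concrete values, the recurrence can be tabulated bottom up. This C sketch assumes $T(1) = 0$ and $T(N) = T(\lceil N/2 \rceil) + T(\lfloor N/2 \rfloor) + N$. The helper names `comparisons_table` and `ceil_lg` are hypothetical.

```c
#include <assert.h>

/* Smallest k with 2^k >= n, i.e. ceil(log2 n), for n >= 1. */
int ceil_lg(long n)
{
    assert(n >= 1);
    int k = 0;
    while ((1L << k) < n)
        k++;
    return k;
}

/* Fill t[1..max_n] with T(1) = 0 and
   T(n) = T(ceil(n/2)) + T(floor(n/2)) + n, computed bottom up
   (both halves are smaller than n, so their entries already exist). */
void comparisons_table(long t[], long max_n)
{
    assert(max_n >= 1);
    t[1] = 0;
    for (long n = 2; n <= max_n; n++)
        t[n] = t[(n + 1) / 2] + t[n / 2] + n;
}
```

With `max_n = 1000`, every entry satisfies $T(N) \le N \lceil \log_2 N \rceil$. The entry `t[1000]` is 9,976, the same count the profiling slide reports for N = 1000.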

Computational Complexity

Framework to study efficiency of algorithms. Example = sorting.
• MACHINE MODEL = count fundamental operations.
  - count number of comparisons
• UPPER BOUND = algorithm to solve the problem (worst-case).
  - $N \log_2 N$ from mergesort
• LOWER BOUND = proof that no algorithm can do better.
  - $N \log_2 N - N \log_2 e$
• OPTIMAL ALGORITHM: lower bound ~ upper bound.
  - mergesort

Decision Tree

a1 < a2?
  YES: a2 < a3?
         YES: print a1, a2, a3
         NO:  a1 < a3?
                YES: print a1, a3, a2
                NO:  print a3, a1, a2
  NO:  a1 < a3?
         YES: print a2, a1, a3
         NO:  a2 < a3?
                YES: print a2, a3, a1
                NO:  print a3, a2, a1
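The decision tree translates directly into nested comparisons. This C sketch assumes distinct integer keys. The name `sort3` is hypothetical, and the function writes the ordering into an array instead of printing it, so the result can be checked. Each root-to-leaf path costs 2 or 3 comparisons. No tree with 6 leaves can do better in the worst case, since $\lceil \log_2 3! \rceil = 3$.

```c
#include <assert.h>

/* Follow the decision tree for three distinct keys: each if is one
   internal node, each group of assignments is one leaf.
   Writes the sorted order to out[] and returns the comparisons used. */
int sort3(int a1, int a2, int a3, int out[3])
{
    if (a1 < a2) {
        if (a2 < a3) { out[0] = a1; out[1] = a2; out[2] = a3; return 2; }
        if (a1 < a3) { out[0] = a1; out[1] = a3; out[2] = a2; return 3; }
        out[0] = a3; out[1] = a1; out[2] = a2; return 3;
    }
    if (a1 < a3) { out[0] = a2; out[1] = a1; out[2] = a3; return 2; }
    if (a2 < a3) { out[0] = a2; out[1] = a3; out[2] = a1; return 3; }
    out[0] = a3; out[1] = a2; out[2] = a1; return 3;
}
```

Running all six orderings of {1, 2, 3} through `sort3` reaches all six leaves, and each one yields 1, 2, 3.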

Comparison Based Sorting Lower Bound

Theorem. Any comparison based sorting algorithm must use $\Omega(N \log_2 N)$ comparisons.
Proof. Worst case dictated by tree height h.
• N! different orderings.
• One (or more) leaves corresponding to each ordering.
• Binary tree with N! leaves must have height $h \ge \log_2 (N!) \ge N \log_2 N - N \log_2 e$ (Stirling's formula).

Food for thought. What if we don't use comparisons?
• Stay tuned for radix sort.
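The height bound can be justified in two steps. First, a binary tree of height $h$ has at most $2^h$ leaves, so $2^h \ge N!$. Second, for the factorial it is enough to use the weak form of Stirling's estimate, $N! \ge (N/e)^N$. This follows from the single term $N^N/N!$ of the series $e^N = \sum_{k \ge 0} N^k / k!$:

```latex
\begin{aligned}
h &\ge \log_2 (N!) && \text{(since } 2^h \ge N!\text{)} \\
  &\ge \log_2 \left( (N/e)^N \right) && \text{(since } N! \ge (N/e)^N\text{)} \\
  &= N \log_2 N - N \log_2 e
\end{aligned}
```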

Extra Slides

Proof by Induction

Claim. $T(N) \le N \lceil \log_2 N \rceil$.
Proof. (by induction on N)
• Base case: N = 1.
• Define $n_1 = \lceil N/2 \rceil$, $n_2 = \lfloor N/2 \rfloor$.
• Induction step: assume true for 1, 2, ..., N - 1.
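Here is a sketch of how the induction step can go for $N \ge 2$, with $n_1 = \lceil N/2 \rceil$ and $n_2 = \lfloor N/2 \rfloor$. Both halves are smaller than $N$, so the hypothesis applies to them. Since $N \le 2^{\lceil \log_2 N \rceil}$, we get $n_2 \le n_1 \le 2^{\lceil \log_2 N \rceil - 1}$. Hence $\lceil \log_2 n_2 \rceil \le \lceil \log_2 n_1 \rceil \le \lceil \log_2 N \rceil - 1$:

```latex
\begin{aligned}
T(N) &= T(n_1) + T(n_2) + N \\
     &\le n_1 \lceil \log_2 n_1 \rceil + n_2 \lceil \log_2 n_2 \rceil + N \\
     &\le n_1 \left( \lceil \log_2 N \rceil - 1 \right) + n_2 \left( \lceil \log_2 N \rceil - 1 \right) + N \\
     &= N \left( \lceil \log_2 N \rceil - 1 \right) + N \qquad (n_1 + n_2 = N) \\
     &= N \lceil \log_2 N \rceil
\end{aligned}
```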

Implementing Mergesort

mergesort (see Sedgewick Program 8.3)

Item aux[MAXN];                                       /* uses scratch array */

void merge(Item a[], int left, int mid, int right);   /* defined on the next slide */

void mergesort(Item a[], int left, int right)
{
   int mid = (right + left) / 2;
   if (right <= left) return;
   mergesort(a, left, mid);
   mergesort(a, mid + 1, right);
   merge(a, left, mid, right);
}

Implementing Mergesort

merge (see Sedgewick Program 8.2)

void merge(Item a[], int left, int mid, int right)
{
   int i, j, k;

   /* copy to temporary array: left half in order, right half reversed */
   for (i = mid+1; i > left; i--)
      aux[i-1] = a[i-1];
   for (j = mid; j < right; j++)
      aux[right+mid-j] = a[j+1];

   /* merge two sorted sequences: i scans up from left, j scans down from right */
   for (k = left; k <= right; k++)
      if (ITEMless(aux[i], aux[j]))
         a[k] = aux[i++];
      else
         a[k] = aux[j--];
}

Profiling Mergesort Empirically

Mergesort prof.out

void merge(Item a[], int left, int mid, int right) <999>{
   int i, j, k;
   for (<999>i = mid+1; <6043>i > left; <5044>i--)
      <5044>aux[i-1] = a[i-1];
   for (<999>j = mid; <5931>j < right; <4932>j++)
      <4932>aux[right+mid-j] = a[j+1];
   for (<999>k = left; <10975>k <= right; <9976>k++)
      if (<9976>ITEMless(aux[i], aux[j]))
         <4543>a[k] = aux[i++];
      else
         <5433>a[k] = aux[j--];
<999>}

void mergesort(Item a[], int left, int right) <1999>{
   int mid = <1999>(right + left) / 2;
   if (<1999>right <= left)
      return<1000>;
   <999>mergesort(a, aux, left, mid);
   <999>mergesort(a, aux, mid+1, right);
   <999>merge(a, aux, left, mid, right);
<1999>}

Striking feature: all numbers SMALL!

# comparisons
   Theory ~ $N \log_2 N$ = 9,966
   Actual = 9,976
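The comparison count in this profile can be reproduced with a self-contained version of the merge and mergesort from the previous slides. The stand-in definitions are assumptions: `Item` is `int`, `ITEMless` is `<`, and `MAXN` is 1000, since the slides leave these out. The recursive function is renamed `mergesort_counted`, and the `compares` counter and `count_compares` wrapper are hypothetical additions. The merge loop makes exactly one comparison per element written, so the total depends only on N, not on the input order.

```c
#include <assert.h>

typedef int Item;                   /* assumption: integer keys */
#define ITEMless(A, B) ((A) < (B))  /* assumption: plain < on keys */
#define MAXN 1000                   /* assumption: largest N sorted */

static Item aux[MAXN];              /* scratch array, as on the slides */
static long compares;               /* hypothetical counter of ITEMless calls */

/* Merge from the slides: left half copied in order, right half reversed,
   then the smaller end is taken each step (right-left+1 comparisons). */
static void merge(Item a[], int left, int mid, int right)
{
    int i, j, k;
    for (i = mid + 1; i > left; i--)
        aux[i - 1] = a[i - 1];
    for (j = mid; j < right; j++)
        aux[right + mid - j] = a[j + 1];
    for (k = left; k <= right; k++) {
        compares++;
        if (ITEMless(aux[i], aux[j]))
            a[k] = aux[i++];
        else
            a[k] = aux[j--];
    }
}

static void mergesort_counted(Item a[], int left, int right)
{
    int mid = (right + left) / 2;
    if (right <= left) return;
    mergesort_counted(a, left, mid);
    mergesort_counted(a, mid + 1, right);
    merge(a, left, mid, right);
}

/* Sort a[0..n-1]; return the number of key comparisons made. */
long count_compares(Item a[], int n)
{
    assert(n >= 1 && n <= MAXN);
    compares = 0;
    mergesort_counted(a, 0, n - 1);
    return compares;
}
```

On sorted, reversed, and scrambled inputs of size 1000, `count_compares` returns 9,976 each time. This matches the profile and the claim that mergesort uses the same number of comparisons for any file.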

Sorting Analysis Summary

Running time estimates:
• Home PC executes $10^8$ comparisons/second.
• Supercomputer executes $10^{12}$ comparisons/second.

Insertion Sort ($N^2$)
  computer   thousand   million     billion
  home       instant    2.8 hours   317 years
  super      instant    1 second    1.6 weeks

Mergesort ($N \log N$)
  computer   thousand   million     billion
  home       instant    1 sec       18 min
  super      instant    instant     instant

Quicksort ($N \log N$)
  computer   thousand   million     billion
  home       instant    0.3 sec     6 min
  super      instant    instant     instant

Lesson 1: good algorithms are better than supercomputers.
Lesson 2: great algorithms are better than good ones.