CS 221 Algorithms and Data Structures Lecture 4

CS 221: Algorithms and Data Structures Lecture #4 Sorting Things Out Steve Wolfman 2014

Quick Review of Sorts
• Insertion Sort: Keep a list of already sorted elements. One by one, insert new elements into the right place in the sorted list.
• Selection Sort: Repeatedly find the smallest (or largest) element and put it in the next slot (where it belongs in the final sorted list).
• Merge Sort: Divide the list in half, sort the halves, merge them back together. (Base case: length 1.)

Insertion Sort Invariant
[Figure: the array divided by a line. Left of the line: in sorted order. Right of the line: untouched.]
Each iteration moves the line right one element.
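The invariant can be sketched in C++ roughly as follows. This is a minimal illustration, not code from the lecture; the name insertionSort and the use of std::vector are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch. Invariant at the top of each outer iteration:
// v[0..i-1] is in sorted order, v[i..] is untouched.
void insertionSort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        // Walk left from the dividing line, shifting larger elements right.
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;  // the line has now moved right by one element
    }
}
```

Because the inner loop stops at the first element that is not greater than key, equal keys keep their original relative order.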

Selection Sort Invariant
[Figure: the array divided by a line. Left of the line: smallest, 2nd smallest, 3rd smallest, … in order. Right of the line: remainder in no particular order.]
Each iteration moves the line right one element.
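A comparable sketch for selection sort, again with a hypothetical function name and std::vector chosen only for illustration:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch. Invariant at the top of each outer iteration:
// v[0..i-1] holds the i smallest elements in order; v[i..] is the
// remainder in no particular order.
void selectionSort(std::vector<int>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < v.size(); ++j) {
            if (v[j] < v[smallest]) smallest = j;  // leftmost smallest wins
        }
        std::swap(v[i], v[smallest]);  // the line moves right by one element
    }
}
```

The inner scan always covers the whole remainder, which is why the number of comparisons does not depend on the input order.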

Today’s Outline
• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Categorizing Sorting Algorithms
• Computational complexity
  – Average case behaviour: Why do we care?
  – Worst/best case behaviour: Why do we care? How often do we sort already-sorted, reverse-sorted, or “almost” sorted (k swaps from sorted, where k << n) lists?
• Stability: What happens to elements with identical keys?
• Memory Usage: How much extra memory is used?

Comparing our “PQSort” Algorithms
• Computational complexity
  – Selection Sort: Always makes n passes with a “triangular” shape. Best/worst/average case: Θ(n^2)
  – Insertion Sort: Always makes n passes, but if we’re lucky and search for the maximum from the right, only constant work is needed on each pass. Best case: Θ(n); worst/average case: Θ(n^2)
  – Heap Sort: Always makes n passes needing O(lg n) on each pass. Best/worst/average case: Θ(n lg n).
Note: best cases assume distinct elements. With identical elements, Heap Sort can get Θ(n) performance.

Insertion Sort Best Case
[Figure: an already-sorted array shown at successive passes, with the sorted “PQ” portion growing by one element per pass.]
If we search from the right: constant time per pass!
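To see the best case concretely, one can count key comparisons. The sketch below is an illustration under assumed names (insertionSortComparisons is hypothetical); it searches for the insertion point from the right end of the sorted portion.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a copy of v by insertion sort, searching from
// the right of the sorted portion, and returns the number of key comparisons.
long insertionSortComparisons(std::vector<int> v) {
    long comparisons = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0) {
            ++comparisons;                 // compare key with v[j - 1]
            if (v[j - 1] <= key) break;    // found the slot: stop immediately
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
    return comparisons;
}
```

On already-sorted input each pass stops after a single comparison, so n elements cost n - 1 comparisons in total. On reverse-sorted input pass i needs i comparisons, for n(n - 1)/2 overall.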

Comparing our “PQSort” Algorithms
• Stability
  – Selection: Unstable. (Say we swap in the “leftmost” smallest element; what about the element swapped out?)
  – Insertion: Easily made stable (when building from the left, find the rightmost slot for a new element).
  – Heap: Unstable
• Memory use: All three are essentially “in-place” algorithms with small O(1) extra space requirements.
• Cache access: Not detailed in 221, but… algorithms that don’t “jump around” tend to perform better in modern memory systems. Which of these “jumps around”?
But note: there’s a trick to make any sort stable.
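The slide does not say which trick it means. One common approach, assumed here, is to tag every element with its original position and break ties on that tag. The sketch below applies it to an otherwise unstable selection sort; Record and stableSelectionSort are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Record {
    int key;
    std::string label;
};

// Hypothetical sketch: selection sort made stable by comparing
// (key, original index) instead of key alone.
std::vector<Record> stableSelectionSort(const std::vector<Record>& input) {
    std::vector<std::pair<Record, std::size_t>> tagged;
    for (std::size_t i = 0; i < input.size(); ++i)
        tagged.push_back({input[i], i});  // remember the original position

    auto before = [](const std::pair<Record, std::size_t>& a,
                     const std::pair<Record, std::size_t>& b) {
        if (a.first.key != b.first.key) return a.first.key < b.first.key;
        return a.second < b.second;       // tie: earlier element goes first
    };

    for (std::size_t i = 0; i + 1 < tagged.size(); ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < tagged.size(); ++j)
            if (before(tagged[j], tagged[smallest])) smallest = j;
        std::swap(tagged[i], tagged[smallest]);
    }

    std::vector<Record> output;
    for (const auto& t : tagged) output.push_back(t.first);
    return output;
}
```

The cost is extra memory for the tags (one index per element), so the result is stable but no longer O(1) extra space.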

Comparison of growth...
[Figure: plot of n^2, n lg n, and n, with axes marked at n=100 and T(n)=100.]
Reminder: asymptotic differences matter, but a factor of lg n doesn’t matter as much as a factor of n.

MergeSort
Mergesort belongs to a class of algorithms known as “divide and conquer” algorithms (your recursion sense should be tingling here...). The problem space is continually split in half, recursively applying the algorithm to each half until the base case is reached.

MergeSort Algorithm
1. If the array has 0 or 1 elements, it’s sorted. Else…
2. Split the array into two halves
3. Sort each half recursively (i.e., using mergesort)
4. Merge the sorted halves to produce one sorted result:
   1. Consider the two halves to be queues.
   2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.
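Step 4 can be written almost literally with queues. The following is a sketch only (mergeQueues is a hypothetical name, and real implementations usually merge array ranges rather than std::queue objects); on ties it takes from the left queue.

```cpp
#include <queue>
#include <vector>

// Hypothetical sketch of the merge step: both inputs are already sorted.
std::vector<int> mergeQueues(std::queue<int> left, std::queue<int> right) {
    std::vector<int> result;
    while (!left.empty() || !right.empty()) {
        // Take from the left if the right is empty, or if both are non-empty
        // and the left front is no larger than the right front.
        bool takeLeft = right.empty() ||
                        (!left.empty() && left.front() <= right.front());
        std::queue<int>& source = takeLeft ? left : right;
        result.push_back(source.front());
        source.pop();
    }
    return result;
}
```

Each loop iteration dequeues exactly one element, so merging a total of n elements takes n steps.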

MergeSort Performance Analysis
1. If the array has 0 or 1 elements, it’s sorted. Else…   [cost: T(1) = 1]
2. Split the array into two halves
3. Sort each half recursively (i.e., using mergesort)   [cost: 2*T(n/2)]
4. Merge the sorted halves to produce one sorted result:   [cost: n]
   1. Consider the two halves to be queues.
   2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.

MergeSort Performance Analysis
T(1) = 1
T(n) = 2T(n/2) + n
     = 4T(n/4) + 2(n/2) + n
     = 8T(n/8) + 4(n/4) + 2(n/2) + n
     = 8T(n/8) + n + n + n
     = 8T(n/8) + 3n
     = 2^i T(n/2^i) + i*n
Let i = lg n:
T(n) = n*T(1) + n lg n = n + n lg n ∈ Θ(n lg n)
We ignored floors/ceilings. To prove performance formally, we’d use this as a guess and prove it with floors/ceilings by induction.
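As a quick sanity check on the closed form, one can evaluate the recurrence directly and compare. This is a sketch with hypothetical function names, and it assumes n is a power of two so that no floors or ceilings arise.

```cpp
// Hypothetical helper: T(1) = 1, T(n) = 2 T(n/2) + n, evaluated directly.
// Assumes n is a power of two.
long long recurrenceT(long long n) {
    if (n <= 1) return 1;
    return 2 * recurrenceT(n / 2) + n;
}

// Hypothetical helper: the closed form n + n lg n, for n a power of two.
long long closedForm(long long n) {
    long long lg = 0;
    for (long long m = n; m > 1; m /= 2) ++lg;  // lg = log base 2 of n
    return n + n * lg;
}
```

For n = 8 both give 32 (8 + 8*3), and for n = 1024 both give 11264 (1024 + 1024*10).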

Try it on this array!
[Figure: merge sort traced on an eight-element array that contains two 3s, one of them drawn in red. The array is split down to single elements and then merged back up into -4 1 2 3 3 5 6 9.]
Where does the red 3 go?

Mergesort (by Jon Bentley):

void msort(int x[], int lo, int hi, int tmp[]) {
  if (lo >= hi) return;
  int mid = (lo+hi)/2;
  msort(x, lo, mid, tmp);
  msort(x, mid+1, hi, tmp);
  merge(x, lo, mid, hi, tmp);
}

void mergesort(int x[], int n) {
  int *tmp = new int[n];
  msort(x, 0, n-1, tmp);
  delete[] tmp;
}

Merge (by Jon Bentley):

// Merges the sorted ranges x[lo..mid] and x[mid+1..hi] (inclusive bounds).
void merge(int x[], int lo, int mid, int hi, int tmp[]) {
  int a = lo, b = mid+1;
  for( int k = lo; k <= hi; k++ ) {
    // Take from the left if it is non-empty and either the right is
    // empty or the left front is strictly smaller.
    if( a <= mid && (b > hi || x[a] < x[b]) )
      tmp[k] = x[a++];
    else
      tmp[k] = x[b++];
  }
  for( int k = lo; k <= hi; k++ )
    x[k] = tmp[k];  // copy the merged range back
}

Elegant & brilliant… but not how I’d write it.

Mergesort (by Steve):

void msort(int data[], int left, int right, int temp[]) {
  if (left >= right) return;
  int mid = (left+right)/2;
  msort(data, left, mid, temp);
  msort(data, mid+1, right, temp);  // right half starts at mid+1 (bounds are inclusive)
  merge(data, left, mid, right, temp);
}

void mergesort(int data[], int n) {
  int *temp = new int[n];
  msort(data, 0, n-1, temp);
  delete[] temp;
}

void merge(int data[], int left, int mid, int right, int temp[]) {
  int frontL = left, frontR = mid+1;
  for (int dest = left; dest <= right; ++dest) {
    if (frontL > mid) {                          // left is empty?
      temp[dest] = data[frontR];
      ++frontR;
    } else if (frontR > right) {                 // right is empty?
      temp[dest] = data[frontL];
      ++frontL;
    } else if (data[frontR] < data[frontL]) {
      temp[dest] = data[frontR];                 // right elt
      ++frontR;                                  // is smaller
    } else {
      temp[dest] = data[frontL];                 // left elt is smaller
      ++frontL;                                  // (or equal)
    }
  }
  for (int k = left; k <= right; ++k)
    data[k] = temp[k];
}

QuickSort
In practice, one of the fastest sorting algorithms is Quicksort, developed in 1961 by C.A.R. Hoare.
Comparison-based: examines elements by comparing them to other elements
Divide-and-conquer: divides into “halves” (that may be very unequal) and recursively sorts

QuickSort algorithm
• Pick a pivot
• Reorder the list such that all elements < pivot are on the left, while all elements ≥ pivot are on the right
• Recursively sort each side
Are we missing a base case?

Partitioning
• The act of splitting up an array according to the pivot is called partitioning
• Consider the following:
[Figure: the array -4 1 -3 2 3 5 4 7, labelled with the pivot, the left partition, and the right partition.]

QuickSort Visually
[Figure: the array partitioned around a pivot P at each level of the recursion, ending with “Sorted!”.]

QuickSort (by Jon Bentley):

// swap is std::swap (from <utility>).
void qsort(int x[], int lo, int hi) {
  int i, p;
  if (lo >= hi) return;            // base case: 0 or 1 elements
  p = lo;                          // x[lo+1..p] holds elements < pivot x[lo]
  for( i=lo+1; i <= hi; i++ )
    if( x[i] < x[lo] )
      swap(x[++p], x[i]);
  swap(x[lo], x[p]);               // move the pivot between the partitions
  qsort(x, lo, p-1);
  qsort(x, p+1, hi);
}

void quicksort(int x[], int n) {
  qsort(x, 0, n-1);
}

Elegant & brilliant… but still not how I’d write it.

QuickSort Example (using intuitive algorithm, not Bentley’s)
(Pick first element as pivot, “scoot” elements to left/right.)
2 -4 6 1 5 -3 3 7
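One way to read the “intuitive” algorithm is sketched below. This is an interpretation rather than the lecture’s own code: quickSortScoot is a hypothetical name, and the sketch copies elements into separate left and right vectors instead of sorting in place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: first element is the pivot; smaller elements are
// "scooted" left, the rest right, and each side is sorted recursively.
std::vector<int> quickSortScoot(const std::vector<int>& v) {
    if (v.size() <= 1) return v;  // base case: nothing to sort
    int pivot = v[0];
    std::vector<int> left, right;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] < pivot) left.push_back(v[i]);
        else right.push_back(v[i]);
    }
    std::vector<int> result = quickSortScoot(left);
    result.push_back(pivot);
    std::vector<int> sortedRight = quickSortScoot(right);
    result.insert(result.end(), sortedRight.begin(), sortedRight.end());
    return result;
}
```

On the example array, the first pivot 2 sends -4, 1, -3 to the left and 6, 5, 3, 7 to the right. The extra vectors make the idea easy to follow at the cost of the in-place property that Bentley’s version keeps.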

QuickSort: Complexity
• Recall that Quicksort is comparison based
  – Thus, the operations are comparisons
• In our partitioning task, we compared each element to the pivot
  – Thus, the total number of comparisons in one partitioning step is N
  – As with MergeSort, if one of the partitions is about half (or any constant fraction of) the size of the array, complexity is Θ(n lg n).
• In the worst case, however, we end up with a partition with a 1 and n-1 split

QuickSort Visually: Worst case
[Figure: each pivot P lands at one end, leaving a single partition that is only one element smaller.]

QuickSort: Worst Case
• In the overall worst-case, this happens at every step…
  – Thus we have N comparisons in the first step
  – N-1 comparisons in the second step
  – N-2 comparisons in the third step
  – ...
  – …or about N^2/2 comparisons in total, which is Θ(N^2)
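The worst case is easy to provoke with a first-element pivot: feed in input that is already sorted. The sketch below instruments the Bentley-style partition loop with a comparison counter; countingQsort and comparisonsOnInput are hypothetical names. This version compares the pivot against the other elements only, so a range of m elements costs m - 1 comparisons rather than m.

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch: first-element-pivot quicksort that counts comparisons.
void countingQsort(std::vector<int>& x, int lo, int hi, long& comparisons) {
    if (lo >= hi) return;
    int p = lo;
    for (int i = lo + 1; i <= hi; ++i) {
        ++comparisons;  // one comparison of x[i] against the pivot x[lo]
        if (x[i] < x[lo]) std::swap(x[++p], x[i]);
    }
    std::swap(x[lo], x[p]);
    countingQsort(x, lo, p - 1, comparisons);
    countingQsort(x, p + 1, hi, comparisons);
}

// Sorts a copy of the input and returns the number of comparisons made.
long comparisonsOnInput(std::vector<int> x) {
    long comparisons = 0;
    countingQsort(x, 0, static_cast<int>(x.size()) - 1, comparisons);
    return comparisons;
}
```

For sorted input of size 100 every split is 0 and n-1, giving 99 + 98 + … + 1 = 4950 comparisons.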

QuickSort: Average Case (Intuition)
• Clearly pivot choice is important
  – It has a direct impact on the performance of the sort
  – Hence, QuickSort is fragile, or at least “attackable”
• So how do we pick a good pivot?

QuickSort: Average Case (Intuition)
• Let’s assume that pivot choice is random
  – Half the time the pivot will be from the centre half of the array
  – Thus, in those cases, the split will be at worst n/4 and 3n/4

QuickSort: Average Case (Intuition)
• We can apply this to the notion of a good split
  – Every “good” split: 2 partitions of size n/4 and 3n/4
    • Or divides N by 4/3
  – Hence, we make up to log_{4/3}(N) “good” splits
• Expected # of partitions is at most 2 * log_{4/3}(N)
  – O(lg N)
• Given N comparisons at each partitioning step, we have Θ(N lg N)
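Spelled out, the intuition gives the following bound. This is only the informal argument written in symbols (the factor of 2 reflects that roughly half of all splits are “good”), not a rigorous expected-case proof.

```latex
\[
\text{levels} \;\le\; 2\log_{4/3} N
  \;=\; \frac{2\lg N}{\lg(4/3)}
  \;\approx\; 4.82\,\lg N
\]
\[
\text{comparisons} \;\le\; N \cdot 2\log_{4/3} N \;\in\; O(N \lg N)
\]
```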

Quicksort Complexity: How does it compare?

N          Insertion Sort          Quicksort
10,000     4.1777 sec              0.05 sec
20,000     20.52 sec               0.11 sec
300,000    4666 sec (1.25 hrs)     2.15 sec

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?
Complexity
  – Best case: Insert < Quick, Merge, Heap < Select
  – Average case: Quick, Merge, Heap < Insert, Select
  – Worst case: Merge, Heap < Quick, Insert, Select
  – Usually on “real” data: Quick < Merge < Heap < I/S (not asymptotic)
  – On very short lists: quadratic sorts may have an advantage (so, some quick/merge implementations “bottom out” to these as base cases)
Some details depend on implementation! (E.g., an initial check whether the last elt of the left sublist is less than the first of the right can make merge’s best case linear.)

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?
• Stability
  – Easily Made Stable: Insert, Merge (prefer the “left” of the two sorted sublists on ties)
  – Unstable: Heap
  – Challenging to Make Stable: Quick, Select
• Memory use:
  – Insert, Select, Heap < Quick < Merge
How much stack space does recursive QuickSort use? In the worst case? Could we make it better?
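One possible answer to “could we make it better?”, offered as an assumption about where the question is heading: recurse only on the smaller partition and loop on the larger one. Each recursive call then handles at most half of its caller’s range, so the stack depth stays O(lg n) even when the running time is quadratic. The function name and the depth bookkeeping below are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical sketch: sorts x[lo..hi] and returns the deepest recursion
// level reached (the outermost call is level 1).
int qsortSmallFirst(std::vector<int>& x, int lo, int hi, int depth = 1) {
    int maxDepth = depth;
    while (lo < hi) {
        // Partition around x[lo], as in the Bentley version.
        int p = lo;
        for (int i = lo + 1; i <= hi; ++i)
            if (x[i] < x[lo]) std::swap(x[++p], x[i]);
        std::swap(x[lo], x[p]);

        if (p - lo < hi - p) {
            // Left side is smaller: recurse on it, then loop on the right.
            maxDepth = std::max(maxDepth, qsortSmallFirst(x, lo, p - 1, depth + 1));
            lo = p + 1;
        } else {
            // Right side is smaller (or equal): recurse on it, loop on the left.
            maxDepth = std::max(maxDepth, qsortSmallFirst(x, p + 1, hi, depth + 1));
            hi = p - 1;
        }
    }
    return maxDepth;
}
```

Plain recursive QuickSort on sorted input of size n nests about n calls deep. This variant still performs the same quadratic number of comparisons on that input, but its recursion stays shallow.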

Complexity of Sorting Using Comparisons as a Problem
Each comparison is a “choice point” in the algorithm. You can do one thing if the comparison is true and another if false. So, the whole algorithm is like a binary tree…
[Figure: a binary decision tree whose internal nodes are comparisons (x<y, a<b, a<d, c<d, z<c, …) with yes/no branches, and whose leaves are marked “sorted!”.]

Complexity of Sorting Using Comparisons as a Problem
The algorithm spits out a (possibly different) sorted list at each leaf. What’s the maximum number of leaves?

Complexity of Sorting Using Comparisons as a Problem
There are n! possible permutations of a sorted list (i.e., input orders for a given set of input elements). How deep must the tree be to distinguish those input orderings?

Complexity of Sorting Using Comparisons as a Problem
If the tree is not at least lg(n!) deep, then there’s some pair of orderings I could feed the algorithm which the algorithm does not distinguish. So, it must not successfully sort one of those two orderings.

Complexity of Sorting Using Comparisons as a Problem
QED: The complexity of sorting using comparisons is Ω(n lg n) in the worst case, regardless of algorithm!
In general, we can lower-bound but not upper-bound the complexity of problems. (Why not? Because I can give as crappy an algorithm as I please to solve any problem.)
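The step from a depth of lg(n!) to n lg n is not shown on the slides. One standard way to fill it in is to keep only the larger half of the terms in the sum:

```latex
\[
\lg(n!) \;=\; \sum_{k=1}^{n} \lg k
  \;\ge\; \sum_{k=\lceil n/2 \rceil}^{n} \lg k
  \;\ge\; \frac{n}{2}\,\lg\frac{n}{2}
  \;\in\; \Omega(n \lg n)
\]
```

The second inequality holds because at least n/2 terms remain and each of them is at least lg(n/2).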

To Do
• Read: Epp Section 9.5 and KW Section 10.1, 10.4, and 10.7-10.10

Next Up
• B+-trees and Giant Branching Factors