Divide Conquer Sorting Computer Science Engineering Otterbein University

Divide & Conquer Sorting
Computer Science Engineering, Otterbein University
COMP 2100

Alerts
o Immutable Deadlines: Project 2 due Thursday
o Homework #5 due on Monday, 11/2
o No Class: Friday, October 30

Divide & Conquer
o Divide & conquer design paradigm
    Divide: divide the input data S into two (or more) disjoint subsets S1 and S2
    Recur: solve the subproblems associated with S1 and S2
    Conquer: combine the solutions for S1 and S2 into a solution for S
o Base case: directly solve and do not divide for “small” subproblem sizes (typically 0 or 1)
o MergeSort and QuickSort are sorting algorithms based on divide & conquer
o MergeSort divides based on position
o QuickSort divides based on value

MergeSort (1945)
o MergeSort on an input sequence S with n elements consists of three steps:
    Divide: partition S into two sequences S1 and S2 of about n/2 elements each
    Recur: recursively sort S1 and S2
    Conquer: merge S1 and S2 into a unique sorted sequence
Algorithm mergeSort(S)
    Input sequence S with n elements
    Output sequence S sorted
    if n > 1
        (S1, S2) ← partition(S, n/2)
        S1 ← mergeSort(S1)
        S2 ← mergeSort(S2)
        S ← merge(S1, S2)
    return S

Partitioning a Sequence
o The divide step of mergeSort consists of partitioning input sequence S
o Use new arrays for the subsequences
o It is easy to see that partition takes O(n) time
Algorithm partition(S, k)
    Input sequence S, with n items; k, partition size
    Output partition of S into S1 of size k and S2 of size n – k
    S1 ← new array of size k
    S2 ← new array of size n - k
    pos ← 0
    for i ← 0 to k-1 do
        S1[i] ← S[pos++]
    for i ← 0 to n-k-1 do
        S2[i] ← S[pos++]
    return (S1, S2)

Merging Two Sorted Sequences
o The conquer step of mergeSort consists of merging two sorted sequences S1 and S2
o Use new array for the merged sequence
o It is easy to see that merge takes O(n) time
Algorithm merge(S1, S2)
    Input sequences S1 and S2, both sorted, with n total items combined
    Output sorted sequence containing all the elements of S1 and S2
    S ← new array of size n
    p ← length of S1; q ← length of S2
    i ← 0; j ← 0; k ← 0
    while i < p and j < q do
        if S1[i] ≤ S2[j]        (≤ rather than <, so ties are taken from S1 and the sort stays stable)
            S[k++] ← S1[i++]
        else
            S[k++] ← S2[j++]
    if i = p
        copy S2[j..q-1] to S[k..p+q-1]
    else
        copy S1[i..p-1] to S[k..p+q-1]
    return S
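The three pseudocode routines (partition, merge, mergeSort) can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference implementation: the function names are adapted to Python conventions, lists stand in for arrays, and the comparison uses `<=` so that ties are taken from the left half.

```python
def partition(s, k):
    """Split s into two new lists: the first k items and the remaining items."""
    return s[:k], s[k:]


def merge(s1, s2):
    """Merge two sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    while i < len(s1) and j < len(s2):
        # <= takes from s1 on ties, so equal keys keep their original order.
        if s1[i] <= s2[j]:
            merged.append(s1[i])
            i += 1
        else:
            merged.append(s2[j])
            j += 1
    # At most one of these two tails is non-empty.
    merged.extend(s1[i:])
    merged.extend(s2[j:])
    return merged


def merge_sort(s):
    """Return a new sorted list; the input list is left unchanged."""
    if len(s) <= 1:
        return list(s)
    s1, s2 = partition(s, len(s) // 2)
    return merge(merge_sort(s1), merge_sort(s2))


print(merge_sort([7, 2, 9, 4, 3, 8, 6, 1]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

The input 7 2 9 4 3 8 6 1 is the same sequence traced on the "MergeSort in Action" slides.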

MergeSort in Action
o Partition
7 2 9 4 | 3 8 6 1

MergeSort in Action
o Recursive Call, Partition
7 2 9 4 | 3 8 6 1
    7 2 | 9 4

MergeSort in Action
o Recursive Call, Partition
7 2 9 4 | 3 8 6 1
    7 2 | 9 4
        7 | 2

MergeSort in Action
o Recursive Call, Base Case
7 2 9 4 | 3 8 6 1
    7 2 | 9 4
        7 | 2
            7 → 7

MergeSort in Action
o Recursive Call, Base Case
7 2 9 4 | 3 8 6 1
    7 2 | 9 4
        7 | 2
            7 → 7
            2 → 2

MergeSort in Action
o Merge
7 2 9 4 | 3 8 6 1
    7 2 | 9 4
        7 | 2 → 2 7
            7 → 7
            2 → 2

MergeSort in Action
o Recursive Call, …, Base Case, Merge
7 2 9 4 | 3 8 6 1
    7 2 | 9 4
        7 | 2 → 2 7
            7 → 7
            2 → 2
        9 | 4 → 4 9
            9 → 9
            4 → 4

MergeSort in Action
o Merge
7 2 9 4 | 3 8 6 1
    7 2 | 9 4 → 2 4 7 9
        7 | 2 → 2 7
            7 → 7
            2 → 2
        9 | 4 → 4 9
            9 → 9
            4 → 4

MergeSort in Action
o Recursive Call, …, Merge
7 2 9 4 | 3 8 6 1
    7 2 | 9 4 → 2 4 7 9
        7 | 2 → 2 7
            7 → 7
            2 → 2
        9 | 4 → 4 9
            9 → 9
            4 → 4
    3 8 | 6 1 → 1 3 6 8
        3 | 8 → 3 8
            3 → 3
            8 → 8
        6 | 1 → 1 6
            6 → 6
            1 → 1

MergeSort in Action
o Merge
7 2 9 4 | 3 8 6 1 → 1 2 3 4 6 7 8 9
    7 2 | 9 4 → 2 4 7 9
        7 | 2 → 2 7
            7 → 7
            2 → 2
        9 | 4 → 4 9
            9 → 9
            4 → 4
    3 8 | 6 1 → 1 3 6 8
        3 | 8 → 3 8
            3 → 3
            8 → 8
        6 | 1 → 1 6
            6 → 6
            1 → 1

MergeSort Analysis
o Use recurrence relation.
o T(0) = T(1) = 2
o T(n) = cn + 2T(n/2) + cn = 2cn + 2T(n/2), where c is a constant
o Master Theorem: T(n) ∈ Θ(n lg n)
Algorithm mergeSort(S)
    Input sequence S with n elements
    Output sequence S sorted
    if n > 1
        (S1, S2) ← partition(S, n/2)
        S1 ← mergeSort(S1)
        S2 ← mergeSort(S2)
        S ← merge(S1, S2)
    return S

Why is MergeSort Θ(n lg n)?
o The height of the mergeSort tree is O(lg n) because at each recursive call we divide the sequence in half
o The overall amount of work done at the nodes of depth i is O(n): partition, merge, and recursive calls
o Thus the total running time of mergeSort is O(n lg n)
    depth   #calls   size     cost
    0       1        n        O(n)
    1       2        n/2      O(n)
    i       2^i      n/2^i    O(n)
    …       …        …        …
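The table can be summed level by level. The following is a sketch under two simplifying assumptions that are not stated on the slide: n is a power of 2, and each call on a sequence of size m does at most c·m work outside its recursive calls, for some constant c.

```latex
\begin{align*}
T(n) &\le \sum_{i=0}^{\lg n}
        \underbrace{2^{i}}_{\text{calls at depth } i}
        \cdot
        \underbrace{c\,\frac{n}{2^{i}}}_{\text{work per call}} \\
     &= \sum_{i=0}^{\lg n} c\,n \\
     &= c\,n\,(\lg n + 1) \in O(n \lg n)
\end{align*}
```

Every level contributes the same c·n, and there are lg n + 1 levels, which is where the n lg n comes from.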

Practical Considerations
o Mergesort’s use of auxiliary arrays is expensive
    Implementing an in-place version can be tricky
o Mergesort has too much overhead for small arrays
    Solution: cut off the recursion early (~10 items) and use insertion sort for small blocks
o It is possible to short-circuit the merge step if max(left) < min(right)
o Old versions of Java used this as the system sort for Object arrays (now uses Timsort)
[Slide overlay: excerpt from the timsort description, an adaptive, stable, natural mergesort; the rest of the overlay text is not legible]
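The two tweaks (insertion-sort cutoff and skipping the merge) might look like this in Python. It is only a sketch: `CUTOFF`, `insertion_sort`, and `hybrid_merge_sort` are illustrative names, the threshold of 10 simply follows the "~10 items" above, and the short-circuit test uses `<=` rather than `<`, which is also safe because equal boundary elements are already in order.

```python
CUTOFF = 10  # assumed threshold, following the "~10 items" suggestion


def insertion_sort(s, lo, hi):
    """Sort s[lo..hi] (inclusive) in place."""
    for k in range(lo + 1, hi + 1):
        item = s[k]
        m = k - 1
        while m >= lo and s[m] > item:
            s[m + 1] = s[m]
            m -= 1
        s[m + 1] = item


def hybrid_merge_sort(s, lo=0, hi=None):
    """Sort s[lo..hi] (inclusive) in place, copying only the left half per merge."""
    if hi is None:
        hi = len(s) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(s, lo, hi)  # small block: cut off the recursion
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(s, lo, mid)
    hybrid_merge_sort(s, mid + 1, hi)
    # Short circuit: max(left) is s[mid] and min(right) is s[mid + 1].
    if s[mid] <= s[mid + 1]:
        return
    left = s[lo:mid + 1]
    i, j, k = 0, mid + 1, lo
    while i < len(left) and j <= hi:
        if left[i] <= s[j]:
            s[k] = left[i]
            i += 1
        else:
            s[k] = s[j]
            j += 1
        k += 1
    # Leftover right-half items are already in place; copy leftover left items.
    s[k:k + len(left) - i] = left[i:]
```

On input that is already sorted, every merge is skipped, so the work above the cutoff level drops to one comparison per call.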

Summary so far...

QuickSort (1959)
o Select a partitioning element, p
o Rearrange the array such that
    element p is in its final position
    what is to the left of p is less than or equal to p
    what is to the right of p is greater than or equal to p
o Sort the left and right sub-arrays independently by recursion

QuickSort
o Divide: Select any element S[k] of the array to be the pivot
    Partition S into two sequences S1 and S2, such that
      o S[k] is relocated to its final position, S[i]
      o for all j < i, S[j] ≤ S[i]
      o for all j > i, S[j] ≥ S[i]
o Recur: recursively sort S1 and S2
o Conquer: trivial
Algorithm quickSort(S, i, j)
    Input sequence S; i and j, the lower and upper bounds of sorting
    Output sequence S sorted from i to j
    if j - i > 0
        select pivot and move it to S[j]
        p ← partition(S, i, j)
        quickSort(S, i, p-1)
        quickSort(S, p+1, j)

Selecting the Pivot
o Fixed element
    Right-most
    Left-most
    Center
o Median
o Random
o Median of 3 (right, left, center)
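As an illustration of the last option, here is one way median-of-3 selection could be written in Python. The helper names `median_of_three_index` and `move_pivot_to_end` are hypothetical, and moving the chosen element to s[j] assumes a partition routine that expects the pivot in the right-most position.

```python
def median_of_three_index(s, i, j):
    """Return the index of the median of s[i], s[(i + j) // 2], and s[j]."""
    mid = (i + j) // 2
    # Order the three candidate indices by the values they point at.
    candidates = sorted([i, mid, j], key=lambda idx: s[idx])
    return candidates[1]


def move_pivot_to_end(s, i, j):
    """Swap the median-of-3 element into s[j]."""
    k = median_of_three_index(s, i, j)
    s[k], s[j] = s[j], s[k]


data = [9, 1, 5, 7, 3]
move_pivot_to_end(data, 0, len(data) - 1)
print(data)  # [9, 1, 3, 7, 5]: the median of 9, 5, 3 is 5, now at the end
```

Compared with a fixed right-most pivot, this avoids the quadratic case on input that is already sorted, at the cost of a few extra comparisons per call.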

QuickSort Partition
o QuickSort’s partition is performed in place
o Scan from ends towards center looking for pairs of elements out of order to be exchanged
o It is easy to see that partition takes O(n) time
Algorithm partition(S, i, j)
    Input sequence S, with pivot at S[j]; i and j, the lower and upper bounds of sorting
    Output S with elements rearranged so elements smaller than pivot are to its left & those larger to its right; returns final index of pivot
    p ← S[j]
    u ← i; v ← j-1
    while u ≤ v do
        while u ≤ v and S[u] < p do u++
        while u ≤ v and S[v] > p do v--
        if u ≤ v
            swap(S, u, v)
            u++; v--
    swap(S, u, j)
    return u
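A Python sketch of the in-place partition and the recursive driver follows. It assumes the right-most element is used as the pivot; the scans carry a `u <= v` bounds check and the swap is guarded, so duplicates of the pivot and already-crossed indices are handled. Names are adapted to Python conventions.

```python
def partition(s, i, j):
    """Partition s[i..j] around the pivot stored in s[j]; return its final index."""
    p = s[j]
    u, v = i, j - 1
    while u <= v:
        # Scan right for an element that is not smaller than the pivot.
        while u <= v and s[u] < p:
            u += 1
        # Scan left for an element that is not larger than the pivot.
        while u <= v and s[v] > p:
            v -= 1
        if u <= v:
            s[u], s[v] = s[v], s[u]
            u += 1
            v -= 1
    s[u], s[j] = s[j], s[u]  # put the pivot into its final position
    return u


def quick_sort(s, i=0, j=None):
    """Sort s[i..j] (inclusive) in place."""
    if j is None:
        j = len(s) - 1
    if j - i > 0:
        p = partition(s, i, j)
        quick_sort(s, i, p - 1)
        quick_sort(s, p + 1, j)


data = [3, 72, 1, 5, 47, 34, 20, 10, 9, 23]
print(partition(data, 0, len(data) - 1))  # 6
print(data)  # [3, 9, 1, 5, 10, 20, 23, 47, 72, 34]
```

The sample array is the one used on the "QuickSort in Action" slides; after one partition the pivot 23 sits at index 6.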

QuickSort in Action
   3  72   1   5  47  34  20  10   9  23
   u                               v

QuickSort in Action
   3  72   1   5  47  34  20  10   9  23
       u                           v

QuickSort in Action
   3   9   1   5  47  34  20  10  72  23
                   u           v

QuickSort in Action
   3   9   1   5  10  34  20  47  72  23
                       u   v

QuickSort in Action
   3   9   1   5  10  20  34  47  72  23
                       v   u

QuickSort in Action
   3   9   1   5  10  20  23  47  72  34
                       v   u

QuickSort is not stable
    before partition: 6 8 5 2 7 1 2 3
    after partition:  2 1 2 3 7 8 6 5
    (pivot 3; the two 2s end up in reversed relative order)
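Tagging each key with its original index makes the instability visible. The sketch below runs one in-place partition, with the right-most element as pivot, over the keys 6 8 5 2 7 1 2 3. The record layout `(key, original_index)` and the function name `partition_by_key` are assumptions made for this illustration.

```python
def partition_by_key(s, i, j):
    """In-place partition of (key, tag) records in s[i..j], comparing keys only."""
    p = s[j][0]
    u, v = i, j - 1
    while u <= v:
        while u <= v and s[u][0] < p:
            u += 1
        while u <= v and s[v][0] > p:
            v -= 1
        if u <= v:
            s[u], s[v] = s[v], s[u]
            u += 1
            v -= 1
    s[u], s[j] = s[j], s[u]
    return u


keys = [6, 8, 5, 2, 7, 1, 2, 3]
records = [(key, index) for index, key in enumerate(keys)]
partition_by_key(records, 0, len(records) - 1)

print([key for key, _ in records])                   # [2, 1, 2, 3, 7, 8, 6, 5]
print([index for key, index in records if key == 2])  # [6, 3]
```

The 2 that started at index 6 now precedes the 2 that started at index 3, so equal keys did not keep their original order.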

QuickSort Analysis
o T(n) = n + T(p – 1) + T(n – p), where p is the final position of the pivot (1 to n)
o If pivot always lands in the middle, this yields T(n) ≤ 2T(n/2) + n
o Thus, T(n) ∈ Θ(n lg n)
o If pivot always lands at the end, we get T(n) = n + T(n – 1)
o Here, T(n) ∈ Θ(n^2)
o Best case: Θ(n lg n)
o Worst case: Θ(n^2)
Algorithm quickSort(S, i, j)
    Input sequence S; i and j, the lower and upper bounds of sorting
    Output sequence S sorted from i to j
    if j - i > 0
        select pivot and move it to S[j]
        p ← partition(S, i, j)
        quickSort(S, i, p-1)
        quickSort(S, p+1, j)
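The worst-case bound follows by unrolling the recurrence for a pivot that always lands at the end. This is a short worked sketch; T(0) is treated as a constant.

```latex
\begin{align*}
T(n) &= n + T(n-1) \\
     &= n + (n-1) + T(n-2) \\
     &= n + (n-1) + \dots + 2 + 1 + T(0) \\
     &= \frac{n(n+1)}{2} + T(0) \in \Theta(n^2)
\end{align*}
```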

Average Case Analysis
o It turns out, as we will now prove, that QuickSort’s average time complexity is Θ(n log n)
o Warning: this derivation is long and complicated; now may be a good time for a nap…

Average Case Analysis
o It is reasonable to assume that the pivot has equal probability (1/n) to land in each of the n possible positions.
o Since we can simplify:

Average Case Analysis
    Multiply by n
    Substitute (n – 1) for n
    Subtract last two equations
    Simplify

Average Case Analysis
    Divide by n(n+1)
    Drop small term, creating inequality
    Define F(n) = Ta(n)/(n+1) and substitute

Average Case Analysis
    Expand completely into a sum
    Remember that F(n) = Ta(n)/(n+1), so
WAKE UP!!
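The steps named on the Average Case Analysis slides can be written out as follows. This is a reconstruction from the recurrence T(n) = n + T(p – 1) + T(n – p) and the equal-probability assumption, so the exact form shown in lecture may differ. T_a(n) denotes the average running time, the summation index k is introduced here for convenience, and F(1) is treated as a constant.

```latex
\begin{align*}
T_a(n) &= n + \frac{1}{n}\sum_{p=1}^{n}\bigl[T_a(p-1) + T_a(n-p)\bigr]
  && \text{each position } p \text{ has probability } 1/n \\
T_a(n) &= n + \frac{2}{n}\sum_{k=0}^{n-1} T_a(k)
  && \text{since } \sum_{p=1}^{n} T_a(p-1) = \sum_{p=1}^{n} T_a(n-p) \\
n\,T_a(n) &= n^2 + 2\sum_{k=0}^{n-1} T_a(k)
  && \text{multiply by } n \\
(n-1)\,T_a(n-1) &= (n-1)^2 + 2\sum_{k=0}^{n-2} T_a(k)
  && \text{substitute } (n-1) \text{ for } n \\
n\,T_a(n) - (n-1)\,T_a(n-1) &= 2n - 1 + 2\,T_a(n-1)
  && \text{subtract the last two equations} \\
n\,T_a(n) &= (n+1)\,T_a(n-1) + 2n - 1
  && \text{simplify} \\
\frac{T_a(n)}{n+1} &= \frac{T_a(n-1)}{n} + \frac{2n-1}{n(n+1)}
  && \text{divide by } n(n+1) \\
\frac{T_a(n)}{n+1} &< \frac{T_a(n-1)}{n} + \frac{2}{n+1}
  && \text{drop the small term } -\tfrac{1}{n(n+1)} \\
F(n) &< F(n-1) + \frac{2}{n+1}
  && \text{with } F(n) = \frac{T_a(n)}{n+1} \\
F(n) &< F(1) + 2\sum_{k=3}^{n+1}\frac{1}{k} \;\le\; F(1) + 2\ln(n+1)
  && \text{expand completely into a sum} \\
T_a(n) &= (n+1)\,F(n) \in O(n \log n)
\end{align*}
```

Since the average can be no better than the best case, which is Θ(n lg n), the average is also Ω(n lg n), giving Θ(n log n) overall.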

For more information...
o This was a very fast overview/review of these two sorting algorithms
o Sections 2.2 & 2.3 are a treasure trove of good, solid reasoning about mergesort and quicksort variations and tweaks: definitely worthwhile reading!