EFFICIENCY & SORTING CITS 1001

2 Listen to the sound of sorting
• Various algorithms
  • http://www.youtube.com/watch?v=t8g-iYGHpEA
• Quicksort
  • http://www.youtube.com/watch?v=m1PS8IR6Td0
• Or Google for “sound of sorting”

3 Scope of this lecture
• Linear search
• Sorting algorithms and efficiency
• References:
  • Wirth, Algorithms + Data Structures = Programs, Chapter 2
  • Knuth, The Art of Computer Programming, Volume 3, Sorting and Searching
• This lecture is based on PowerPoint slides originally by Gordon Royle, UWA

Why study sorting algorithms?
NOT so you can reproduce them in your Java applications
If you want to sort a collection of objects in Java, use the Collections library
A list l may be sorted by calling Collections.sort(l);
See https://docs.oracle.com/javase/tutorial/collections/interfaces/order.html for a tutorial
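The library call mentioned above can be tried out with a few lines of code. This is only a small sketch: the class name LibrarySortDemo, the helper sortedCopy and the sample values are made up for illustration and are not part of the lecture material.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class LibrarySortDemo {
    // Hypothetical helper: returns a sorted copy so the caller's list is untouched.
    static List<Integer> sortedCopy(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        Collections.sort(copy); // sorts the list in place, in ascending natural order
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> marks = Arrays.asList(6, 8, 1, 15, 12, 2, 7, 4);
        System.out.println(sortedCopy(marks)); // prints [1, 2, 4, 6, 7, 8, 12, 15]
    }
}
```

Collections.sort works on any List whose elements are Comparable, which is why Integer values need no extra code here.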

So, why study sorting algorithms?
• Consider sorting as an introduction to algorithmic thinking
• “Algorithms and data structures are the basics of CS and engineering.
• If you learn them your thinking process improves.
• Your coding style improves.
• If you read good code (and understand) you become a better coder.
• Where do you find better code than those few lines of precise, elegantly crafted, peer reviewed by millions of people?
• And sorting is the foundation of many other things.”
  Maruf Maniruzzaman, Software Engineer at Microsoft

6 Searching
• Searching refers to the process of finding data items that match certain criteria
• We may just want a yes/no answer to a question, or we may want additional details as well
  • Find out whether any students got a mark of 49
  • Find out which students got a mark of 49
• The simplest searching technique is called linear search, which involves looking through each element in turn until we find one that matches the criteria

7 Our favourite student class

    public class Student {
        private String studentID;
        private int mark;

        public Student(String studentID, int mark) {
            this.studentID = studentID;
            this.mark = mark;
        }

        public String getStudentID() { return studentID; }

        public int getMark() { return mark; }
    }

• A skeleton version of a possible Student class in a student records system

8 A collection of students
• We consider a class list being stored as an ArrayList
• The question we consider is how to retrieve the data for a student with a given student number
• So we will write a method with the following signature

    public Student findStudent(ArrayList<Student> classlist, String id)

• The method returns a (reference to a) Student object
• The ArrayList of students is a parameter
• The student ID we want is the other parameter

9 Linear search

    public Student findStudent(ArrayList<Student> classlist, String id) {
        for (Student s : classlist)
            if (s.getStudentID().equals(id))
                return s;    // found: stop at the first match
        return null;         // checked every element without success
    }

10 Comments
• If the ArrayList does contain the desired value, the method returns the object as soon as it is found
• If the ArrayList does not contain the desired value, the method returns null after checking every element without success
• We have shown the general situation of finding an object in a collection of objects

11 Performance of linear search
• How fast does linear search work on a collection of n items?
• We can identify three situations
  • Best case, when the input is the most convenient possible
  • Worst case, when the input is the least convenient possible
  • Average case, averaged over all the inputs
• In the best case, linear search finds the item at the first position of the array, so it needs 1 comparison
• In the worst case, linear search does not find the item, so it performs n comparisons unsuccessfully
• To calculate the average case performance, we would need some problem-specific assumptions about the input data
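The best and worst cases described above can be checked by counting comparisons directly. The sketch below searches a plain int array rather than a list of Student objects, and the class name LinearSearchCount and the sample marks are assumptions made for illustration.

```java
class LinearSearchCount {
    // Returns how many element comparisons linear search makes
    // while looking for target in a (found or not).
    static int comparisons(int[] a, int target) {
        int count = 0;
        for (int value : a) {
            count++;                    // one comparison per element examined
            if (value == target) break; // stop as soon as the item is found
        }
        return count;
    }

    public static void main(String[] args) {
        int[] marks = {49, 72, 65, 80, 58};
        System.out.println(comparisons(marks, 49)); // best case: prints 1
        System.out.println(comparisons(marks, 99)); // worst case (absent): prints 5
    }
}
```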

12 Linear search is too slow
• If we have very large amounts of data, then linear search is not feasible
• For example, we can view a telephone directory as a very large array of objects, with each object consisting of a name and a number
• If you are asked to find out which person has phone number 9388 6105, how long would it take you to do this by linear search?
• However, if I ask you to find out the phone number of a specific person, then you can do it much, much faster
  • How do you do it?
  • How can we program a computer to do this?

13 Sorted collections
• The reason that Name → Phone number is quick, while Phone number → Name is slow, is that the collection (i.e. the phone book) is sorted into alphabetical order by name, and this allows us to find an entry much more quickly (we will see why later)
• Most useful databases are sorted: dictionaries, indexes, etc.

14 Sorting
• Before we examine how to efficiently search in a sorted collection, we consider how to sort the collection
• We again start with the “plain vanilla” example: sorting an array of integers into increasing order

    before    6  8  1 15 12  2  7  4
    after     1  2  4  6  7  8 12 15

• Later we will extend this to sorting arrays of objects according to various other criteria (alphabetical, etc.)

15 The basic set up
• We will implement a number of sorting methods, all of which operate on an array of integers
• We will develop these as a utility class called Sorter: a class with no instance variables, just static methods (cf. Math)
• Each method will have a similar signature, where the only thing that will vary is the name of the sorting technique

    public static void nameSort(int[] a)

• Each method receives an array as a parameter, and will sort that array “in place”
  • i.e. the method returns nothing, but a gets updated

16 bubbleSort
• The idea behind bubbleSort is to systematically compare pairs of elements, exchanging them if they are out of order
• If the array contains n elements, then we view the algorithm as consisting of n-1 “passes”
• In the first pass we compare
  • Element 0 with element 1, exchange if necessary
  • Element 1 with element 2, exchange if necessary
  • …
  • Element n-2 with element n-1, exchange if necessary

17 The first pass
• After the first pass, the largest element will be at the end

    6  8  1 15 12  2  7  4
    6  1  8 12 15  2  7  4
    6  1  8 12  2 15  7  4
    6  1  8 12  2  7 15  4
    6  1  8 12  2  7  4 15

18 The second pass
• The second pass doesn’t need to make the last comparison

    6  1  8 12  2  7  4 15
    1  6  8  2 12  7  4 15
    1  6  8  2  7 12  4 15
    1  6  8  2  7  4 12 15

19 The third pass
• The third pass can omit the last two comparisons

    1  6  8  2  7  4 12 15
    1  6  2  8  7  4 12 15
    1  6  2  7  8  4 12 15
    1  6  2  7  4  8 12 15

20 The fourth pass
• The fourth pass is even shorter

    1  6  2  7  4  8 12 15
    1  2  6  4  7  8 12 15

21 The fifth and sixth passes

    1  2  6  4  7  8 12 15
    1  2  4  6  7  8 12 15

22 Why does it work?
• We need to have some argument or “proof” that this works
• We claim that
    After i passes, the largest i elements in the array are in their correct positions
• This is true after the first pass, because the largest element in the array is encountered at some stage and then “swapped all the way to the end” of the array
• The same argument, applied to the remainder of the array, shows that the second pass puts the second largest element into place; repeating this argument for each of the n-1 passes gives the result
• Once the largest n-1 elements are in place, the one remaining element must be in its correct position too

23 Coding bubblesort

    public static void bubbleSort(int[] a) {
        for (int pass = 1; pass < a.length; pass++)
            for (int j = 0; j < a.length - pass; j++)
                if (a[j] > a[j+1])
                    swap(a, j, j+1);    // exchange the out-of-order pair
    }
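The bubbleSort code above calls a swap method that the slides do not show. The sketch below assumes that swap(a, i, j) simply exchanges a[i] and a[j]; the class name BubbleSortSketch is also made up, and the real Sorter class may differ.

```java
import java.util.Arrays;

class BubbleSortSketch {
    // Assumed helper (not shown in the slides): exchange a[i] and a[j].
    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Same loop structure as the slide: pass k skips the last k-1 comparisons.
    static void bubbleSort(int[] a) {
        for (int pass = 1; pass < a.length; pass++)
            for (int j = 0; j < a.length - pass; j++)
                if (a[j] > a[j + 1])
                    swap(a, j, j + 1);
    }

    public static void main(String[] args) {
        int[] a = {6, 8, 1, 15, 12, 2, 7, 4};
        bubbleSort(a);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 4, 6, 7, 8, 12, 15]
    }
}
```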

24 Sorting students

    public static void bubbleSort(Student[] a) {
        for (int pass = 1; pass < a.length; pass++)
            for (int j = 0; j < a.length - pass; j++)
                if (/* a[j] and a[j+1] out of order */)
                    swap(a, j, j+1);
    }

• Almost identical code, except that we need to get the right boolean condition to check when two students are in the “wrong order”

25 What order do we want?
• The precise form of the statement depends on whether we want to sort students:
  • Alphabetically according to their studentID
  • Numerically according to their mark
• In addition, the desired sort could be ascending (smaller values first) or descending (smaller values last)
• Suppose that we want to sort the students into normal (ascending) alphabetical order by studentID

26 For alphabetic order
• The comparison between the two Student objects a[j] and a[j+1] first needs to obtain the two ids to compare, so it will involve the two Strings

    String s1 = a[j].getStudentID();
    String s2 = a[j+1].getStudentID();

• To compare two Strings we use the compareTo method, which returns a positive value when s1 comes after s2

    if (s1.compareTo(s2) > 0) {
        // Swap the two Students
    }
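Putting the pieces together, the student version of bubble sort might look like the sketch below. The nested Student class mirrors the skeleton from slide 7 so that the example is self-contained; the names StudentSortSketch and bubbleSortById and the sample IDs are assumptions for illustration only.

```java
class StudentSortSketch {
    // Minimal copy of the Student skeleton, nested here to keep the example self-contained.
    static class Student {
        private final String studentID;
        private final int mark;
        Student(String studentID, int mark) { this.studentID = studentID; this.mark = mark; }
        String getStudentID() { return studentID; }
        int getMark() { return mark; }
    }

    // Assumed helper: exchange the references at positions i and j.
    static void swap(Student[] a, int i, int j) {
        Student tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Bubble sort into ascending alphabetical order of student ID.
    static void bubbleSortById(Student[] a) {
        for (int pass = 1; pass < a.length; pass++)
            for (int j = 0; j < a.length - pass; j++)
                if (a[j].getStudentID().compareTo(a[j + 1].getStudentID()) > 0)
                    swap(a, j, j + 1);
    }

    public static void main(String[] args) {
        Student[] list = { new Student("S300", 49), new Student("S100", 72), new Student("S200", 65) };
        bubbleSortById(list);
        for (Student s : list)
            System.out.println(s.getStudentID() + " " + s.getMark());
    }
}
```

Sorting by mark instead would only change the condition, for example to a[j].getMark() > a[j + 1].getMark() for ascending order.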

27 Selection sort
• When sorting n items, Selection Sort works as follows
• The procedure has n-1 stages
  • Select the smallest element in the array, and swap it with the element in position 0
  • Then select the smallest element in the array starting from position 1, and swap it with the element in position 1
  • Then select the smallest element in the array starting from position 2, and swap it with the element in position 2
  • Etc.
• This algorithm has the following properties
  • After i stages, the first i items in the array are the i smallest items, in order
  • At the (i+1)th stage, the (i+1)th smallest item is placed in the (i+1)th slot in the array

28 Coding selectionSort

    public static void selectionSort(int[] a) {
        for (int pass = 0; pass < a.length - 1; pass++) {
            int smallest = pass;                    // index of the smallest element seen so far
            for (int j = pass + 1; j < a.length; j++)
                if (a[j] < a[smallest])
                    smallest = j;
            swap(a, smallest, pass);                // move it to the front of the unsorted part
        }
    }

29 Insertion Sort (like sorting cards)
• In card games, it is common to pick up your cards as they are dealt, and to sort them into order as they arrive
• For example, suppose your first three cards are [card images not preserved]
• Next you pick up a 9 of clubs

30 Inserting a card
• The new card is then inserted into the correct position

31 Insertion sort
• We can develop this idea into an algorithm called Insertion Sort
• When sorting n items, Insertion Sort works as follows
• The procedure has n-1 stages
  • Compare the second item in the array with the first item; make them ordered
  • Compare the third item in the array with the first two items; make them ordered
  • Etc.
• This algorithm has the following properties
  • After i stages, the first i+1 items are sorted, although they are not necessarily the i+1 smallest items
  • At the (i+1)th stage, the item originally in position i+2 is placed in its correct position relative to the first i+1 items

32 Example
• Initial array

    6  8  1 15 12  2  7  4

• Stage 0: Move the first element into position (do nothing)

    6  8  1 15 12  2  7  4

• Stage 1: Examine the second element and insert it into position (again do nothing)

    6  8  1 15 12  2  7  4

33 Stage 2

    6  8  1 15 12  2  7  4

• This element (1) is out of position, so it will have to be inserted

    6  8  _ 15 12  2  7  4     (1 is taken out)
    _  6  8 15 12  2  7  4     (6 and 8 move one place to the right)
    1  6  8 15 12  2  7  4     (1 is inserted)

34 Stages 3 & 4
• Stage 3

    1  6  8 15 12  2  7  4

• Stage 4

    1  6  8 15  _  2  7  4     (12 is taken out)
    1  6  8 12 15  2  7  4     (15 moves right, 12 is inserted)

35 Stage 5

    1  6  8 12 15  2  7  4
    1  6  8 12 15  _  7  4     (2 is taken out)
    1  _  6  8 12 15  7  4     (6, 8, 12 and 15 move one place to the right)
    1  2  6  8 12 15  7  4     (2 is inserted)

36 Stage 6

    1  2  6  8 12 15  7  4
    1  2  6  8 12 15  _  4     (7 is taken out)
    1  2  6  7  8 12 15  4     (8, 12 and 15 move right, 7 is inserted)

37 Final stage

    1  2  6  7  8 12 15  4
    1  2  6  7  8 12  4 15
    1  2  6  7  8  4 12 15
    1  2  6  7  4  8 12 15
    1  2  6  4  7  8 12 15
    1  2  4  6  7  8 12 15

38 Code for insertionSort

    public static void insertionSort(int[] a) {
        for (int pass = 1; pass < a.length; pass++) {
            int tmp = a[pass];      // new element to insert
            int pos = pass - 1;
            // move out-of-order elements up to make space
            while (pos >= 0 && a[pos] > tmp) {
                a[pos+1] = a[pos];
                pos--;
            }
            // insert the new element in the right place
            a[pos+1] = tmp;
        }
    }

39 Code dissection
• The body of the for loop contains the code for one stage or “pass” of the algorithm

40 Code dissection
• The variable tmp stores the value that is to be inserted; the variable pos will eventually indicate the position where it should be inserted (it goes into a[pos+1])

41 Code dissection
• The while loop does the work of shifting each element in turn one space along if it is bigger than the value to be inserted
• The test pos >= 0 ensures that we don’t fall off the left-hand end of the array!

42 Code dissection
• The while loop finishes when we have found the correct position for a[pass], so the final assignment a[pos+1] = tmp inserts it into this position

43 Code dissection
• Note that if a[pass] is already in the correct spot, the while loop does nothing and a[pass] goes back into the same place

EFFICIENCY & SORTING II CITS 1001

45 Scope of this lecture
• Quicksort and mergesort
• Performance comparison
• Binary search

46 Recursive sorting
• All of the algorithms so far build up the “sorted part” of the array one element at a time
• What if we take a completely different approach?
• Faster algorithms split the elements to be sorted into groups, sort the groups separately, then combine the results
• There are two principal approaches
  • “Intelligent” splitting and “simple” combining
  • Simple splitting and intelligent combining
• These are divide-and-conquer algorithms

47 Quicksort
• When sorting n items, Quick Sort works as follows
  • Choose one of the items p to be the pivot
  • Partition the items into L (items smaller than p) and U (items larger than p)
  • L’ = sort(L)
  • U’ = sort(U)
  • The sorted array is then L’ + p + U’, in that order
• Intelligent splitting, and simple combining
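The description above translates almost word for word into code if we use lists instead of sorting in place. The following is only a sketch under two assumptions that the slide does not state: the first item is taken as the pivot, and items equal to the pivot are placed in L. The class name QuickSortLists is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuickSortLists {
    // Returns a new sorted list: L' + p + U'.
    static List<Integer> sort(List<Integer> items) {
        if (items.size() <= 1) return new ArrayList<>(items); // nothing to sort
        int p = items.get(0);                     // assumption: first item is the pivot
        List<Integer> lower = new ArrayList<>();  // L: items no larger than p
        List<Integer> upper = new ArrayList<>();  // U: items larger than p
        for (int x : items.subList(1, items.size())) {
            if (x <= p) lower.add(x); else upper.add(x);
        }
        List<Integer> result = sort(lower);       // L'
        result.add(p);                            // + p
        result.addAll(sort(upper));               // + U'
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList(6, 8, 1, 15, 12, 2, 9, 7)));
        // prints [1, 2, 6, 7, 8, 9, 12, 15]
    }
}
```

This version allocates new lists at every level, which is why the array code in the lecture partitions in place instead.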

48 Behaviour of quicksort

    6  8  1  7 12  9  2 15                 Choose a pivot (7)
    Items smaller than the pivot:   6  1  2
    Items larger than the pivot:    12  9  8 15
    Sort:                           1  2  6     and     8  9 12 15
    Append:                         1  2  6  7  8  9 12 15

49 Second level

    12  9  8 15                            Choose a pivot (9)
    Items smaller than the pivot:   8
    Items larger than the pivot:    15 12
    Sort:                           12 15
    Append:                         8  9 12 15

50 Code for quickSort

    public static void quickSort(int[] a) {
        qsort(a, 0, a.length - 1);
    }

    // sort a[low..high] inclusive
    private static void qsort(int[] a, int low, int high) {
        if (low < high) {
            int p = partition(a, low, high);   // pivot ends up at index p
            qsort(a, low, p - 1);
            qsort(a, p + 1, high);
        }
    }

• What if low == high?
  • Then there is only one element: no sorting is needed

51 Code for partition

    private static int partition(int[] a, int low, int high) {
        // this code always uses the last element a[high] as the pivot
        int si = low;
        for (int i = low; i < high; i++) {
            if (a[i] <= a[high]) {
                // swap small elements to the front
                swap(a, i, si++);
            }
        }
        // swap the pivot to be between smalls and larges
        swap(a, si, high);
        return si;
    }

52 Behaviour of partition
• The pivot is the last element, a[high] = 7

    si = 0    6  8  1 15 12  2  9  7     a[0] < a[high], so a[0] ↔ a[si] and si++
    si = 1    6  8  1 15 12  2  9  7     a[2] < a[high], so a[2] ↔ a[si] and si++
    si = 2    6  1  8 15 12  2  9  7     a[5] < a[high], so a[5] ↔ a[si] and si++
    si = 3    6  1  2 15 12  8  9  7     a[7] ↔ a[si], return si
              6  1  2  7 12  8  9 15

53 Or … partition around middle element

    private static void qsortM(int[] a, int l, int u) {
        int pivotValue = a[(l + u) / 2];            // pivot is the middle element of a[l..u]
        int i = l;                                  // process low partition from the left
        int j = u;                                  // process high partition from the right
        while (i <= j) {                            // until all elements are in position
            while (a[i] < pivotValue) { i++; }      // skip over in-place elements
            while (a[j] > pivotValue) { j--; }      // skip over in-place elements
            if (i <= j) {
                swap(a, i++, j--);                  // swap out-of-place elements
            }
        }
        if (l < j) { qsortM(a, l, j); }             // sort lower part
        if (i < u) { qsortM(a, i, u); }             // sort upper part
    }

54 Mergesort
• When sorting n items, Merge Sort works as follows
  • Let F be the front half of the array, and B be the back half
  • F’ = sort(F)
  • B’ = sort(B)
  • Merge F’ and B’ to get the sorted list: repeatedly compare their first elements and take the smaller one
• Simple splitting, and intelligent combining
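The merge step described above, repeatedly comparing the first remaining elements and taking the smaller one, can be sketched on two separate arrays. This is a simplified illustration that returns a new array, not the in-place merge used by the lecture's Sorter class; the name MergeSketch is an assumption.

```java
import java.util.Arrays;

class MergeSketch {
    // Merges two already-sorted arrays f and b into one new sorted array.
    static int[] merge(int[] f, int[] b) {
        int[] out = new int[f.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < f.length && j < b.length)          // both lists still have elements
            out[k++] = (f[i] <= b[j]) ? f[i++] : b[j++];
        while (i < f.length) out[k++] = f[i++];       // copy what is left of f
        while (j < b.length) out[k++] = b[j++];       // copy what is left of b
        return out;
    }

    public static void main(String[] args) {
        int[] front = {1, 6, 8, 15};
        int[] back = {2, 7, 9, 12};
        System.out.println(Arrays.toString(merge(front, back)));
        // prints [1, 2, 6, 7, 8, 9, 12, 15]
    }
}
```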

55 Behaviour of mergesort

    6  8  1 15 12  2  9  7
    Front half:   6  8  1 15          Back half:   12  2  9  7
    Sort:         1  6  8 15          Sort:         2  7  9 12
    Merge:        1  2  6  7  8  9 12 15

56 Second level

    12  2  9  7
    Front half:   12  2               Back half:   9  7
    Sort:          2 12               Sort:        7  9
    Merge:         2  7  9 12

57 Code for mergeSort

    public static void mergeSort(int[] a) {
        msort(a, 0, a.length - 1);
    }

    // sort a[l..u] inclusive
    private static void msort(int[] a, int l, int u) {
        if (l < u) {
            int m = (l + u) / 2;
            msort(a, l, m);
            msort(a, m + 1, u);
            merge(a, l, m, u);
        }
    }

• Again, if l == u, there is only one element: no sorting is needed

58 Code for merge

    // merge a[l..m] with a[m+1..u]
    private static void merge(int[] a, int l, int m, int u) {
        // small elements on the 1st list needn't be moved
        while (l <= m && a[l] <= a[m + 1]) l++;
        // if the 1st list is exhausted, we're done
        if (l <= m) {
            // large elements on the 2nd list needn't be moved
            while (u >= m + 1 && a[u] >= a[m]) u--;
            // record the start and finish points of the 1st list
            int start = l;
            int finish = m++;
            // this is where we will put the sorted list
            int[] b = new int[u - l + 1];
            int z = 0;
            // while the 2nd list is alive, copy the smallest element to b
            // (the 1st list cannot run out first: its last element is larger than a[u])
            while (m <= u)
                if (a[l] <= a[m]) b[z++] = a[l++];
                else              b[z++] = a[m++];
            // copy the rest of the 1st list
            while (z < b.length) b[z++] = a[l++];
            // copy the sorted list back from b
            for (int i = 0; i < b.length; i++) a[start + i] = b[i];
        }
    }

59 Efficiency experiment
• Is there any difference between the performance of all these sorting algorithms?
• After all they all achieve the same result…
• Which one(s) are more efficient?
• Why?
• Experiment: use the provided Sorter class to estimate the execution time of each algorithm for sorting a large, disordered array
• Graph your results
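One possible shape for such an experiment is sketched below. The provided Sorter class is not reproduced in these slides, so this sketch includes its own copy of insertionSort, and the names TimingSketch, randomArray, timeMillis and isSorted are hypothetical. Timings will vary from machine to machine and from run to run, so no expected numbers are shown.

```java
import java.util.Random;
import java.util.function.Consumer;

class TimingSketch {
    // Copy of the insertion sort from the lecture, used as the algorithm under test.
    static void insertionSort(int[] a) {
        for (int pass = 1; pass < a.length; pass++) {
            int tmp = a[pass];
            int pos = pass - 1;
            while (pos >= 0 && a[pos] > tmp) {
                a[pos + 1] = a[pos];
                pos--;
            }
            a[pos + 1] = tmp;
        }
    }

    // Builds a disordered array of n pseudo-random integers.
    static int[] randomArray(int n, long seed) {
        Random rng = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rng.nextInt();
        return a;
    }

    // Runs the sorter once on data and returns the elapsed time in milliseconds.
    static long timeMillis(Consumer<int[]> sorter, int[] data) {
        long start = System.nanoTime();
        sorter.accept(data);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Sanity check that the sorter really sorted the array.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++)
            if (a[i - 1] > a[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        for (int n : new int[]{1_000, 10_000}) {
            int[] data = randomArray(n, 42L);
            long ms = timeMillis(TimingSketch::insertionSort, data);
            System.out.println(n + " items: " + ms + " ms, sorted = " + isSorted(data));
        }
    }
}
```

Passing the sorter as a Consumer<int[]> means the same harness can time any method with the signature used in the lecture, for example a bubbleSort or quickSort method.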

60 Performance Comparison

    Time to sort (ms)

    Algorithm    1,000 items   10,000 items   100,000 items   1,000,000 items
    Bubble                 1            192          19,295         1,822,273
    Selection              0             47           4,220           396,780
    Insertion              0             21             954           111,746
    Merge                  0              2              13               166
    Quick (E)              0              1               7                97
    Quick (M)              0              1               8                88

61 Analysis
• Why are quicksort and mergesort so much faster?
• The first three algorithms all reduce the number of items to be sorted by one in each pass
  • And each pass takes linear time
  • Therefore their overall run-time is n², i.e. quadratic
  • Multiplying the number of items by 10 multiplies run-time by 10² = 100
• Quicksort and mergesort reduce the number of items by half at each level
  • And each level takes linear time
  • Therefore their overall run-time is n log₂ n
  • Multiplying the number of items by 10 multiplies run-time by 10 and a bit
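The two growth rates above can be checked with a little arithmetic. The sketch below counts the comparisons made by the bubbleSort loops (which is n(n-1)/2 for n items) and evaluates n log₂ n as a rough stand-in for the work done by quicksort and mergesort. The class and method names are made up for illustration.

```java
class GrowthSketch {
    // Counts the comparisons made by the bubbleSort loop structure on n items.
    static long bubbleComparisons(int n) {
        long count = 0;
        for (int pass = 1; pass < n; pass++)
            for (int j = 0; j < n - pass; j++)
                count++;
        return count;
    }

    // n * log2(n), a rough model of the work done by quicksort and mergesort.
    static double nLog2n(int n) {
        return n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(bubbleComparisons(1_000));   // prints 499500
        System.out.println(bubbleComparisons(10_000));  // prints 49995000
        // Ratios when n grows by a factor of 10:
        System.out.println((double) bubbleComparisons(10_000) / bubbleComparisons(1_000)); // about 100
        System.out.println(nLog2n(10_000) / nLog2n(1_000));                                // about 13
    }
}
```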

62 A note on the accuracy of such tests
• Assessing the execution time of Java code this way is not completely accurate
  • You will not always get the same results
• Activities such as garbage collection may affect the times
  • Or just if your computer is running other applications concurrently
• We “average out” unrepresentative examples by using
  • Random data
  • Multiple runs

63 (Finally) back to searching
• Linear search was inefficient for large datasets
• But if we can assume that our data is sorted, we can use a much more efficient approach
  • reminiscent of quickSort and mergeSort
• When looking for an item x, compare x with the middle item
  • if x equals the middle item, we’re done
  • if x is bigger than the middle item, search the top half of the data
  • if x is smaller than the middle item, search the bottom half of the data
• In each step, we delete half of the data from consideration!

64 Code for binary search

    public static int binarySearch(int[] a, int x) {
        int l = 0;
        int u = a.length - 1;
        // we know that x can only be in the interval l…u
        while (l <= u) {                 // have we failed yet?
            int m = (l + u) / 2;         // m is the middle index
            if (a[m] == x)
                return m;
            else if (a[m] < x)
                l = m + 1;               // restrict the search to m+1…u
            else                         // a[m] > x
                u = m - 1;               // restrict the search to l…m-1
        }
        return -1;
    }
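To see how much halving helps, the sketch below repeats the same loop but counts how many middle items are examined. The names BinarySearchProbes and probes and the test data are assumptions for illustration. For a sorted array of 1,000,000 items the loop can run at most 20 times, because 2^20 is larger than 1,000,000, whereas linear search may need 1,000,000 comparisons.

```java
class BinarySearchProbes {
    // Same search as binarySearch, but returns how many middle items were examined.
    static int probes(int[] a, int x) {
        int l = 0;
        int u = a.length - 1;
        int count = 0;
        while (l <= u) {
            int m = (l + u) / 2;
            count++;                       // one probe of the middle item
            if (a[m] == x) return count;
            else if (a[m] < x) l = m + 1;
            else u = m - 1;
        }
        return count;                      // x was not found
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        for (int i = 0; i < a.length; i++) a[i] = 2 * i;  // sorted even numbers
        System.out.println(probes(a, 1_234_567));         // odd value, so absent: at most 20 probes
        System.out.println(probes(a, 0));                 // present: also at most 20 probes
    }
}
```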

65 Summary
• We study sorting algorithms because they provide good examples of many of the features that affect the run-time of program code.
• When checking the efficiency of your own code, consider
  • Number of loops, and depth of nesting
  • Number of comparison operations
  • Number of swap (or similar) operations