
Merge Sort Algorithm

• Merge sort has two phases.
    - First, it divides the data into smaller and smaller lists until they are size 2 or 1.
    - Second, as it returns, it merges all the lists using a merge algorithm.
• Consider the following data set:
        78 45 34 20 18 15 96 10
    - The first two sub-lists for merging will be
        45 78
        20 34
    - After the merge:
        20 34 45 78
    - The next two sub-lists are
        15 18
        10 96
    - After the merge:
        10 15 18 96
    - The final merge yields
        10 15 18 20 34 45 78 96

Computer Science I - Martin Hardwick Lecture 1 -- 1
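The two phases described above can be sketched in C++ roughly as follows. This is a minimal sketch rather than the lecture's own code: the names merge_lists and merge_sort are hypothetical, and the size-2 base case is handled with a single swap to mirror the "size 2 or 1" wording on the slide.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: merge two already-sorted vectors into one sorted vector.
std::vector<int> merge_lists(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> result;
    result.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) result.push_back(a[i++]);
        else             result.push_back(b[j++]);
    }
    while (i < a.size()) result.push_back(a[i++]);  // leftovers from a
    while (j < b.size()) result.push_back(b[j++]);  // leftovers from b
    return result;
}

// Hypothetical driver: phase 1 splits, phase 2 merges on the way back.
std::vector<int> merge_sort(std::vector<int> data) {
    if (data.size() <= 1) return data;               // size 1 (or empty): already sorted
    if (data.size() == 2) {                          // size 2: one comparison is enough
        if (data[1] < data[0]) std::swap(data[0], data[1]);
        return data;
    }
    auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);
    std::vector<int> left(data.begin(), mid);        // phase 1: divide
    std::vector<int> right(mid, data.end());
    return merge_lists(merge_sort(left), merge_sort(right));  // phase 2: merge
}
```

Calling merge_sort on the data set above, {78, 45, 34, 20, 18, 15, 96, 10}, passes through the same sub-lists as the slide: 45 78 and 20 34 first, then 15 18 and 10 96.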

Merge

    vector<int> merge(vector<int> list1, vector<int> list2)
    // Merge two sorted lists
    {
        vector<int> result;
        int i1 = 0;     // next unread position in list1
        int i2 = 0;     // next unread position in list2
        while (i1 < list1.size() || i2 < list2.size()) {
            if (i1 < list1.size() && i2 < list2.size()) {
                // Both lists still have items: take the smaller front item
                if (list1[i1] < list2[i2])
                    result.push_back(list1[i1++]);
                else
                    result.push_back(list2[i2++]);
            }
            else {
                // One list is used up: copy the rest of the other
                while (i1 < list1.size())
                    result.push_back(list1[i1++]);
                while (i2 < list2.size())
                    result.push_back(list2[i2++]);
            }
        }
        return result;
    }

• This algorithm repeatedly picks the smaller of the two front items, one from each list.
• When it reaches the end of one list, it fills in the remaining items from the other list.
• Consider
    - List 1: 18 23 45 78
    - List 2: 19 21 80 90
• List 1 will be consumed first
    - Result: 18 19 21 23 45 78
    - i1 = 4
    - i2 = 2
• We then add 80 and 90 from list 2
    - Result: 18 19 21 23 45 78 80 90
    - i1 = 4
    - i2 = 4

Data management issues

• The given algorithm is not very efficient.
    - It adds too many items to too many vectors using push_back.
    - Each time a vector outgrows its storage it must allocate a larger block and copy its elements, so the system may run out of space and have to reclaim memory.
• A more efficient approach is to define the space needed at the beginning of the program. Either
    - create a vector with a specific size:
          vector<int> result(10000);
    - or use ordinary arrays with enough size:
          int result[1000];
• Another issue for the largest test (128,000 items) is running out of memory in your program. To fix this:
    - Go to Project/Properties
    - Select Linker/System
    - Set the stack sizes to 1,000,000
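The two options above might look like this in a small program. This is a sketch under assumptions: the function names and MAX_ITEMS are made up for illustration, and reserve() is a third option not mentioned on the slide, which keeps push_back but sets the storage aside once.

```cpp
#include <cstddef>
#include <vector>

// Option 1 from the slide: create the vector at full size, then assign by index.
std::vector<int> count_up_presized(std::size_t n) {
    std::vector<int> result(n);              // n elements, allocated once
    for (std::size_t i = 0; i < n; ++i)
        result[i] = static_cast<int>(i);
    return result;
}

// Not on the slide: reserve() sets capacity aside but leaves the size at 0,
// so push_back can still be used without repeated regrowth.
std::vector<int> count_up_reserved(std::size_t n) {
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        result.push_back(static_cast<int>(i));
    return result;
}

// Option 2 from the slide: an ordinary array with enough room.
// MAX_ITEMS is an assumed upper bound; the caller owns the array.
const std::size_t MAX_ITEMS = 1000;

// Fills out[0..n-1] and returns how many slots were written (capped at MAX_ITEMS).
std::size_t count_up_array(int out[], std::size_t n) {
    if (n > MAX_ITEMS) n = MAX_ITEMS;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<int>(i);
    return n;
}
```

Note the trade-off with the array version: the size has to be chosen in advance, and a large local array lives on the stack, which is one reason the stack size setting above can matter.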

Memory management

[Figure: memory layout, with labels "Call stack", "Frame", "gap" and "top"]

• The memory is divided into three areas
• Static area (not shown)
    - This area is of fixed size
    - It stores the program code and any static (global) variables
• Stack area
    - This area stores the stack frames of all the currently executing functions
    - It needs to be big enough for all of the local data for all the functions
• Heap area
    - This area stores all the data whose size cannot be predicted
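As a rough illustration of the three areas, the sketch below marks where each piece of data would typically live. The names call_count and sum_up_to are hypothetical, and the exact placement is up to the compiler and operating system.

```cpp
#include <cstddef>
#include <vector>

int call_count = 0;   // static area: exists for the whole run of the program

// Adds the numbers 1..n using a temporary vector.
long sum_up_to(std::size_t n) {
    ++call_count;
    long total = 0;                 // stack area: part of this call's stack frame
    std::vector<long> values(n);    // vector object in the stack frame, elements on the heap
    for (std::size_t i = 0; i < n; ++i)
        values[i] = static_cast<long>(i) + 1;
    for (long v : values)
        total += v;
    return total;                   // the frame and the vector's heap block are released here
}
```

Here n is not known until run time, which is why the elements go to the heap: the stack frame only needs room for the fixed-size parts.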

Memory management

• If the heap grows too large then it will collide with the stack and the system will run out of memory.
• The heap contains items of many different sizes, some of which are no longer used.
    - On systems that have one, a garbage collector can go and find these unused locations (pass 1),
    - compress all the space by squeezing out the gaps (pass 2),
    - leaving new space at the top of the heap.
• However, reclaiming memory this way is very expensive, and standard C++ does not do it for you: a growing vector frees its old block itself, but each regrowth still allocates a new block and copies every element.
    - We can help by not wasting memory
    - and by telling the system when something is going to grow big:
          vector<int> result(10000);
    - so that the system does not waste space by first creating a small vector, then a middle-size copy, then a large, then a very large, then an enormous copy.
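To see the "small, then middle-size, then large" copies happen, one can count how often a vector's capacity changes while items are appended. This is a sketch: count_regrowths is a hypothetical name, and the number of regrowths without pre-sizing depends on the library implementation.

```cpp
#include <cstddef>
#include <vector>

// Appends n items and counts how many times the capacity changed on the way.
// With reserve_first set, the space is requested once before the loop.
std::size_t count_regrowths(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);
    std::size_t regrowths = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {   // a new, larger block was allocated
            ++regrowths;
            last_capacity = v.capacity();
        }
    }
    return regrowths;
}
```

With the space requested up front, the count for 10,000 items is zero; without it, the vector regrows several times, and each regrowth copies everything stored so far.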

Memory management

[Figure: vector<int> v1 and vector<int> v2, each drawn as a header plus elements; labels "Data elements for v1", "gap" and "top"]

• A vector object is divided into two components
• The header, containing fixed-size information
    - Current number of elements
    - Pointer to data elements
    - Stored on the stack
• Data elements
    - The data items in sequence, stored in the heap
• More on this in CS 2
    - How to make and use pointers
    - How to get your own data on the heap
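The header/elements split can be observed directly: the vector object has the same size however many elements it holds, and its elements sit at an address outside the object. A small sketch, with hypothetical function names, and relying on integer comparison of addresses, which works on common platforms but is not guaranteed by the language:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Size in bytes of the vector object itself (the "header").
std::size_t header_bytes(const std::vector<int>& v) {
    return sizeof(v);
}

// True if the element storage lies outside the bytes of the vector object.
bool elements_outside_header(const std::vector<int>& v) {
    auto header_begin = reinterpret_cast<std::uintptr_t>(&v);
    auto header_end   = header_begin + sizeof(v);
    auto data         = reinterpret_cast<std::uintptr_t>(v.data());
    return data < header_begin || data >= header_end;
}
```

For a vector of 100,000 ints, header_bytes stays at a few dozen bytes at most, while the elements occupy a separate block many times larger.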

Merge made more efficient

    vector<int> merge(vector<int> list1, vector<int> list2)
    // Merge two sorted lists
    {
        // Allocate the whole result once, so push_back is not needed
        vector<int> result(list1.size() + list2.size());
        int i1 = 0;
        int i2 = 0;
        int resi = 0;       // next free position in result
        while (i1 < list1.size() || i2 < list2.size()) {
            if (i1 < list1.size() && i2 < list2.size()) {
                if (list1[i1] < list2[i2])
                    result[resi++] = list1[i1++];
                else
                    result[resi++] = list2[i2++];
            }
            else {
                while (i1 < list1.size())
                    result[resi++] = list1[i1++];
                while (i2 < list2.size())
                    result[resi++] = list2[i2++];
            }
        }
        return result;
    }

• This algorithm sets a size for the new list.
• Therefore, it does not need to use push_back.
• It is still slow compared to the array solution, however.
• Arrays always use a big block of contiguous memory.
• Paging is not such an issue.
• (Paging occurs when the OS has to get data for your program from the disk.)
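The slides do not show the array solution they compare against. One possible shape for it, as a sketch only, is a merge that writes into a caller-supplied array; the name merge_arrays and its parameter list are assumptions.

```cpp
#include <cstddef>

// Merge sorted arrays a[0..na-1] and b[0..nb-1] into out.
// The caller must make sure out has room for na + nb items.
void merge_arrays(const int a[], std::size_t na,
                  const int b[], std::size_t nb,
                  int out[]) {
    std::size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)                       // both inputs still have items
        out[k++] = (a[i] < b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];              // rest of a
    while (j < nb) out[k++] = b[j++];              // rest of b
}
```

Because the output buffer is passed in, no memory is allocated inside the merge at all; the cost is that the caller has to manage sizes by hand.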

Operator Overloading

    bool operator> (acct a, acct b)
    {
        return a.get_num() > b.get_num();
    }

    ostream& operator<< (ostream &os, acct a)
    {
        os << "Name: " << a.get_name();
        os << " Balance: " << a.get_bal();
        return os;
    }

    acct operator+ (acct a, acct b)
    {
        return acct(a.get_num(), a.get_name(),
                    a.get_bal() + b.get_bal());
    }

• Remember the bank account example.
• We can enrich this example by using operator overloading.
• The code above defines
    - The meaning of > for bank accounts
    - A special version of << for bank accounts
    - A plus function for bank accounts that returns a copy of a with b's balance added
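To try these operators in isolation, a cut-down account class is enough. The acct class below is an assumed stand-in for the one in the lecture (only the constructor and getters, with the getters marked const), and describe is a hypothetical helper added so the output of << can be captured as a string.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Assumed minimal version of the lecture's acct class.
class acct {
public:
    acct(int anum, std::string aname, double abal)
        : num(anum), name(aname), balance(abal) {}
    int get_num() const { return num; }
    std::string get_name() const { return name; }
    double get_bal() const { return balance; }
private:
    int num;
    std::string name;
    double balance;
};

// > compares account numbers, as on the slide.
bool operator>(const acct& a, const acct& b) {
    return a.get_num() > b.get_num();
}

// << prints the owner and balance, then returns the stream so calls can be chained.
std::ostream& operator<<(std::ostream& os, const acct& a) {
    os << "Name: " << a.get_name() << " Balance: " << a.get_bal();
    return os;
}

// + keeps a's number and name and adds b's balance.
acct operator+(const acct& a, const acct& b) {
    return acct(a.get_num(), a.get_name(), a.get_bal() + b.get_bal());
}

// Hypothetical helper: render an account through operator<<.
std::string describe(const acct& a) {
    std::ostringstream out;
    out << a;
    return out.str();
}
```

Passing the accounts by const reference instead of by value is a design choice of this sketch; it avoids copying the name string on every comparison.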

Operator overloading and sort

    class acct {                // bank account data
    private:
        int num;                // account number
        string name;            // owner of account
        double balance;         // balance in account

    public:
        acct ();
        acct (int anum, string aname, double abal);
        double get_bal ();
        string get_name();
        int get_num ();
        void put_num (int num);
        void put_name (string name);
        void put_bal (double bal);
        bool is_bankrupt();
        bool operator<(acct b);
        bool operator>(acct b);
    };

• There is a sort function in <algorithm> that can be used to sort a vector of any type of data.
• All we have to do is define the "<" operator for this data, because sort compares items with "<" by default. We also define ">" for our own comparisons.

Implementation

    bool acct::operator< (acct b)
    {
        return get_name() < b.get_name();
    }

    bool acct::operator> (acct b)
    {
        return get_name() > b.get_name();
    }

• In the implementation we give a meaning to the > and < operators.
• In this case we are defining them using the alphabetic order of the names.
• These are member functions, so a plain get_name() returns the name of this object (the left-hand operand).
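A small sketch of how sort picks up these operators. The acct class here is an assumed cut-down version of the lecture's class, with the operators and getters marked const so they can be called on any account, and numbers_in_name_order is a hypothetical helper.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed cut-down account class; only what the example needs.
class acct {
public:
    acct(int anum, std::string aname, double abal)
        : num(anum), name(aname), balance(abal) {}
    int get_num() const { return num; }
    std::string get_name() const { return name; }
    double get_bal() const { return balance; }
    // Order accounts alphabetically by owner name.
    bool operator<(const acct& b) const { return get_name() < b.get_name(); }
    bool operator>(const acct& b) const { return get_name() > b.get_name(); }
private:
    int num;
    std::string name;
    double balance;
};

// Sorts a copy of the accounts and returns their numbers in the sorted order.
std::vector<int> numbers_in_name_order(std::vector<acct> accounts) {
    std::sort(accounts.begin(), accounts.end());   // uses acct::operator<
    std::vector<int> nums;
    for (const acct& a : accounts)
        nums.push_back(a.get_num());
    return nums;
}
```

For accounts 1 "Zed", 2 "Amy" and 3 "Max", the returned numbers come back as 2, 3, 1: alphabetical by owner, not by account number.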

Usage

    else if (command == "sort") {           // SORT COMMAND
        // Report every adjacent pair that is out of order.
        // loc + 1 < size() stays safe even when the bank is empty.
        for (loc = 0; loc + 1 < my_bank.size(); loc++) {
            if (my_bank.get(loc) > my_bank.get(loc + 1))
                cout << "Bank is out of order at location " << loc << endl;
        }
        vector<acct> temp = my_bank.get_all();
        sort(temp.begin(), temp.end());
        my_bank.put_all(temp);
        cout << "Sorted List of accounts: " << endl;
        for (loc = 0; loc < my_bank.size(); loc++) {
            account = my_bank.get(loc);
            cout << "Account: " << account.get_num()
                 << "\tOwner: " << account.get_name()
                 << "\tBalance: " << account.get_bal() << endl;
        }
    }

• You must include <algorithm> at the top of your program.
• This code tests for unsorted data.
• Then it does a sort.
• Then it prints the new data.
• After the sort the accounts will be sorted by name, not by number!
    - Do not try using the insert functionality in the bank account example after doing this sort
    - A better solution will put the sort into a member function on the bank
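The last point, moving the sort into a member function on the bank, could look something like the sketch below. The lecture's real bank class is not shown in these slides, so the bank class here, with add, get, size, is_sorted_by_name and sort_by_name, is entirely hypothetical, as is the cut-down acct.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumed cut-down account class, ordered by owner name.
class acct {
public:
    acct(int anum, std::string aname, double abal)
        : num(anum), name(aname), balance(abal) {}
    int get_num() const { return num; }
    std::string get_name() const { return name; }
    double get_bal() const { return balance; }
    bool operator<(const acct& b) const { return get_name() < b.get_name(); }
private:
    int num;
    std::string name;
    double balance;
};

// Hypothetical bank class that keeps sorting behind its own interface.
class bank {
public:
    void add(const acct& a) { accounts.push_back(a); }
    std::size_t size() const { return accounts.size(); }
    acct get(std::size_t loc) const { return accounts[loc]; }

    // True if no adjacent pair of accounts is out of name order.
    bool is_sorted_by_name() const {
        for (std::size_t loc = 0; loc + 1 < accounts.size(); ++loc)
            if (accounts[loc + 1] < accounts[loc])
                return false;
        return true;
    }

    // Sorts the accounts in place; callers never see a temporary copy.
    void sort_by_name() { std::sort(accounts.begin(), accounts.end()); }

private:
    std::vector<acct> accounts;
};
```

With this shape the command loop only has to call my_bank.sort_by_name(), and the bank itself can decide how other operations, such as insert, should behave once the order has changed.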