- Slides: 31

External Sort

External-Memory Sorting
- External-memory algorithms
  - External-memory sorting: used when the data do not fit in main memory
- Rough idea: sort pieces that fit in main memory, then “merge” them
- Main-memory merge sort: the main part of the algorithm is Merge

Main-Memory Merge Sort

Merge-Sort(A)
    if length(A) > 1 then
        Copy the first half of A into array A1        (Divide)
        Copy the second half of A into array A2
        Merge-Sort(A1)                                (Conquer)
        Merge-Sort(A2)
        Merge(A, A1, A2)                              (Combine)

- Running time of merge sort: $O(n \log n)$
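The Merge-Sort pseudocode can be sketched in Python as follows. This is an illustrative version, not the slides' own code: `merge_sort` and `merge` are hypothetical names mirroring Merge-Sort and Merge, and the sketch returns a new sorted list instead of overwriting A in place.

```python
def merge(a1, a2):
    """Combine step: merge two sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(a1) and j < len(a2):
        # Take the smaller head element; <= keeps equal keys in order (stable).
        if a1[i] <= a2[j]:
            out.append(a1[i])
            i += 1
        else:
            out.append(a2[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    out.extend(a1[i:])
    out.extend(a2[j:])
    return out


def merge_sort(a):
    """Return a sorted copy of list a in O(n log n) time."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    a1 = merge_sort(a[:mid])   # divide + conquer: first half
    a2 = merge_sort(a[mid:])   # divide + conquer: second half
    return merge(a1, a2)       # combine


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```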






1. Read the first 250 records of Run 1 from the scratch disk into Buffer 1.
2. Read the first 250 records of Run 2 into Buffer 2.
3. Merge Buffers 1 and 2 into Buffer 3. As soon as Buffer 3 gets full:
   - write it (250 records) to the scratch disk,
   - empty Buffer 3,
   - continue merging the remaining records left in Buffers 1 and 2.

Note that this process terminates after Buffer 3 has filled up twice and been written to the scratch disk twice, since the two runs together hold 2 × 250 = 500 records.
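A minimal in-memory simulation of steps 1 to 3, assuming each run is a Python list of sorted keys and the scratch disk is a list of written blocks. `merge_two_buffers` and `capacity` are hypothetical names. With two runs of 250 records and a 250-record Buffer 3, it confirms that Buffer 3 is written exactly twice.

```python
def merge_two_buffers(buf1, buf2, capacity, scratch_disk):
    """Merge sorted buf1 and buf2 through an output buffer (Buffer 3)
    holding at most `capacity` records. Each full Buffer 3 is appended
    to scratch_disk as one block. Returns the number of writes."""
    # Assumption: buffers and scratch disk are plain in-memory lists.
    buf3 = []
    writes = 0
    i = j = 0
    while i < len(buf1) or j < len(buf2):
        # Pick the smaller head; fall back to whichever buffer is non-empty.
        if j >= len(buf2) or (i < len(buf1) and buf1[i] <= buf2[j]):
            buf3.append(buf1[i])
            i += 1
        else:
            buf3.append(buf2[j])
            j += 1
        if len(buf3) == capacity:
            scratch_disk.append(buf3)  # write Buffer 3 to the scratch disk
            buf3 = []                  # empty Buffer 3
            writes += 1
    if buf3:  # flush a final partial block, if any
        scratch_disk.append(buf3)
        writes += 1
    return writes


run1 = list(range(0, 500, 2))  # 250 sorted records
run2 = list(range(1, 500, 2))  # 250 sorted records
disk = []
print(merge_two_buffers(run1, run2, 250, disk))  # 2
```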

External Sort, 2-Way Merge
- We repeat this procedure for Runs 3 & 4 and for Runs 5 & 6.
- At the end of this step, we have the following arrangement:
[Figure: Scratch disk 1 and Scratch disk 2]

External Sort, 2-Way Merge
The following process is repeated (for Runs 1 & 2, then Runs 3 & 4, and so on) until all runs on scratch disk 1 at this level have been merged into new runs:
- copy the first 250 records of Run 1 (on scratch disk 1) into Buffer 1, and the first 250 records of Run 2 (on scratch disk 1) into Buffer 2;
- merge them into Buffer 3;
- whenever Buffer 3 gets full, write it to scratch disk 2 and empty Buffer 3;
- continue merging the remaining records of Buffers 1 and 2 into Buffer 3, reading the next 250 records of a run into its buffer whenever that buffer runs out.

Note that each level has about half as many runs as the previous level: $r$ runs are merged into $\lceil r/2 \rceil$ new runs.
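The level-by-level process can be sketched as follows, assuming the runs are in-memory sorted lists (standing in for scratch disks 1 and 2) and using Python's `heapq.merge` in place of the buffered merge through Buffers 1 to 3. `merge_level` is a hypothetical name. Starting from 6 runs, the run count goes 6, 3, 2, 1, which also shows the odd case where 3 runs become $\lceil 3/2 \rceil = 2$.

```python
import heapq


def merge_level(runs):
    """One level of 2-way merging: merge Run 1 & 2, Run 3 & 4, ...
    An unpaired last run is carried over unchanged."""
    # Assumption: each run is a sorted Python list, not a disk file.
    new_runs = []
    for k in range(0, len(runs), 2):
        pair = runs[k:k + 2]
        new_runs.append(list(heapq.merge(*pair)))
    return new_runs


runs = [[3, 9], [1, 4], [7, 8], [2, 6], [0, 5], [10, 11]]  # 6 sorted runs
level = 0
while len(runs) > 1:
    runs = merge_level(runs)
    level += 1
    print(level, len(runs))  # number of runs: 3, then 2, then 1
print(runs[0])
```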

External Sort, 2-Way Merge
[Figure: panels (1) and (2)]

External-Memory Merging
- TwoWayMerge uses three main-memory buffers of size B (one page each).
- Read the data of Run 1 into Buffer 1 and the data of Run 2 into Buffer 2.
- Merge Buffers 1 and 2 into Buffer 3.
- When Buffer 3 is full, write it to disk file X and empty Buffer 3.
- The merged run contains every record of the two input runs, so its size is the sum of their sizes.
[Figure: Run 1 and Run 2 are read page by page from file Y (ending at EOF) into buffers Bf1 and Bf2; a new page is read into Bf1 (Bf2) when p1 = B (p2 = B). $\min(Bf_1[p_1], Bf_2[p_2])$ is moved to position po of the output buffer Bfo, which is written to file X (the merged run) when full.]
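A minimal sketch of TwoWayMerge with page-sized buffers, assuming each run on file Y is given as a list of non-empty pages (each a sorted list of at most B records) and file X is a list of output pages. The pointers p1, p2 and po follow the figure's names; `two_way_merge` is a hypothetical name. It also counts page reads and writes, the I/Os used in the analysis that follows.

```python
def two_way_merge(run1_pages, run2_pages, B):
    """Merge two runs, each a list of sorted pages, using three one-page
    buffers Bf1, Bf2 and Bfo. Returns (file_x_pages, reads, writes)."""
    # Assumption: "disk" files are in-memory lists of pages.
    runs = [run1_pages, run2_pages]
    next_page = [0, 0]  # index of the next page to read from each run
    bf = [[], []]       # input buffers Bf1, Bf2
    p = [0, 0]          # pointers p1, p2 into Bf1, Bf2
    reads = writes = 0
    file_x, bfo = [], []

    def refill(r):
        # Read the next page of run r once pointer p[r] has passed the current page.
        nonlocal reads
        if next_page[r] < len(runs[r]):
            bf[r] = runs[r][next_page[r]]
            next_page[r] += 1
            reads += 1
        else:
            bf[r] = []  # run r is exhausted
        p[r] = 0

    refill(0)
    refill(1)
    while bf[0] or bf[1]:
        # Choose the buffer whose current element is smaller.
        if not bf[1] or (bf[0] and bf[0][p[0]] <= bf[1][p[1]]):
            r = 0
        else:
            r = 1
        bfo.append(bf[r][p[r]])  # po is len(bfo)
        p[r] += 1
        if p[r] == len(bf[r]):
            refill(r)
        if len(bfo) == B:  # write, when Bfo is full
            file_x.append(bfo)
            bfo = []
            writes += 1
    if bfo:  # final partial page
        file_x.append(bfo)
        writes += 1
    return file_x, reads, writes


x, reads, writes = two_way_merge([[1, 4], [6, 9]], [[2, 3], [5, 8]], B=2)
print(x)              # [[1, 2], [3, 4], [5, 6], [8, 9]]
print(reads, writes)  # 4 4
```

Each input page is read once and each output page is written once, which is why one merge pass costs $2n$ page I/Os.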

Time Complexity Analysis: Assumptions
Assumptions and notation:
- Disk page size B: the number of data elements (records) in one disk page.
- Data file size: $N = n \cdot B$ records, where $n = N/B$ is the number of disk pages.
- Available main memory: M elements, i.e., $m = M/B$ pages.

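Time Complexity Analysis
[Figure: Phase 1 produces sorted runs of M records each (8 runs in the example); each iteration of Phase 2 merges pairs of runs, so run sizes grow $M \to 2M \to 4M \to 8M = N$, the total file size.]
- Phase 1: read file X, write file Y: $2n = O(n)$ I/Os, where n is the number of disk pages.
- Phase 2:
  - One iteration: read file Y, write file X: $2n = O(n)$ I/Os.
  - Number of iterations: $\lceil \log_2 (N/M) \rceil = \lceil \log_2 (n/m) \rceil$.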
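The counts above can be turned into a small calculator, assuming every pass reads and writes all n pages exactly once. `two_way_external_sort_ios` is a hypothetical name. With $n = 8m$ (the $8M = N$ case in the figure) it gives 3 Phase 2 iterations and $2n \cdot 4 = 8n$ page I/Os in total.

```python
def two_way_external_sort_ios(n, m):
    """Page I/Os of 2-way external merge sort for a file of n pages and
    m pages of memory: Phase 1 plus one 2n-I/O pass per Phase 2 iteration."""
    # Assumption: each pass costs exactly n reads + n writes.
    runs = -(-n // m)   # Phase 1 produces ceil(n/m) runs
    iterations = 0
    while runs > 1:     # each Phase 2 iteration halves the run count (rounding up)
        runs = -(-runs // 2)
        iterations += 1
    return 2 * n * (1 + iterations), iterations


print(two_way_external_sort_ios(800, 100))  # (6400, 3)
```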

Time Complexity Analysis: Conclusions
- Total running time of 2-way external-memory merge sort: $O(n \log_2 (n/m))$ I/Os.
- Can we obtain a better running time?

- Observe the following:
  - Phase 1 uses all available memory.
  - Phase 2 uses just 3 pages out of the m available pages!

External Sort, K-Way Merge



External Sort, K-Way Merge
- In the figure, we assume k = 4 (so k + 1 = 5 buffers).
- If the k-way merge is done on m runs, there are at most $\lceil \log_k m \rceil$ levels.
- Therefore, it seems that increasing k decreases the overall running time.
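A quick way to see the effect of k, assuming 64 initial runs (an arbitrary example value). `merge_levels` is a hypothetical helper that applies $\lceil r/k \rceil$ level by level; after L levels $\lceil r/k^L \rceil$ runs remain, so the result equals $\lceil \log_k r \rceil$.

```python
def merge_levels(runs, k):
    """Number of k-way merge levels needed to reduce `runs` runs to one."""
    levels = 0
    while runs > 1:
        runs = -(-runs // k)  # each level merges groups of k runs (ceil)
        levels += 1
    return levels


for k in (2, 4, 8):
    print(k, merge_levels(64, k))  # 64 runs: 6, 3 and 2 levels
```

k cannot grow without bound, though: the k input buffers and the output buffer must all fit in main memory.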


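Multiway Merging
[Figure: Run 1, Run 2, …, Run k = n/m are read page by page from file Y (ending at EOF) into buffers Bf1, Bf2, …, Bfk; a new page is read into Bfi when pi = B. $\min(Bf_1[p_1], Bf_2[p_2], \ldots, Bf_k[p_k])$ is moved to position po of the output buffer Bfo, which is written to file X (the merged run) when full.]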

Multiway (k-Way) Merging
- Here we assume there are k + 1 buffers (buffers 0, 1, …, k) in main memory, each holding n/m elements.
- We sort the data in each of buffers 1 to k, then repeatedly find the minimum element among these k buffers and move it to buffer 0.
- Whenever buffer 0 is full, we write it to the end of file X and empty buffer 0 for the next batch of sorted data.
- This process is repeated until all the sorted data has been written to file X.
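A minimal sketch of the k-way merge step, assuming the k runs are in-memory sorted lists and file X is a list of written blocks. One substitution is made here: a binary heap (`heapq`) finds the minimum of the k current elements in $O(\log k)$ time instead of scanning all k buffers as $\min(Bf_1[p_1], \ldots, Bf_k[p_k])$ does. `multiway_merge` and `out_capacity` are hypothetical names.

```python
import heapq


def multiway_merge(runs, out_capacity):
    """Merge k sorted runs into file X (a list of blocks), writing and
    emptying buffer 0 whenever it holds out_capacity records."""
    # Assumption: runs are sorted Python lists; a heap replaces the linear min scan.
    # Heap entries: (current value, run index, position in that run).
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    file_x, buf0 = [], []
    while heap:
        value, r, pos = heapq.heappop(heap)  # minimum over the k current heads
        buf0.append(value)
        if pos + 1 < len(runs[r]):           # advance within run r
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
        if len(buf0) == out_capacity:        # buffer 0 full: write, then empty
            file_x.append(buf0)
            buf0 = []
    if buf0:
        file_x.append(buf0)
    return file_x


blocks = multiway_merge([[1, 7, 9], [2, 3, 10], [4, 5, 6]], out_capacity=4)
print(blocks)  # [[1, 2, 3, 4], [5, 6, 7, 9], [10]]
```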



General Multiway Merge Sort
- What if a file is very large or memory is small?
- General multiway merge sort:
  - Phase 1: the same (do internal sorts).
  - Phase 2: do as many iterations of merging as necessary until only one run remains. Each iteration repeatedly calls MultiwayMerge(Y, X) to merge groups of m-1 runs (m-1 input pages plus one output page fill the m pages of memory) until the end of file Y is reached.
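A minimal in-memory simulation of the two phases, assuming runs are Python lists standing in for files X and Y and `heapq.merge` stands in for MultiwayMerge(Y, X). `external_merge_sort` is a hypothetical function; its parameters M and m follow the slides' notation (here M = 10 records and m = 5 pages, i.e. B = 2), but page-level I/O is not modelled.

```python
import heapq


def external_merge_sort(records, M, m):
    """Simulate general multiway merge sort.
    M: records that fit in memory; m: memory size in pages (m >= 3).
    Returns (sorted records, number of Phase 2 iterations)."""
    # Assumption: files are in-memory lists; no real disk I/O happens.
    fan_in = m - 1  # m-1 input buffers + 1 output buffer
    # Phase 1: internal sorts of memory-sized chunks give the initial runs.
    runs = [sorted(records[i:i + M]) for i in range(0, len(records), M)]
    iterations = 0
    # Phase 2: merge groups of m-1 runs until only one run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        iterations += 1
    return (runs[0] if runs else []), iterations


data = [(7 * i) % 101 for i in range(100)]  # 100 distinct keys
result, iters = external_merge_sort(data, M=10, m=5)
print(result == sorted(data), iters)  # True 2
```

Here Phase 1 yields 10 runs; the first Phase 2 iteration merges them in groups of 4 into 3 runs, and the second merges those 3 into one.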

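Analysis
[Figure: Phase 1 produces runs of M records; each iteration of Phase 2 merges groups of m-1 runs, so run sizes grow $M \to (m-1)M \to (m-1)^2 M \to (m-1)^3 M = N$.]
- Phase 1: $O(n)$ I/Os; each iteration of Phase 2: $O(n)$ I/Os.
- How many iterations are there in Phase 2?
  - Number of iterations in Phase 2: $\lceil \log_{m-1} (N/M) \rceil = \lceil \log_{m-1} (n/m) \rceil = O(\log_m n)$.
- Total running time: $O(n \log_m n)$ I/Os.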
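A worked example with illustrative numbers (assumed, not taken from the slides): $n = 10^6$ pages and $m = 101$ pages of memory, so Phase 2 merges groups of $m - 1 = 100$ runs. Phase 1 produces $\lceil n/m \rceil = 9901$ runs, and every pass (Phase 1 or one Phase 2 iteration) costs $2n$ page I/Os.

```latex
\begin{aligned}
\text{multiway: } & \lceil \log_{100} 9901 \rceil = 2 \text{ iterations (since } 100^2 = 10^4 \ge 9901\text{)}, \\
& \text{total} = 2n(1 + 2) = 6n = 6 \times 10^6 \text{ I/Os}; \\
\text{2-way: } & \lceil \log_{2} 9901 \rceil = 14 \text{ iterations (since } 2^{13} = 8192 < 9901 \le 2^{14} = 16384\text{)}, \\
& \text{total} = 2n(1 + 14) = 30n = 3 \times 10^7 \text{ I/Os}.
\end{aligned}
```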

Conclusions
- B: number of records in one disk page; n: number of disk pages; M: available main memory (in records), $m = M/B$ pages.
- Total running time of 2-way external-memory merge sort: $O(n \log_2 (n/m))$ I/Os.
- External sorting can be done in $O(n \log_m n)$ I/O operations for any n.
- This is asymptotically optimal for comparison-based sorting.

End of sorting algorithms
