Sorting Really Big Files
Sorting, Part 3

Using K Temporary Files
- Given
  - N records in file F
  - M records will fit into internal memory
  - Use K temp files, where K = N / M, rounded up
- Create K sorted files (runs) from F, then merge them
- Problems
  - a computer compares 2 values at a time, not K values
  - merging only 2 of the K runs at a time creates LOTS of temp files
  - in the illustration on the next page, notice that we soon begin merging small runs with big temp files
- Too many comparisons
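A minimal sketch of the K-temp-files approach, assuming Python lists stand in for file F and the temp files and that the internal sort is Python's built-in `sorted`. The helper names (`make_runs`, `merge_two`, `merge_runs_one_at_a_time`) are hypothetical. It merges run 1 with run 2, then that temp file with run 3, and so on, counting key comparisons as it goes.

```python
from typing import List, Tuple

# Hypothetical helpers: Python lists stand in for file F, the K run files, and the temp file.


def make_runs(records: List[int], m: int) -> List[List[int]]:
    """Split F into K = ceil(N / M) runs, each sorted in internal memory."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


def merge_two(a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Merge two sorted runs; also return how many key comparisons were made."""
    out: List[int] = []
    i = j = comparisons = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out, comparisons


def merge_runs_one_at_a_time(runs: List[List[int]]) -> Tuple[List[int], int]:
    """Merge run 1 with run 2 into a temp file, then the temp file with run 3, and so on."""
    if not runs:
        return [], 0
    temp, total = runs[0], 0
    for run in runs[1:]:
        temp, comparisons = merge_two(temp, run)  # the temp file keeps growing
        total += comparisons
    return temp, total


records = [9, 2, 7, 4, 8, 1, 6, 3, 5, 0]
runs = make_runs(records, 3)  # N = 10, M = 3, so K = 4 runs
result, comparisons = merge_runs_one_at_a_time(runs)
print(runs)  # [[2, 7, 9], [1, 4, 8], [3, 5, 6], [0]]
print(result, comparisons)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 12
```

The temp file grows with every merge, so each later merge re-reads almost all of the data merged so far; that is where the extra comparisons come from.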

Alternative Merging Strategy
[Figure: two merge trees built from the runs in file F, using temp files T1, T2, T3 and S1, S2 (R1 = Run 1, R2 = Run 2, etc.)]
What would these trees look like with 8 runs?
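One way to explore the question: a sketch that lists the merges each strategy performs for k runs. It assumes the alternative strategy merges neighbouring runs in pairs (a balanced tree); the function names are hypothetical, and a label such as R1-R4 means one merged run containing Runs 1 through 4.

```python
from typing import List, Tuple

# Hypothetical helpers: list the merges each strategy performs for runs R1..Rk.


def label(first: int, last: int) -> str:
    return f"R{first}" if first == last else f"R{first}-R{last}"


def one_at_a_time_merges(k: int) -> List[str]:
    """Each merge combines the growing temp file with the next run: one long chain."""
    return [label(1, last) for last in range(2, k + 1)]


def balanced_merge_rounds(k: int) -> List[List[str]]:
    """Pair up neighbouring runs, then neighbouring results, until one run is left."""
    groups: List[Tuple[int, int]] = [(r, r) for r in range(1, k + 1)]
    rounds: List[List[str]] = []
    while len(groups) > 1:
        merged: List[Tuple[int, int]] = []
        for i in range(0, len(groups), 2):
            if i + 1 < len(groups):
                merged.append((groups[i][0], groups[i + 1][1]))
            else:
                merged.append(groups[i])  # odd one out waits for the next round
        groups = merged
        rounds.append([label(first, last) for first, last in groups])
    return rounds


print(one_at_a_time_merges(8))
# ['R1-R2', 'R1-R3', 'R1-R4', 'R1-R5', 'R1-R6', 'R1-R7', 'R1-R8']
for level in balanced_merge_rounds(8):
    print(level)
# ['R1-R2', 'R3-R4', 'R5-R6', 'R7-R8']
# ['R1-R4', 'R5-R8']
# ['R1-R8']
```

With 8 runs the one-at-a-time chain is 7 merges deep, while the pairwise tree has only log₂ 8 = 3 levels.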

N-Way Merge
- We can create that tree using just 4 temp files
  - 2 are input and 2 are output; the two pairs alternate between being the input files and the output files
- Algorithm
  - Write: Run 1 into T1, Run 2 into T2, Run 3 into T1, Run 4 into T2, and so on
  - Merge: first runs in T1 and T2 into T3, second runs in T1 and T2 into T4, third runs in T1 and T2 into T3, and so on
  - Merge: first runs in T3 and T4 into T1, second runs in T3 and T4 into T2, and so on
  - Repeat, swapping the input and output pairs each pass, until only one run remains
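A sketch of the four-file algorithm, assuming each temp file is modelled as a Python list of sorted runs rather than a real disk file; `four_file_sort` and `merge_two` are hypothetical names. Each pass merges the i-th runs of the two input files, sends the results alternately to the two output files, and then the pairs swap roles.

```python
from typing import List

Run = List[int]


def merge_two(a: Run, b: Run) -> Run:
    """Standard 2-way merge of two sorted runs."""
    out: Run = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def four_file_sort(records: List[int], m: int) -> List[int]:
    """Hypothetical sketch: each 'file' is a list of runs standing in for a disk file."""
    # Write: sort blocks of M records internally, writing runs alternately to T1 and T2.
    inputs: List[List[Run]] = [[], []]  # [T1, T2]
    for n, start in enumerate(range(0, len(records), m)):
        inputs[n % 2].append(sorted(records[start:start + m]))
    # Merge: the i-th runs of the input pair go to the output pair, alternating files.
    while len(inputs[0]) + len(inputs[1]) > 1:
        outputs: List[List[Run]] = [[], []]  # [T3, T4] on the first pass
        for i in range(max(len(inputs[0]), len(inputs[1]))):
            a = inputs[0][i] if i < len(inputs[0]) else []
            b = inputs[1][i] if i < len(inputs[1]) else []
            outputs[i % 2].append(merge_two(a, b))
        inputs = outputs  # the output pair becomes the input pair for the next pass
    return inputs[0][0] if inputs[0] else []


data = [31, 4, 15, 9, 26, 5, 35, 8, 97, 93, 23, 84, 62, 64, 33]
print(four_file_sort(data, 2))
# [4, 5, 8, 9, 15, 23, 26, 31, 33, 35, 62, 64, 84, 93, 97]
```

Only four "files" are ever in use, no matter how many runs F produces; the number of passes grows with log₂ of the number of runs.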

N-Way Merge
Files contain runs (example with 10 runs):

Step | T1                 | T2                  | T3                   | T4
-----|--------------------|---------------------|----------------------|-------------
1    | R1, R3, R5, R7, R9 | R2, R4, R6, R8, R10 | empty                | empty
2    | empty              | empty               | R1-R2, R5-R6, R9-R10 | R3-R4, R7-R8
3    | R1-R4, R9-R10      | R5-R8               | empty                | empty
4    | empty              | empty               | R1-R8                | R9-R10
5    | R1-R10             | empty               | empty                | empty
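To check the table, a sketch that tracks only run labels (no data) through the same steps. `trace_four_files` is a hypothetical helper; a label such as R1-R4 means one merged run containing Runs 1 through 4.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (first run, last run) contained in one merged run


def label(block: Block) -> str:
    first, last = block
    return f"R{first}" if first == last else f"R{first}-R{last}"


def trace_four_files(k: int) -> List[Dict[str, List[str]]]:
    """Hypothetical helper: which runs T1..T4 hold after each step of the 4-file merge."""
    files: Dict[str, List[Block]] = {name: [] for name in ("T1", "T2", "T3", "T4")}

    def snapshot() -> Dict[str, List[str]]:
        return {name: [label(b) for b in blocks] for name, blocks in files.items()}

    # Step 1: runs from F are written alternately to T1 and T2.
    for r in range(1, k + 1):
        files["T1" if r % 2 == 1 else "T2"].append((r, r))
    steps = [snapshot()]
    src, dst = ("T1", "T2"), ("T3", "T4")
    while sum(len(blocks) for blocks in files.values()) > 1:
        a, b = files[src[0]], files[src[1]]
        for i in range(max(len(a), len(b))):
            pair = [f[i] for f in (a, b) if i < len(f)]
            # The i-th runs are merged; results alternate between the two output files.
            files[dst[i % 2]].append((pair[0][0], pair[-1][1]))
        files[src[0]], files[src[1]] = [], []  # the input files are now used up
        src, dst = dst, src  # input and output pairs swap roles
        steps.append(snapshot())
    return steps


for number, contents in enumerate(trace_four_files(10), start=1):
    print(number, contents)
```

For k = 10 the five snapshots match the five rows of the table, including the lone run R9-R10 being carried to T1 in step 3 because T4 has no third run to merge it with.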

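Analysis
- Number of comparisons:
  - N-Way Merge: O(n log₂ n), since there are about log₂ K merge passes, each reading all n records, plus the internal sorts
  - K Temp Files: O(n²), since the growing temp file is re-read at each of the K - 1 merges
- Disk space
- Could the run size be one record?
  - In other words, is the internal sort necessary?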
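The two growth rates can be checked empirically. A sketch, assuming in-memory runs and counting key comparisons during the merge phase only (comparisons made by the internal sort, here Python's `sorted`, are not counted); all helper names are hypothetical. Merging neighbouring runs in pairs builds the same merge tree as the four-file scheme. Setting the run size `m` to 1 skips the internal sort entirely, which is one way to experiment with the last question.

```python
import random
from typing import List, Tuple

Run = List[int]


def merge_count(a: Run, b: Run) -> Tuple[Run, int]:
    """2-way merge that also counts key comparisons."""
    out: Run = []
    i = j = comparisons = 0
    while i < len(a) and j < len(b):
        comparisons += 1
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:], comparisons


def one_at_a_time_comparisons(runs: List[Run]) -> int:
    """K-temp-files strategy: the growing temp file absorbs one run per merge."""
    if not runs:
        return 0
    temp, total = runs[0], 0
    for run in runs[1:]:
        temp, c = merge_count(temp, run)
        total += c
    return total


def balanced_comparisons(runs: List[Run]) -> int:
    """Pairwise strategy: merge neighbouring runs level by level (same tree as 4 files)."""
    total = 0
    while len(runs) > 1:
        nxt: List[Run] = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged, c = merge_count(runs[i], runs[i + 1])
                total += c
                nxt.append(merged)
            else:
                nxt.append(runs[i])  # odd run out is carried to the next pass
        runs = nxt
    return total


def make_runs(records: List[int], m: int) -> List[Run]:
    """Internal sort of each block of m records (m = 1 means no internal sort)."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


data = random.Random(1).sample(range(2048), 2048)  # fixed seed, distinct keys
for m in (1, 64):
    runs = make_runs(data, m)
    print(m, one_at_a_time_comparisons(runs), balanced_comparisons(runs))
```

With m = 1 on random keys, the one-at-a-time strategy makes roughly n² / 4 comparisons, while the pairwise strategy stays within about n log₂ n.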