External Sorting
- Slides: 25

External Sorting
• Adapt the fastest internal-sort methods.
  § Quick sort: best average run time.
  § Merge sort: best worst-case run time.

Internal Merge Sort Review
• Phase 1
  § Create initial sorted segments, either as:
    Ø natural segments (maximal stretches of the input that are already in order), or
    Ø segments sorted with insertion sort.
• Phase 2
  § Merge pairs of sorted segments, in merge passes, until only 1 segment remains.
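A minimal Python sketch of the two phases, assuming the records are integers held in a Python list. Phase 1 uses natural segments (insertion sort is the other option the slide lists); Phase 2 merges pairs of segments pass by pass. The names `natural_segments`, `merge_two` and `merge_sort` are hypothetical, chosen for illustration.

```python
from typing import List


def natural_segments(a: List[int]) -> List[List[int]]:
    """Phase 1: split the input into maximal non-decreasing segments."""
    segments: List[List[int]] = []
    for x in a:
        if segments and x >= segments[-1][-1]:
            segments[-1].append(x)      # x extends the current sorted segment
        else:
            segments.append([x])        # the first record or a descent starts a new segment
    return segments


def merge_two(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    out: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])                # at most one of these two is non-empty
    out.extend(right[j:])
    return out


def merge_sort(a: List[int]) -> List[int]:
    """Phase 2: merge pairs of segments, pass by pass, until one remains."""
    segments = natural_segments(a)
    if not segments:
        return []
    while len(segments) > 1:
        merged = [merge_two(segments[i], segments[i + 1])
                  for i in range(0, len(segments) - 1, 2)]
        if len(segments) % 2 == 1:
            merged.append(segments[-1])  # an odd segment is carried to the next pass
        segments = merged
    return segments[0]
```

For example, `natural_segments([5, 1, 4, 2, 8, 3])` yields four segments, `[[5], [1, 4], [2, 8], [3]]`, which two merge passes reduce to one.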

External Merge Sort
• Sort 10,000 records.
• Enough memory for 500 records.
• Block size is 100 records.
• $t_{IO}$ = time to input/output 1 block (includes seek, latency, and transmission times).
• $t_{IS}$ = time to internally sort 1 memory load.
• $t_{IM}$ = time to internally merge 1 block load.

External Merge Sort
• Two phases.
  § Run generation.
    Ø A run is a sorted sequence of records.
  § Run merging.

Run Generation
• Disk: 10,000 records = 100 blocks. Memory: 500 records = 5 blocks.
• Input 5 blocks: $5\,t_{IO}$.
• Sort: $t_{IS}$.
• Output as a run: $5\,t_{IO}$.
• Do 20 times.
• Total: $20\,(10\,t_{IO} + t_{IS}) = 200\,t_{IO} + 20\,t_{IS}$.
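The cost count above can be checked with a small simulation. This sketch assumes the disk is modelled as a Python list and counts one block I/O for every block read or written; `generate_runs` and its parameters are hypothetical names, with defaults taken from the slide's example (500-record memory, 100-record blocks).

```python
from typing import List, Tuple


def generate_runs(records: List[int], memory_records: int = 500,
                  block_records: int = 100) -> Tuple[List[List[int]], int, int]:
    """Cut `records` into memory loads, sort each load, and write it out as a run.

    Returns (runs, block_ios, internal_sorts). The "disk" is just a Python list.
    """
    runs: List[List[int]] = []
    block_ios = 0
    internal_sorts = 0
    for start in range(0, len(records), memory_records):
        load = records[start:start + memory_records]     # one memory load
        blocks = -(-len(load) // block_records)          # ceil(len(load) / block size)
        block_ios += blocks                              # input the load block by block
        runs.append(sorted(load))                        # internal sort (one tIS)
        internal_sorts += 1
        block_ios += blocks                              # output the sorted load as a run
    return runs, block_ios, internal_sorts
```

For 10,000 records it should return 20 runs of 500 records each, 200 block I/Os and 20 internal sorts, matching $200\,t_{IO} + 20\,t_{IS}$.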

Run Merging
• Merge pass.
  § Pairwise merge the 20 runs into 10.
  § In a merge pass, all runs (except possibly one) are pairwise merged.
• Perform 4 more merge passes, reducing the number of runs to 1.

Merge 20 Runs
(Figure: 2-way merge tree. Pass 1 merges R1, ..., R20 into S1, ..., S10; pass 2 produces T1, ..., T5; pass 3 produces U1, U2, U3; pass 4 produces V1, V2; pass 5 produces W1.)

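Merge R1 and R2
(Figure: input buffers Input 0 and Input 1 and an output buffer in memory, reading from and writing to disk.)
• Fill I0 (Input 0) from R1 and I1 (Input 1) from R2.
• Merge from I0 and I1 to the output buffer.
• Write whenever the output buffer is full.
• Read whenever an input buffer is empty.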
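The buffer discipline above (read when an input buffer empties, write when the output buffer fills) can be sketched as follows. Assumptions: runs are sorted Python lists standing in for disk files, each buffer holds exactly one block, and the hypothetical function `merge_two_runs` returns the merged run together with the number of blocks read and written.

```python
from typing import List, Tuple


def merge_two_runs(r1: List[int], r2: List[int],
                   block_records: int = 100) -> Tuple[List[int], int, int]:
    """Merge two sorted runs through one-block input buffers I0, I1 and a
    one-block output buffer; return (merged run, blocks read, blocks written)."""
    runs = [r1, r2]                      # the two runs "on disk"
    pos = [0, 0]                         # next unread record of each run on disk
    bufs: List[List[int]] = [[], []]     # input buffers I0 and I1
    idx = [0, 0]                         # next unconsumed record in each buffer
    reads = 0
    writes = 0
    out_buf: List[int] = []              # output buffer
    merged: List[int] = []               # merged run written back "to disk"

    def refill(b: int) -> None:
        """Read the next block of run b whenever input buffer b is empty."""
        nonlocal reads
        if idx[b] == len(bufs[b]) and pos[b] < len(runs[b]):
            bufs[b] = runs[b][pos[b]:pos[b] + block_records]
            pos[b] += len(bufs[b])
            idx[b] = 0
            reads += 1

    def has(b: int) -> bool:
        return idx[b] < len(bufs[b])

    refill(0)
    refill(1)
    while has(0) or has(1):
        # Take from I0 unless I0 is empty or I1's front record is smaller.
        if has(0) and (not has(1) or bufs[0][idx[0]] <= bufs[1][idx[1]]):
            b = 0
        else:
            b = 1
        out_buf.append(bufs[b][idx[b]])
        idx[b] += 1
        refill(b)
        if len(out_buf) == block_records:   # write whenever the output buffer is full
            merged.extend(out_buf)
            out_buf = []
            writes += 1
    if out_buf:                             # flush a final partial block
        merged.extend(out_buf)
        writes += 1
    return merged, reads, writes
```

Merging two 5-block runs of 100-record blocks should report 10 block reads and 10 block writes, the input and output halves of the $20\,t_{IO}$ counted for merging R1 and R2.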

Time To Merge R1 and R2
• Each is 5 blocks long.
• Input time = $10\,t_{IO}$.
• Write/output time = $10\,t_{IO}$.
• Merge time = $10\,t_{IM}$.
• Total time = $20\,t_{IO} + 10\,t_{IM}$.

Time For Pass 1 (R → S)
• Time to merge one pair of runs = $20\,t_{IO} + 10\,t_{IM}$.
• Time to merge all 10 pairs of runs = $200\,t_{IO} + 100\,t_{IM}$.

Time To Merge S1 and S2
• Each is 10 blocks long.
• Input time = $20\,t_{IO}$.
• Write/output time = $20\,t_{IO}$.
• Merge time = $20\,t_{IM}$.
• Total time = $40\,t_{IO} + 20\,t_{IM}$.

Time For Pass 2 (S → T)
• Time to merge one pair of runs = $40\,t_{IO} + 20\,t_{IM}$.
• Time to merge all 5 pairs of runs = $200\,t_{IO} + 100\,t_{IM}$.

Time For One Merge Pass
• Time to input all blocks = $100\,t_{IO}$.
• Time to output all blocks = $100\,t_{IO}$.
• Time to merge all blocks = $100\,t_{IM}$.
• Total time for a merge pass = $200\,t_{IO} + 100\,t_{IM}$.

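Total Run-Merging Time
$$
\begin{aligned}
\text{run-merging time} &= (\text{time for one merge pass}) \times (\text{number of passes}) \\
&= (\text{time for one merge pass}) \times \lceil \log_2(\text{number of initial runs}) \rceil \\
&= (200\,t_{IO} + 100\,t_{IM}) \times \lceil \log_2 20 \rceil \\
&= (200\,t_{IO} + 100\,t_{IM}) \times 5
\end{aligned}
$$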

Factors In Overall Run Time
• Run generation: $200\,t_{IO} + 20\,t_{IS}$
  § Internal sort time.
  § Input and output time.
• Run merging: $(200\,t_{IO} + 100\,t_{IM}) \times \lceil \log_2 20 \rceil$
  § Internal merge time.
  § Input and output time.
  § Number of initial runs.
  § Merge order (the number of merge passes is determined by the number of runs and the merge order).
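The factors above combine into one cost model. This sketch expresses the total time as coefficients of $t_{IO}$, $t_{IS}$ and $t_{IM}$; it assumes, as in the slides, that every initial run except possibly the last is exactly one memory load and that each merge pass reads, merges and writes every block once. `SortCost` and `external_sort_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SortCost:
    """Total external-sort time as coefficients of tIO, tIS and tIM."""
    t_io: int
    t_is: int
    t_im: int


def external_sort_cost(n_records: int, memory_records: int,
                       block_records: int, k: int = 2) -> SortCost:
    if k < 2:
        raise ValueError("merge order k must be at least 2")
    blocks = -(-n_records // block_records)    # ceil: blocks on disk
    runs = -(-n_records // memory_records)     # ceil: one initial run per memory load
    # Number of merge passes = ceil(log_k(runs)), computed with integers:
    # each pass turns r runs into ceil(r / k) runs.
    passes, remaining = 0, runs
    while remaining > 1:
        remaining = -(-remaining // k)
        passes += 1
    # Run generation reads and writes every block once and sorts each memory load.
    # Every merge pass reads, merges and writes every block once.
    return SortCost(t_io=2 * blocks + passes * 2 * blocks,
                    t_is=runs,
                    t_im=passes * blocks)
```

With the slides' numbers (10,000 records, 500-record memory, 100-record blocks, $k = 2$) it gives $1200\,t_{IO} + 20\,t_{IS} + 500\,t_{IM}$, that is, $200\,t_{IO} + 20\,t_{IS}$ for run generation plus five passes of $200\,t_{IO} + 100\,t_{IM}$.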

Improve Run Generation
• Overlap input, output, and internal sorting.
(Figure: data flowing from disk through memory back to disk.)

Improve Run Generation
• Generate runs whose length (on average) exceeds the memory size.
• This is equivalent to reducing the number of runs generated.
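One well-known way to produce runs longer than memory, not spelled out on the slide, is replacement selection: memory holds a min-heap, the smallest record that can still extend the current run is output, and each new input record is deferred to the next run if it is smaller than the record just output. For randomly ordered input the expected run length is about twice the memory size. The sketch assumes integer keys and a memory of `memory_records` records; `replacement_selection` is a hypothetical name, and Python's `heapq` stands in for whatever priority queue an implementation would use.

```python
import heapq
from typing import Iterable, List, Tuple


def replacement_selection(records: Iterable[int], memory_records: int) -> List[List[int]]:
    """Generate sorted runs by replacement selection.

    Memory is a min-heap of (run number, key) pairs holding at most
    `memory_records` records, so records tagged for the current run always
    come out before records deferred to the next run.
    """
    if memory_records < 1:
        raise ValueError("memory must hold at least one record")
    it = iter(records)
    heap: List[Tuple[int, int]] = []
    for x in it:                                 # fill memory with the first records
        heap.append((0, x))
        if len(heap) == memory_records:
            break
    heapq.heapify(heap)
    runs: List[List[int]] = []
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no == len(runs):                  # first record of a new run
            runs.append([])
        runs[run_no].append(key)                 # output the smallest eligible record
        nxt = next(it, None)                     # read the next input record, if any
        if nxt is not None:
            # It can extend the current run only if it keeps that run sorted.
            heapq.heappush(heap, (run_no if nxt >= key else run_no + 1, nxt))
    return runs
```

Already-sorted input comes out as a single run, while input in descending order is the worst case, giving runs of exactly the memory size (except possibly the last).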

Improve Run Merging
• Overlap input, output, and internal merging.
(Figure: data flowing from disk through memory back to disk.)

Improve Run Merging
• Reduce the number of merge passes.
  § Use a higher-order merge.
  § Number of passes = $\lceil \log_k(\text{number of initial runs}) \rceil$, where $k$ is the merge order.

Merge 20 Runs Using 5-Way Merging
(Figure: pass 1 merges R1, ..., R20 five at a time into S1, ..., S4; pass 2 merges S1, ..., S4 into T1.)
• Number of passes = 2.
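A small sketch of the merge schedule, assuming runs are identified by their position within each pass and that a pass merges consecutive groups of up to $k$ runs. `merge_schedule` is a hypothetical helper; the number of passes it produces equals $\lceil \log_k r \rceil$ for $r$ initial runs.

```python
from typing import List


def merge_schedule(num_runs: int, k: int) -> List[List[List[int]]]:
    """Return, for each merge pass, the groups of run positions merged together.

    Each pass merges consecutive groups of up to k runs; a final group with
    only one run is simply carried into the next pass.
    """
    if k < 2:
        raise ValueError("merge order k must be at least 2")
    schedule: List[List[List[int]]] = []
    count = num_runs
    while count > 1:
        groups = [list(range(i, min(i + k, count))) for i in range(0, count, k)]
        schedule.append(groups)
        count = len(groups)      # each group becomes one run in the next pass
    return schedule
```

For 20 runs, `merge_schedule(20, 5)` has 2 passes (four 5-way merges, then one 4-way merge), while `merge_schedule(20, 2)` has 5 passes, as in the 2-way merge tree.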

I/O Time Per Merge Pass
• The number of input buffers needed is linear in the merge order $k$.
• Since memory size is fixed, block size decreases as $k$ increases (beyond a certain $k$).
• So, the number of blocks increases.
• So, the number of seek and latency delays per pass increases.

I/O Time Per Merge Pass
(Figure: I/O time per pass plotted against merge order $k$.)

Total I/O Time To Merge Runs
• $(\text{I/O time for one merge pass}) \times \lceil \log_k(\text{number of initial runs}) \rceil$
(Figure: total I/O time to merge runs plotted against merge order $k$.)

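Internal Merge Time
(Figure: runs R1, ..., R6 feeding output buffer O.)
• Naïve way: $k - 1$ compares to determine the next record to move to the output buffer.
• Time to merge $n$ records is $c(k-1)n$, where $c$ is a constant.
• Merge time per pass is $c(k-1)n$.
• Total merge time is $c(k-1)\,n\log_k r \approx cn\,\frac{k}{\log_2 k}\log_2 r$, where $r$ is the number of initial runs (using $k - 1 \approx k$ and $\log_k r = \log_2 r / \log_2 k$).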
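A sketch of the naïve merge, assuming runs are sorted Python lists already in memory (buffering is ignored). The hypothetical function `naive_kway_merge` scans the front of every non-empty run to pick the next record and counts the key comparisons, so the $k - 1$ compares per record are visible.

```python
from typing import List, Tuple


def naive_kway_merge(runs: List[List[int]]) -> Tuple[List[int], int]:
    """Merge k sorted runs by scanning all run fronts for the smallest key.

    Returns the merged output and the number of key comparisons made.
    While all k runs are non-empty, each output record costs k - 1 compares.
    """
    pos = [0] * len(runs)
    out: List[int] = []
    compares = 0
    total = sum(len(r) for r in runs)
    while len(out) < total:
        best = -1
        for i, r in enumerate(runs):
            if pos[i] == len(r):
                continue                      # run i is exhausted
            if best == -1:
                best = i                      # first candidate: no compare needed
            else:
                compares += 1
                if r[pos[i]] < runs[best][pos[best]]:
                    best = i
        out.append(runs[best][pos[best]])
        pos[best] += 1
    return out, compares
```

For the three runs `[[1, 7], [2, 8], [3, 9]]` it makes 9 compares: 2 for each of the first four records while all three runs are non-empty, then 1 and 0 as runs are exhausted.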

Merge Time Using A Tournament Tree
(Figure: a tournament tree over runs R1, ..., R6 feeding output buffer O.)
• Time to merge $n$ records is $dn\log_2 k$, where $d$ is a constant.
• Merge time per pass is $dn\log_2 k$.
• Total merge time is $(dn\log_2 k)\log_k r = dn\log_2 r$.
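A sketch of a tournament-tree merge, assuming runs are sorted Python lists in memory and that the number of leaves is padded to a power of two with empty runs whose key is $+\infty$. The slides do not say whether a winner tree or a loser tree is meant; this sketch uses a winner tree. The hypothetical function `tournament_merge` stores the index of the winning run at each internal node; after the winner's record is output, only the matches on the path from its leaf to the root are replayed, so each record costs $\log_2$ (padded leaf count) compares instead of $k - 1$.

```python
import math
from typing import List, Tuple

INF = math.inf


def tournament_merge(runs: List[List[int]]) -> Tuple[List[int], int]:
    """k-way merge with a winner tree; returns (merged output, compares)."""
    k = len(runs)
    if k == 0:
        return [], 0
    size = 1
    while size < k:
        size *= 2                          # number of leaves, padded to a power of two
    pos = [0] * k
    compares = 0

    def key(i: int) -> float:
        """Front key of run i, or +inf if run i is exhausted or is padding."""
        if i < k and pos[i] < len(runs[i]):
            return runs[i][pos[i]]
        return INF

    # tree[size + i] is leaf i; tree[j] for 1 <= j < size holds the winning run index.
    tree = [0] * (2 * size)
    for i in range(size):
        tree[size + i] = i
    for j in range(size - 1, 0, -1):       # play the initial tournament bottom-up
        a, b = tree[2 * j], tree[2 * j + 1]
        compares += 1
        tree[j] = a if key(a) <= key(b) else b

    out: List[int] = []
    total = sum(len(r) for r in runs)
    while len(out) < total:
        w = tree[1]                        # overall winner: run with the smallest front key
        out.append(runs[w][pos[w]])
        pos[w] += 1
        j = (size + w) // 2                # replay matches from the winner's leaf to the root
        while j >= 1:
            a, b = tree[2 * j], tree[2 * j + 1]
            compares += 1
            tree[j] = a if key(a) <= key(b) else b
            j //= 2
    return out, compares
```

For the four runs `[[1, 5], [2, 6], [3, 7], [4, 8]]` it makes 19 compares: 3 to build the tree and 2 per output record.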