Stupid Columnsort Tricks
Geeta Chaudhry, Tom Cormen
Dartmouth College, Department of Computer Science


What Do We Know About Columnsort?
• Sorts N values on an $r \times s$ mesh
• Uses 8 steps
  – Each step either sorts each column or performs a fixed permutation
• Divisibility restriction: s divides r
• Height restriction: $r \ge 2s^2$, relaxed in this talk to $r \ge 4s^{3/2}$
  – Exponent of s goes from 2 to 3/2
  – Mesh need not be quite so tall and skinny
  – Cost: 2 additional steps
• Can simultaneously remove the divisibility restriction and relax the height restriction to $r \ge 6s^{3/2}$

Why Relax the Conditions?
• Columnsort applies in more circumstances
• Our motivation: out-of-core sorting
• Column height r is limited by the amount of memory
  – Either on a single processor or in the entire system
  – $N = rs$, $r \ge 2s^2 \implies N \le r^{3/2}/2^{1/2}$
  – $N = rs$, $r \ge 4s^{3/2} \implies N \le r^{5/3}/4^{2/3}$
• Reducing the exponent of s in the bound for r allows us to sort more values with a given amount of memory
• A similar technique works for applying columnsort to in-core sorting
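To make the effect of the exponent concrete, the sketch below finds the largest s permitted by each height restriction for a given column height r and reports N = rs. The function names are illustrative and not from the talk. The sketch looks only at the height restrictions: it ignores the divisibility restriction and any requirement that $\sqrt{s}$ be an integer.

```python
def max_columns_basic(r):
    """Largest s with r >= 2*s^2 (height restriction of basic columnsort)."""
    s = 0
    while 2 * (s + 1) ** 2 <= r:
        s += 1
    return s


def max_columns_relaxed(r):
    """Largest s with r >= 4*s^(3/2), tested exactly in integers as r^2 >= 16*s^3."""
    s = 0
    while 16 * (s + 1) ** 3 <= r * r:
        s += 1
    return s


if __name__ == "__main__":
    r = 2 ** 20  # hypothetical column height: about a million values fit in memory
    for label, bound in (("r >= 2 s^2", max_columns_basic),
                         ("r >= 4 s^(3/2)", max_columns_relaxed)):
        s = bound(r)
        print(f"{label}: s = {s}, N = r*s = {r * s}")
```

With r = 2^20 the first restriction allows s = 724 columns and the relaxed one allows s = 4096, so the same column height covers roughly 5.7 times as many values.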

This Talk
• Slabpose columnsort
  – $r \ge 4s^{3/2}$
  – Requires divisibility restriction
• Also in the paper
  – Subblock columnsort
    • $r \ge 4s^{3/2}$ with divisibility restriction
    • $r \ge 6s^{3/2}$ without divisibility restriction
  – Proof that the divisibility restriction is unnecessary in the basic columnsort algorithm

Columnsort Steps
1. Sort each column
2. Transpose entire mesh
3. Sort each column
4. Untranspose entire mesh
5. Sort each column
6. Shift down by half a column
7. Sort each column
8. Shift up by half a column
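The eight steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the authors' out-of-core implementation. It assumes the mesh is stored as a list of s columns of height r, that "transpose" means reading the values in column-major order and writing them back in row-major order, and that the shift steps pad with $-\infty$ and $+\infty$ to form a temporary extra column. All function names are illustrative.

```python
import math


def sort_columns(cols):
    return [sorted(c) for c in cols]


def transpose(cols):
    """Read the mesh in column-major order, write it back in row-major order."""
    s = len(cols)
    flat = [x for c in cols for x in c]
    return [flat[j::s] for j in range(s)]


def untranspose(cols):
    """Inverse of transpose: read row-major, write column-major."""
    r, s = len(cols[0]), len(cols)
    flat = [cols[j][i] for i in range(r) for j in range(s)]
    return [flat[j * r:(j + 1) * r] for j in range(s)]


def shift_sort_unshift(cols):
    """Steps 6-8: shift down by half a column, sort each column, shift back up."""
    r, s = len(cols[0]), len(cols)
    h = r // 2
    flat = [x for c in cols for x in c]
    # Padding creates s + 1 columns; the sentinels stay at the two ends after sorting.
    padded = [-math.inf] * h + flat + [math.inf] * (r - h)
    shifted = [sorted(padded[k * r:(k + 1) * r]) for k in range(s + 1)]
    flat = [x for c in shifted for x in c][h:h + r * s]
    return [flat[j * r:(j + 1) * r] for j in range(s)]


def columnsort(values, r, s):
    """Sort r*s values; the result is sorted in column-major order."""
    assert len(values) == r * s
    assert r % s == 0, "divisibility restriction: s divides r"
    assert r >= 2 * s * s, "height restriction: r >= 2 s^2"
    cols = [list(values[j * r:(j + 1) * r]) for j in range(s)]
    cols = sort_columns(cols)         # step 1
    cols = transpose(cols)            # step 2
    cols = sort_columns(cols)         # step 3
    cols = untranspose(cols)          # step 4
    cols = sort_columns(cols)         # step 5
    cols = shift_sort_unshift(cols)   # steps 6-8
    return [x for c in cols for x in c]
```

Every step is either a column sort or a fixed permutation, which is what lets later slides reason about the algorithm with 0-1 inputs only.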

Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns
5. Slabpose
   (Steps 4 and 5: oblivious!)
6. Sort each column
7. Untranspose entire mesh
8. Sort each column
9. Shift down by half a column
10. Sort each column
11. Shift up by half a column

Slabpose Columnsort Steps (shuffle and slabpose combined)
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns + slabpose
   (Oblivious!)
5. Sort each column
6. Untranspose entire mesh
7. Sort each column
8. Shift down by half a column
9. Sort each column
10. Shift up by half a column

Why Work With Vertical Slabs?
• In regular columnsort, the matrix needs to be tall and skinny
• Working with vertical slabs allows us to change the aspect ratio to use tall and skinny slabs
• We’ll use slabs that are $\sqrt{s}$ columns wide
• The mesh will have $\sqrt{s}$ slabs
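A slabpose can be sketched as the ordinary columnsort transpose applied separately inside each slab. The code below is an illustration under assumptions, not the authors' code: the mesh is a list of columns, `w` is the slab width in columns (the talk uses $w = \sqrt{s}$), and "transpose" means reading a slab in column-major order and writing it back in row-major order.

```python
def slabpose(cols, w):
    """Transpose within each vertical slab of w consecutive columns."""
    s = len(cols)
    assert s % w == 0, "slab width must divide the number of columns"
    out = []
    for start in range(0, s, w):
        slab = cols[start:start + w]
        flat = [x for c in slab for x in c]        # column-major read within the slab
        out.extend(flat[j::w] for j in range(w))   # row-major write within the slab
    return out


if __name__ == "__main__":
    # 4 x 4 mesh holding 0..15 in column-major order, split into two 2-column slabs.
    mesh = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    for column in slabpose(mesh, 2):
        print(column)
```

Values never leave their slab, so each slab behaves like its own $r \times w$ mesh, which is far taller relative to its width than the full $r \times s$ mesh.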

0-1 Principle
• If an oblivious algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values
• Use the 0-1 Principle by looking at portions of the $r \times s$ mesh
  – Clean: all 0s or all 1s
  – Dirty: may be mixed 0s and 1s
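As a small illustration of the principle, consider a fixed sequence of compare-exchange operations, a simple kind of oblivious algorithm. The sketch below checks such a sequence against every 0-1 input. The five-comparator network for four values is a standard textbook example chosen here for illustration; it does not come from the talk.

```python
from itertools import product


def apply_network(network, values):
    """Run a fixed sequence of compare-exchange operations on a copy of values."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def sorts_all_zero_one_inputs(network, n):
    """True if the network sorts every one of the 2^n inputs made of 0s and 1s."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


# Five-comparator sorting network for four inputs.
NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

if __name__ == "__main__":
    print(sorts_all_zero_one_inputs(NETWORK, 4))        # full network
    print(sorts_all_zero_one_inputs(NETWORK[:-1], 4))   # last comparator removed
    print(apply_network(NETWORK, [42, 7, 19, 3]))
```

Checking 16 inputs is enough to conclude that the network sorts arbitrary values. The same reasoning lets the following slides track only clean and dirty regions of the mesh.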

Step 1: Sort Each Column
[Figure: $r \times s$ mesh with a clean region of 0s at the top, a dirty region in the middle, and a clean region of 1s at the bottom]

Step 2: Slabpose
[Figure: mesh divided into $\sqrt{s}$ slabs, each a $\sqrt{s}$-slab; one column is highlighted; each slab has ≤ $\sqrt{s}$ dirty rows]

Step 3: Sort Each Column
[Figure: dirty area of ≤ $\sqrt{s}$ rows]

Step 4: Shuffle
[Figure: $\sqrt{s}$ slabs, each a $\sqrt{s}$-slab; dirty area of ≤ $\sqrt{s}$ rows]

Step 5: Slabpose
[Figure: $\sqrt{s}$ slabs, each a $\sqrt{s}$-slab; blocks of $r/\sqrt{s}$ rows; $\sqrt{s}$ sets of dirty rows, each of ≤ 2 rows]

Step 6: Sort Each Column
[Figure: dirty area of ≤ $2\sqrt{s}$ rows, that is, ≤ $2s^{3/2}$ elements]

Step 7: Untranspose Entire Mesh
• Dirty area: ≤ $2s^{3/2}$ elements
• Once the size of the dirty area is at most half a column, the last four steps will finish up
• $r \ge 4s^{3/2} \implies 2s^{3/2} \le r/2 \implies$ dirty area ≤ half a column

Step 8: Sort Each Column
• Dirty area resides in one column ==> done

Step 8: Sort Each Column
• Dirty area resides in two columns ==> no change

Step 9: Shift Down by Half a Column
• Dirty area resides in one column

Step 10: Sort Each Column
• Dirty area resides in one column

Step 11: Shift Up by Half a Column
• Sorted

Subblock Columnsort
• Adds two steps to columnsort
  – Sort each column
  – A fixed permutation
• The permutation is any one that distributes all elements of each $\sqrt{s} \times \sqrt{s}$ subblock to all s columns
• Like slabpose columnsort, the size of the dirty area is ≤ $2s^{3/2}$ entering the last four steps
• As long as $2s^{3/2} \le r/2$ (half a column), the last four steps complete the sorting
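The requirement on the permutation can be checked mechanically. The sketch below builds one hypothetical permutation that has the property and a checker for it. This mapping is an assumption made for illustration and is not necessarily the permutation used in the paper. It assumes $s = w^2$ with $w = \sqrt{s}$ an integer and that w divides r.

```python
def subblock_targets(r, s, w):
    """Hypothetical permutation: maps each (row, col) to a target (row, col).

    The element at offset (i, j) inside subblock (a, b) goes to
    row a*w + b and column i*w + j.
    """
    assert w * w == s and r % w == 0
    targets = {}
    for row in range(r):
        for col in range(s):
            a, i = divmod(row, w)   # block row a, offset i inside the block
            b, j = divmod(col, w)   # block column b, offset j inside the block
            targets[(row, col)] = (a * w + b, i * w + j)
    return targets


def has_subblock_property(targets, r, s, w):
    """True if every w x w subblock sends its s elements to s distinct columns."""
    for a in range(r // w):
        for b in range(s // w):
            cols = {targets[(a * w + i, b * w + j)][1]
                    for i in range(w) for j in range(w)}
            if len(cols) != s:
                return False
    return True


if __name__ == "__main__":
    r, s, w = 8, 4, 2
    print(has_subblock_property(subblock_targets(r, s, w), r, s, w))
```

Each subblock holds exactly s elements, so the property says that every column receives exactly one element from every subblock.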

Removing the Divisibility Restriction from Columnsort
• With the divisibility restriction, the dirty rows after the transpose step have only 0→1 transitions
• Without the divisibility restriction, there may also be 1→0 transitions
• The proof shows that even with the 1→0 transitions, the size of the dirty area entering the last four steps does not increase
• Thus $r \ge 2s^2$ suffices, even without the divisibility restriction

Conclusion
• We can get around the restrictions of columnsort
• Reduce the exponent in the height restriction from 2 to 3/2
  – The mesh need not be quite so tall and skinny
  – Cost: Two extra steps
  – In out-of-core implementation, slabpose columnsort requires no additional I/O
• The divisibility restriction is unnecessary
• Open question: Can we reduce the exponent further within the columnsort framework?