Kenneth E. Batcher, Professor, Kent State University
http://www.cs.kent.edu/~batcher
"Sorting networks and their applications", AFIPS Proc. of 1968 Spring Joint Computer Conference, Vol. 32, pp. 307-314.
2018 陈贵海
Background
Sorting is fundamental
The lower bound for any comparison-based sequential sorting algorithm is Ω(n log n)
Can we improve the time complexity further?
– Parallel algorithms
– Circuit/network design
– Parallel computing models
①Bitonic Sequence
A sequence of elements {a_0, a_1, …, a_{n-1}} where either
– (1) there exists an index i, 0 ≤ i ≤ n-1, such that {a_0, …, a_i} is monotonically increasing and {a_{i+1}, …, a_{n-1}} is monotonically decreasing
– e.g. {1, 2, 4, 7, 6, 0}
or
– (2) there exists a cyclic shift of indices so that (1) is satisfied
– e.g. {8, 9, 2, 1, 0, 4} → {0, 4, 8, 9, 2, 1}
①Bitonic Sequence: Examples
– {3, 5, 7, 9, 8, 6, 4, 2}: increasing, then decreasing
– {8, 6, 4, 2, 3, 5, 7, 9}: decreasing, then increasing (bitonic after a cyclic shift)
[Figures: value of element a_i plotted against index i = 0, …, 7]
①Bitonic Sequence: Examples
– {3, 5, 7, 9, 11, 13, 15, 17}: monotonically increasing (the decreasing part is empty)
– {5, 3, 1, 2, 4, 6, 8, 7}: bitonic after the cyclic shift {1, 2, 4, 6, 8, 7, 5, 3}
[Figures: value of element a_i plotted against index i = 0, …, 7]
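The definition and the example sequences can be checked mechanically. The following is a minimal sketch, assuming "monotonically" allows equal neighbours; the function names `is_up_then_down` and `is_bitonic` are illustrative and do not come from the slides. It simply tries every cyclic shift, which costs O(n²) but mirrors the definition directly.

```python
def is_up_then_down(seq):
    """True if seq rises (non-strictly) and then falls (non-strictly)."""
    n = len(seq)
    i = 0
    while i + 1 < n and seq[i] <= seq[i + 1]:   # climb the increasing part
        i += 1
    while i + 1 < n and seq[i] >= seq[i + 1]:   # descend the decreasing part
        i += 1
    return i >= n - 1                           # reached the end: condition (1) holds


def is_bitonic(seq):
    """True if some cyclic shift of seq satisfies condition (1)."""
    seq = list(seq)
    if not seq:
        return True
    return any(is_up_then_down(seq[k:] + seq[:k]) for k in range(len(seq)))


print(is_bitonic([8, 9, 2, 1, 0, 4]))   # True, via the shift {0, 4, 8, 9, 2, 1}
print(is_bitonic([1, 3, 2, 4]))         # False
```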
Bitonic Sort: basic idea
Consider a bitonic sequence S of size n where
– the first half, {a_0, a_1, …, a_{n/2-1}}, is increasing, and
– the second half, {a_{n/2}, a_{n/2+1}, …, a_{n-1}}, is decreasing
[Figure: value of element a_i against index i, rising over a_0 … a_{n/2-1} and falling over a_{n/2} … a_{n-1}]
②"Bitonic Split"
Pair-wise min-max comparison (compare and exchange)
– S1 = {min(a_0, a_{n/2}), min(a_1, a_{n/2+1}), …, min(a_{n/2-1}, a_{n-1})}
– S2 = {max(a_0, a_{n/2}), max(a_1, a_{n/2+1}), …, max(a_{n/2-1}, a_{n-1})}
[Figure: the two halves of S overlaid on one value axis; the pair-wise minima form S1 and the pair-wise maxima form S2]
There exists
– an element b in S1 such that all elements before b are increasing and all elements after b are decreasing
– an element c in S2 such that all elements before c are decreasing and all elements after c are increasing
S1 and S2
– Both S1 and S2 are bitonic sequences
– Every element of S1 ≤ every element of S2 (because b ≤ c, b is the maximum value in S1, and c is the minimum value in S2)
[Figure: S2 (upper curve, minimum c) above S1 (lower curve, maximum b)]
Pair-wise min-max comparison
e.g. {2, 4, 6, 8, 7, 5, 3, 1}
– Compare and exchange {2, 4, 6, 8} with {7, 5, 3, 1}
– ⇒ S1 = {2, 4, 3, 1}, S2 = {7, 5, 6, 8}
– A bitonic sequence of size 8 ⇒ 2 bitonic sequences of size 4
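The split can be written in a few lines. This is a sketch rather than the slides' own code; `bitonic_split` is a hypothetical name, and the input is assumed to be a bitonic sequence of even length.

```python
def bitonic_split(seq):
    """Split a bitonic sequence into (S1, S2) by pair-wise min/max."""
    n = len(seq)
    if n % 2:
        raise ValueError("sequence length must be even")
    half = n // 2
    # Element i of the first half is compared with element i of the second half.
    s1 = [min(seq[i], seq[i + half]) for i in range(half)]
    s2 = [max(seq[i], seq[i + half]) for i in range(half)]
    return s1, s2


s1, s2 = bitonic_split([2, 4, 6, 8, 7, 5, 3, 1])
print(s1, s2)   # [2, 4, 3, 1] [7, 5, 6, 8]
```

In this example max(S1) = 4 and min(S2) = 5, so every element of S1 is no larger than every element of S2.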
②Bitonic Split
The split is applicable to any bitonic sequence. The first half need not be increasing (or decreasing), and the second half need not be decreasing (or increasing):
Bitonic(n) → Bitonic Split → 2 Bitonic(n/2)
Sorting a bitonic sequence
By using the bitonic split recursively,
INPUT: a bitonic sequence of size n
⇒ Phase 1: 2 bitonic sequences of size n/2
⇒ Phase 2: 4 bitonic sequences of size n/4
⇒ …
⇒ Phase (log n): n bitonic sequences of size 1
⇒ a sorted sequence is obtained by concatenating the n bitonic sequences of size 1
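The phases above amount to a short recursion. The sketch below is sequential and only illustrates the data flow; the name `bitonic_merge` is an assumption, and the input is assumed to be a bitonic sequence whose length is a power of two.

```python
def bitonic_merge(seq):
    """Sort a bitonic sequence into increasing order by recursive splitting."""
    n = len(seq)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(seq)
    half = n // 2
    # One bitonic split: pair-wise minima and maxima of the two halves.
    low = [min(seq[i], seq[i + half]) for i in range(half)]
    high = [max(seq[i], seq[i + half]) for i in range(half)]
    # Both parts are bitonic and every element of low <= every element of high.
    return bitonic_merge(low) + bitonic_merge(high)


print(bitonic_merge([2, 4, 6, 8, 7, 5, 3, 1]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

On a parallel machine the two recursive calls, and all comparisons within one split, are independent, which is why each phase costs a single compare-exchange step.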
③Bitonic Merge
Sort a bitonic sequence using bitonic splits
[Figure: 16 elements (positions 1 to 16) split into bitonic sequences of length 16, 8, 4, 2]
Anything wrong with this slide?
Bitonic Merge Circuit: BM[16]
What do you think of it?
Questions?
How can we convert an unsorted sequence into a bitonic sequence? (Then, by using the bitonic split recursively, a sorted sequence can be formed.)
Turn an unsorted sequence into a bitonic sequence: ③Bitonic Merge (BM) Operation
[Figure: 16 elements (positions 1 to 16) merged into sequences of length 4, 8, 16]
At every phase, sort bitonic sequences of size 2, 4, 8, 16 into monotonically increasing or decreasing sequences
Turn an unsorted sequence into a bitonic sequence
④Bitonic Sort
[Figure: 16 elements (positions 1 to 16), merge lengths 4, 8, 16]
Sort a sequence in arbitrary order
Using bitonic merge repeatedly
Definition:
– ⊕BM[n]: increasing bitonic merge of size n
• a bitonic merge that sorts a bitonic sequence of size n into a monotonically increasing sequence
– ⊖BM[n]: decreasing bitonic merge of size n
• a bitonic merge that sorts a bitonic sequence of size n into a monotonically decreasing sequence
Steps:
Divide the sequence into groups of 2
– any sequence of size 2 is a bitonic sequence: either the increasing part has size 2 and the decreasing part has size 0, or vice versa
Apply an increasing merge (⊕BM[2]) to one group to form an increasing sequence, and a decreasing merge (⊖BM[2]) to the adjacent group to form a decreasing sequence
Concatenate the two groups to form a bitonic sequence of size 4
Steps:
Repeat the above steps on the other groups
Repeat the above steps recursively until a bitonic sequence of size n is formed
Use bitonic merge once more to turn the bitonic sequence into a sorted sequence
Bitonic Sorting Circuit: BS(18)
– ⊕BM[n]: increasing bitonic merge of size n
– ⊖BM[n]: decreasing bitonic merge of size n
Sort a sequence in arbitrary order
Hence,
n unsorted numbers
⇒ n/2 groups of 2-number bitonic sequences
⇒ n/4 groups of 4-number bitonic sequences
⇒ …
⇒ 1 group of an n-number bitonic sequence
⇒ a sorted sequence
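The chain above can be sketched as a recursion: sort the first half increasingly and the second half decreasingly, which yields a bitonic sequence, then apply one bitonic merge. Unrolling the recursion gives the same bottom-up schedule of groups of 2, 4, …, n. The names `bitonic_merge` and `bitonic_sort` and the `ascending` flag are illustrative assumptions, and the length is assumed to be a power of two.

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence in the requested direction."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    low = [min(seq[i], seq[i + half]) for i in range(half)]
    high = [max(seq[i], seq[i + half]) for i in range(half)]
    if ascending:    # increasing merge: small part first
        return bitonic_merge(low, True) + bitonic_merge(high, True)
    # decreasing merge: large part first
    return bitonic_merge(high, False) + bitonic_merge(low, False)


def bitonic_sort(seq, ascending=True):
    """Sort an arbitrary sequence whose length is a power of two."""
    n = len(seq)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(seq)
    half = n // 2
    first = bitonic_sort(seq[:half], True)     # increasing run
    second = bitonic_sort(seq[half:], False)   # decreasing run
    return bitonic_merge(first + second, ascending)   # first + second is bitonic


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```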
⑤Complexity of Bitonic Sort
Parallel bitonic sort with n processors
– The last stage of an n-element bitonic sort merges n elements and has a depth of log n
– The earlier stages perform a complete sort of each half (n/2 elements)
– Depth: d(n) = d(n/2) + log n
– d(n) = 1 + 2 + 3 + … + log n = (log n)(log n + 1)/2 = Θ(log² n)
– Complexity: T(n) = Θ(log² n)
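The recurrence can be evaluated directly and compared with the closed form (log n)(log n + 1)/2. This is a small check, assuming n is a power of two and d(1) = 0; the function name `depth` is illustrative.

```python
def depth(n):
    """Depth d(n) = d(n/2) + log2(n) of a bitonic sort on n = 2^k inputs."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if n == 1:
        return 0
    log_n = n.bit_length() - 1      # exact log2 for a power of two
    return depth(n // 2) + log_n


for n in (2, 4, 8, 16, 1024):
    k = n.bit_length() - 1
    assert depth(n) == k * (k + 1) // 2   # closed form of 1 + 2 + ... + k

print(depth(16))   # 10, i.e. 1 + 2 + 3 + 4
```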
⑤Complexity of Bitonic Sort
Parallel sorting with a block of elements per processor
– sort the local block of elements first (using any sorting algorithm, such as quicksort or bitonic sort)
– sort the elements among processors using parallel bitonic sort
– T(n) = T(local_sort) + T(comparisons) + T(communication)
Only computation time is considered here (you also need to account for all communication time)
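One common way to realise the second step is to replace each compare-exchange of the bitonic network by a merge-split between two processors: both blocks are merged, one partner keeps the lower half and the other keeps the upper half. The sketch below simulates this sequentially. It is an illustration under stated assumptions, not the slides' method: the number of blocks is a power of two, all blocks have the same size, and the names `merge_split` and `block_bitonic_sort` are made up for this example.

```python
from heapq import merge


def merge_split(a, b):
    """Merge two sorted blocks and return (lower half, upper half)."""
    merged = list(merge(a, b))
    return merged[:len(a)], merged[len(a):]


def block_bitonic_sort(blocks):
    """Sort equal-sized blocks, one block per simulated processor."""
    p = len(blocks)
    if p == 0 or p & (p - 1):
        raise ValueError("number of blocks must be a power of two")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same size")
    blocks = [sorted(b) for b in blocks]           # T(local_sort)
    k = 2
    while k <= p:                                  # merge size 2, 4, ..., p
        j = k // 2
        while j >= 1:                              # partner distance k/2, ..., 1
            for i in range(p):
                partner = i ^ j
                if partner > i:                    # handle each pair once
                    low, high = merge_split(blocks[i], blocks[partner])
                    if i & k == 0:                 # increasing group
                        blocks[i], blocks[partner] = low, high
                    else:                          # decreasing group
                        blocks[i], blocks[partner] = high, low
            j //= 2
        k *= 2
    return blocks


print(block_bitonic_sort([[9, 3], [7, 1], [8, 2], [6, 0]]))
# [[0, 1], [2, 3], [6, 7], [8, 9]]
```

Each `merge_split` call stands for one exchange of blocks between two processors, so the number of such rounds is what T(communication) would have to account for.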
⑥Concluding Remarks
Bitonic Sorting: Common Sense Regression to Computer Science
One of the 10 Most Important Papers
Parallel Algorithm: Ascend/Descend
– Neighbors communicate along dimension i, where i runs from 1 to N or from N to 1
– Another example: prefix sum
Network Model:
Bitonic Sorting Network
Hypercube connections!
Try to write the bitonic sorting algorithm on a hypercube.
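One possible answer to this exercise is sketched below as a sequential simulation, assuming n = 2^k nodes with one element per node. Every compare-exchange pairs a node with the node whose index differs in exactly one bit, that is, a hypercube neighbour. The function name `hypercube_bitonic_sort` and the variable names are illustrative.

```python
def hypercube_bitonic_sort(values):
    """Bitonic sort where node i only talks to neighbours i XOR 2^d."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    dims = n.bit_length() - 1
    for stage in range(1, dims + 1):            # bitonic merges of size 2^stage
        for d in range(stage - 1, -1, -1):      # descend through dimensions
            for node in range(n):               # conceptually in parallel
                partner = node ^ (1 << d)       # neighbour across dimension d
                if node < partner:
                    # Bit `stage` of the index selects the merge direction.
                    ascending = (node >> stage) & 1 == 0
                    if (a[node] > a[partner]) == ascending:
                        a[node], a[partner] = a[partner], a[node]
    return a


print(hypercube_bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The inner loop over `d` visits dimensions from high to low, which is the Descend pattern mentioned in the concluding remarks.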
Bitonic Sort on Butterfly
[Figure sequence]
PRAM Model
[Figure: processors P1, P2, P3, …, Pn connected to a shared Memory]
Access time from any processor to any memory unit is equal
This is impossible in practice
So it is an idealized model for parallel computing
It lets us focus only on algorithm design
PRAM Model
Equal access time is impossible even in a sequential computer
PRAM Model
Program for Sum = a(1) + a(2) + … + a(N)
for i = 1 to log N
    for j = 1 to N/2^i parallel do
        a(j) = a(j) + a(N/2^i + j)
    endpar
endfor
Finally a(1) is the sum
PRAM Model
Program for Sum = a(1) + a(2) + … + a(N)
Processor P(i) holds a(i)
Finally a(1) is the sum
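The summation program can be simulated round by round. The sketch below uses 0-based indices instead of the slides' 1-based a(1), …, a(N), assumes N is a power of two, and uses the illustrative name `pram_sum`. Each pass of the `while` loop stands for one parallel step in which all reads happen before any write.

```python
def pram_sum(values):
    """Simulate the log N parallel rounds of the PRAM summation."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("N must be a power of two")
    half = n // 2                    # N / 2^i for round i = 1
    while half >= 1:
        # One synchronous step: processor j adds a[half + j] to a[j].
        a[:half] = [a[j] + a[half + j] for j in range(half)]
        half //= 2
    return a[0]                      # corresponds to a(1) on the slide


print(pram_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # 36
```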
PRAM Model (Prefix Sum)
for i = 1 to log N
    for j = 1 to N parallel do
        s(j) = s(j) + s(N/2^i + j)
        s(2^i + j) = s(j) + s(2^i + j)
        s(j) = s(j) - a(N/2^i + j)
    endpar
endfor
PRAM Model (Prefix Sum)
[Figure: prefix-sum computation on a(1), …, a(8). Intermediate entries include a(1)+a(5), a(2)+a(6), a(3)+a(7), a(4)+a(8), then a(1)+a(3)+a(5)+a(7) and a(2)+a(4)+a(6)+a(8); final entries include a(1)+a(2)+…+a(6), a(1)+a(2)+…+a(7), and a(1)+a(2)+…+a(8)]
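For comparison, here is a sketch of a standard doubling scheme for prefix sums, often attributed to Hillis and Steele. It is a swapped-in technique, not a transcription of the slide's loop or figure: in round i every position j ≥ 2^(i-1) adds the value held 2^(i-1) positions to its left, so log N rounds suffice. Indices are 0-based and the name `pram_prefix_sum` is illustrative.

```python
def pram_prefix_sum(values):
    """Inclusive prefix sums in ceil(log2 N) simulated parallel rounds."""
    s = list(values)
    n = len(s)
    step = 1
    while step < n:
        # One synchronous step: build the new array from the old one only.
        s = [s[j] + s[j - step] if j >= step else s[j] for j in range(n)]
        step *= 2
    return s


print(pram_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

Building a fresh list in each round models the PRAM convention that all processors read before any processor writes.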
Hypercube Model
Suppose node N(x) holds element a(x), where x is the value of the node index x_1 x_2 … x_n
for i = 1 to n
    for j = i to n parallel do
        N(00…0 (x_j=0) x_{j+1}…x_n) ← N(00…0 (x_j=1) x_{j+1}…x_n);
        a(00…0 (x_j=0) x_{j+1}…x_n) = a(00…0 (x_j=0) x_{j+1}…x_n) + a(00…0 (x_j=1) x_{j+1}…x_n)
    endpar
endfor
Finally node 00…0 holds the sum
Hypercube Model
Suppose node 000 holds element a(0) and node 111 holds element a(7)
[Figure: summation on a 3-dimensional hypercube. First the partial sums a(0)+a(4), a(1)+a(5), a(2)+a(6), a(3)+a(7) are formed, then a(0)+a(4)+a(2)+a(6), and finally a(0)+a(4)+a(2)+a(6)+a(1)+a(5)+a(3)+a(7)]
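The hypercube summation can be simulated as follows. This is a sketch assuming 2^k nodes, one element per node, with dimensions processed from the most significant index bit downwards so that the partial sums match the 8-node example (a(0)+a(4) first). The name `hypercube_sum` is illustrative.

```python
def hypercube_sum(values):
    """Sum one value per node by folding the hypercube one dimension at a time."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    dims = n.bit_length() - 1
    for d in range(dims - 1, -1, -1):   # most significant dimension first
        bit = 1 << d
        # Active receivers have all index bits at or above d equal to 0.
        for node in range(bit):         # conceptually in parallel
            a[node] += a[node | bit]    # receive from the neighbour across d
    return a[0]                         # node 00...0 holds the sum


print(hypercube_sum([0, 1, 2, 3, 4, 5, 6, 7]))   # 28
```

Within one step the receiving nodes (index below `bit`) and the sending nodes (index at or above `bit`) are disjoint, so the sequential loop gives the same result as a truly parallel step.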
Hypercube Model (Prefix Sum)?