Lecture 8: Applications of Parallel Prefix
Parallel Computing, Fall 2012

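Evaluation of a polynomial
- Input: (1) a real number $x_0$; (2) $n$ integer coefficients $\{a_0, a_1, \dots, a_{n-1}\}$.
- Output: $P(x_0) = a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_{n-1} x_0^{n-1}$.
- Sequential run time: $O(n)$, e.g. by Horner's rule $P(x_0) = a_0 + x_0(a_1 + x_0(a_2 + \dots + x_0 a_{n-1}))$.
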
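As a reference point for the $O(n)$ sequential bound, here is a minimal Python sketch of Horner's rule. The function name `horner` is ours, and the coefficients are assumed to be listed from $a_0$ up to $a_{n-1}$.

```python
def horner(coeffs, x0):
    """Evaluate a_0 + a_1*x0 + ... + a_{n-1}*x0**(n-1); coeffs = [a_0, ..., a_{n-1}]."""
    result = 0
    # One multiply-add per coefficient, from a_{n-1} down to a_0:
    # result = (...((a_{n-1})*x0 + a_{n-2})*x0 + ...)*x0 + a_0
    for a in reversed(coeffs):
        result = result * x0 + a
    return result

print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*2**2 = 17
```

Each step depends on the previous one, which is why this loop does not parallelize directly.
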
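Evaluation of a polynomial: solution using parallel prefix (p = n)
- Algorithm:
  1) Each processor $P_i$ gets one coefficient $a_i$.
  2) Run an n-element parallel prefix with operator × (multiply) on the inputs $1, x_0, x_0, \dots, x_0$ ($P_0$ holds 1, every other processor holds $x_0$), so that processor $P_i$ gets $x_0^i$.
  3) Each processor $P_i$ multiplies $a_i$ by $x_0^i$.
  4) Call parallel sum to compute the sum of the n terms.
- Running time: $O(\log n)$ (steps 2 and 4 take $O(\log n)$ each; steps 1 and 3 take constant time).
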
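A minimal sketch of the p = n algorithm, simulated sequentially in Python. Each list comprehension stands for one synchronous parallel round. The scan uses the doubling (Hillis-Steele) schedule, which is one way to get $O(\log n)$ rounds and not necessarily the tree schedule intended in the lecture; all function names are hypothetical.

```python
import operator

def parallel_prefix(values, op):
    """Inclusive prefix (scan) in ceil(log2 n) synchronous doubling rounds."""
    result = list(values)
    step = 1
    while step < len(result):
        # Every position reads the previous round's values, as in one parallel step.
        result = [op(result[i - step], result[i]) if i >= step else result[i]
                  for i in range(len(result))]
        step *= 2
    return result

def parallel_sum(values):
    """Tree reduction: ceil(log2 n) rounds of pairwise additions."""
    vals = list(values)
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0

def eval_poly_p_equals_n(coeffs, x0):
    n = len(coeffs)
    # Step 2: P_0 holds 1, every other P_i holds x0; the prefix product gives x0**i at P_i.
    powers = parallel_prefix([1] + [x0] * (n - 1), operator.mul)
    # Step 3: each P_i forms a_i * x0**i.
    terms = [a * q for a, q in zip(coeffs, powers)]
    # Step 4: parallel sum of the n terms.
    return parallel_sum(terms)

print(eval_poly_p_equals_n([1, 2, 3, 4], 2))  # 1 + 4 + 12 + 32 = 49
```

The doubling scan does $O(n \log n)$ total work; it is used here only because it makes the $O(\log n)$ round count easy to see.
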
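Evaluation of a polynomial: solution using parallel prefix (p < n)
- Algorithm:
  1) $P_0$ reads $x_0$ and broadcasts it to all processors.
  2) Each processor $P_i$ computes $x_0, x_0^2, \dots, x_0^{n/p}$ locally. --- O(n/p)
  3) Call parallel prefix (operator ×) on the last element $x_0^{n/p}$ of each processor, so that processor $P_i$ obtains $x_0^{i n/p}$, the product of the last elements of $P_0, \dots, P_{i-1}$ ($P_0$ uses 1). --- O(log p)
  4) Each processor $P_i$ updates its group by multiplying each of the local powers $1, x_0, \dots, x_0^{n/p-1}$ by $x_0^{i n/p}$, giving $x_0^{i n/p}, \dots, x_0^{(i+1)n/p-1}$. --- O(n/p)
  5) Each processor reads its n/p coefficients $a_{i n/p}, \dots, a_{(i+1)n/p-1}$. --- O(n/p)
  6) Hence processor $P_i$ has to compute the local sum $S_i = \sum_{j=0}^{n/p-1} a_{i n/p + j}\, x_0^{i n/p + j}$. --- O(n/p) A parallel sum of $S_0, \dots, S_{p-1}$ then gives $P(x_0)$. --- O(log p)
- Total running time: O(n/p + log p).
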
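The p < n version can be sketched the same way. This sketch assumes p divides n, uses `itertools.accumulate` as a sequential stand-in for both the local prefix and the p-element parallel prefix, and runs the per-processor work in an ordinary loop; `eval_poly_blocked` is a hypothetical name.

```python
import operator
from itertools import accumulate

def eval_poly_blocked(coeffs, x0, p):
    """Simulate the p < n algorithm on processors P_0..P_{p-1}.
    Assumes p divides n = len(coeffs); coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    if n % p != 0:
        raise ValueError("this sketch assumes p divides n")
    k = n // p  # group size n/p
    # Step 2: each processor computes x0, x0**2, ..., x0**k locally (O(n/p)).
    local = list(accumulate([x0] * k, operator.mul))
    # Step 3: prefix product of the last elements x0**k held by P_0..P_{p-1};
    # shifting it by one position gives P_i the value x0**(i*k) (O(log p) in parallel).
    inclusive = list(accumulate([local[-1]] * p, operator.mul))
    offset = [1] + inclusive[:-1]
    partial_sums = []
    for i in range(p):  # iteration i is the work of processor P_i
        # Step 4: scale the local powers 1, x0, ..., x0**(k-1) by x0**(i*k) (O(n/p)).
        powers = [offset[i] * q for q in [1] + local[:-1]]
        # Steps 5-6: read this group's k coefficients and form the local sum (O(n/p)).
        group = coeffs[i * k:(i + 1) * k]
        partial_sums.append(sum(a * q for a, q in zip(group, powers)))
    # Final parallel sum of the p partial sums (O(log p) on a tree).
    return sum(partial_sums)

print(eval_poly_blocked([1, 2, 3, 4], 2, 2))  # 1 + 2*2 + 3*4 + 4*8 = 49
```

Setting p = n reduces the local work to a single element per processor, and p = 1 reduces it to a plain sequential evaluation.
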
Linear Recurrences
- Input: (1) real numbers $x_0, x_1$; (2) integer coefficients $a, b$.
- Output: the sequence $\{x_2, x_3, \dots, x_n\}$ such that $x_i = a x_{i-1} + b x_{i-2}$.

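Linear Recurrences: solution using parallel prefix
- The relation can be rewritten as
  $\begin{pmatrix} x_i \\ x_{i-1} \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_{i-1} \\ x_{i-2} \end{pmatrix}$.
- Hence, with $M = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix}$,
  $\begin{pmatrix} x_i \\ x_{i-1} \end{pmatrix} = M^{i-1} \begin{pmatrix} x_1 \\ x_0 \end{pmatrix}$,
  and the powers $M^1, \dots, M^{n-1}$ are the prefixes of $(M, M, \dots, M)$ under matrix multiplication, which is associative, so parallel prefix computes all of them.
- Can be extended to a dependency on the previous k terms (using a $k \times k$ matrix).
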
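To check the matrix formulation numerically, the sketch below computes the prefixes of $(M, \dots, M)$ with a doubling scan and reads $x_i$ off the first row of $M^{i-1}$. Matrices are plain tuples and the helper names are ours; Fibonacci numbers ($a = b = 1$, $x_0 = 0$, $x_1 = 1$) serve as a sanity check.

```python
def mat_mul(A, B):
    """Product of 2x2 matrices given as ((a, b), (c, d)) tuples."""
    return ((A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
            (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]))

def prefix_scan(items, op):
    """Inclusive scan in ceil(log2 n) synchronous doubling rounds."""
    result = list(items)
    step = 1
    while step < len(result):
        result = [op(result[i - step], result[i]) if i >= step else result[i]
                  for i in range(len(result))]
        step *= 2
    return result

def linear_recurrence(x0, x1, a, b, n):
    """Return [x_2, ..., x_n] for x_i = a*x_{i-1} + b*x_{i-2}."""
    M = ((a, b), (1, 0))
    # Prefixes of (M, M, ..., M) are M**1, ..., M**(n-1).
    powers = prefix_scan([M] * (n - 1), mat_mul)
    # (x_i, x_{i-1}) = M**(i-1) (x_1, x_0): x_i is the first row of M**(i-1) times (x_1, x_0).
    return [P[0][0] * x1 + P[0][1] * x0 for P in powers]

print(linear_recurrence(0, 1, 1, 1, 10))  # [1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Because every factor is the same $M$, the multiplication order inside the scan does not matter here; with varying coefficients the combine step would have to multiply the later matrix on the left.
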
Random number generation
- Input: (1) integer multiplier $a$; (2) integer increment $b$; (3) integer modulus $m$; (4) a seed $x_0$.
- Output: pseudo-random sequence $\{x_1, x_2, \dots, x_n\}$ according to the Linear Congruential Generator $x_{i+1} = (a x_i + b) \bmod m$.

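Random number generation: solution using parallel prefix
- The relation can be rewritten as
  $\begin{pmatrix} x_{i+1} \\ 1 \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_i \\ 1 \end{pmatrix}$.
- If all additions (and multiplications) are mod m, then, with $M = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}$,
  $\begin{pmatrix} x_i \\ 1 \end{pmatrix} = M^i \begin{pmatrix} x_0 \\ 1 \end{pmatrix} \bmod m$.
- Hence the whole sequence follows from the prefixes $M^1, \dots, M^n$ (mod m), computed by parallel prefix.
- Can be extended to a dependency on the previous k terms.
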
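The matrix $\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}$ encodes the affine map $x \mapsto (a x + b) \bmod m$, so a sketch can scan over $(A, B)$ pairs directly: composing two pairs costs two multiplications and one addition mod m. The function name and the sequential simulation of the parallel rounds are our assumptions.

```python
def lcg_parallel(seed, a, b, m, n):
    """Return [x_1, ..., x_n] for x_{i+1} = (a*x_i + b) mod m.
    Each map (A, B) stands for x -> (A*x + B) mod m, i.e. the matrix [[A, B], [0, 1]]."""
    def then(f, g):
        # Apply f first, then g: g(f(x)) = A_g*(A_f*x + B_f) + B_g  (mod m).
        return ((g[0] * f[0]) % m, (g[0] * f[1] + g[1]) % m)

    maps = [(a % m, b % m)] * n
    step = 1
    # Doubling scan: afterwards maps[i] is the one-step map applied i+1 times.
    while step < n:
        maps = [then(maps[i - step], maps[i]) if i >= step else maps[i]
                for i in range(n)]
        step *= 2
    return [(A * seed + B) % m for A, B in maps]

print(lcg_parallel(1, 5, 3, 16, 5))  # [8, 11, 10, 5, 12]
```

Keeping the earlier map as the first argument of `then` preserves the order of application, which matters because affine maps do not commute in general.
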
Application of Parallel Prefix: Binary Number Addition
- Add two n-bit binary numbers in 2 lg n + 1 steps using an n-leaf c.b.t. (complete binary tree). The sequential algorithm requires n steps.
- Binary addition example (bit positions 16 down to 00) and the signal of each position:
  position: 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
  signal:    s  g  p  p  g  p  s  s  p  p  s  g  p  p  p  s  s
- $(a + b)_i = a_i \oplus b_i \oplus c_{i-1}$, where $\oplus$ = XOR.
- Signal for each bit:
  • s: stops a carry bit (0 + 0)
  • g: generates a carry bit (1 + 1)
  • p: propagates a carry bit (0 + 1 or 1 + 0)

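The signal classification and the sequential n-step (ripple-carry) addition that the parallel algorithm improves on can be sketched as follows. Bits are stored least significant first, and the function names are hypothetical.

```python
def signals(a_bits, b_bits):
    """Signal of each position; bit lists are least significant bit first."""
    return ['g' if x and y else 'p' if x != y else 's'
            for x, y in zip(a_bits, b_bits)]

def ripple_add(a_bits, b_bits):
    """Sequential n-step addition using (a + b)_i = a_i XOR b_i XOR c_{i-1}."""
    carry = 0  # no carry into the lowest position
    out = []
    for x, y in zip(a_bits, b_bits):
        out.append(x ^ y ^ carry)
        # g (1+1) sets the carry, s (0+0) clears it, p (0+1 or 1+0) passes it on.
        carry = (x & y) | (carry & (x ^ y))
    out.append(carry)  # the final carry becomes the top bit of the sum
    return out

a_bits, b_bits = [1, 0, 1, 1], [1, 1, 0, 1]  # 13 and 11, LSB first
print(signals(a_bits, b_bits))     # ['g', 'p', 'p', 'g']
print(ripple_add(a_bits, b_bits))  # [0, 0, 0, 1, 1], i.e. 24
```

The loop-carried `carry` is exactly the dependency that forces n sequential steps.
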
Application of Parallel Prefix: Binary Number Addition
- Problem: to compute the k-th carry bit, the (k−1)-st carry bit must be computed first, so the obvious algorithm is sequential. There exists a non-obvious parallel solution.
- Goal: for each bit position, find the carry bit it needs, so that all bit positions can be added in parallel. We shall show that carry computation takes Θ(lg n) time on a binary tree, using a computation already known to us: parallel prefix.
- Observation. The i-th carry bit is one exactly when the closest non-p symbol at or to the right of the i-th bit (positions i, i−1, …, 0) is a g.
- Question. How can we find the i-th carry bit?

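Application of Parallel Prefix: Binary Number Addition
- The previous observation takes the following algorithmic form:
    scan for j = i, i−1, …, 0:
        if the symbol at j is p: ignore it and continue
        else if it is g: carry = 1; exit
        else (it is s): carry = 0; exit
- For the n-th bit (i = n) such a scan requires O(n) time.
- Let the symbol (s, p, or g) of the i-th bit position be denoted by $x_i$. Then
  $c_0 = x_0 = s$
  $c_1 = x_0 \otimes x_1$
  $c_2 = x_0 \otimes x_1 \otimes x_2$
  …
  $c_{16} = x_0 \otimes \dots \otimes x_{16}$,
  where ⊗ is given by the following table (row = left operand, column = right operand):
      ⊗ | s  p  g
      --+--------
      s | s  s  g
      p | s  p  g
      g | s  g  g
- In words, $x \otimes y = y$ unless $y = p$, in which case $x \otimes y = x$. This operator is associative, so the carries $c_0, c_1, \dots$ are exactly the prefixes of $x_0, x_1, \dots$ under ⊗.
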
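The ⊗ table and the scan can be checked against each other with a short sketch: `otimes` encodes the table, `carry_by_scan` is the sequential scan, and `carry_by_otimes` folds ⊗ over $x_0, \dots, x_i$. Symbols are stored with index 0 first; all names are ours.

```python
from functools import reduce
from itertools import product

def otimes(x, y):
    """The ⊗ table: the right operand wins unless it is 'p', in which case
    the left operand (the carry status from lower positions) passes through."""
    return x if y == 'p' else y

def carry_by_scan(symbols, i):
    """Sequential scan over positions i, i-1, ..., 0 (symbols[0] is x_0)."""
    for j in range(i, -1, -1):
        if symbols[j] == 'p':
            continue  # propagate: keep looking further right
        return 1 if symbols[j] == 'g' else 0
    return 0  # never reached when symbols[0] == 's'

def carry_by_otimes(symbols, i):
    """c_i = x_0 ⊗ x_1 ⊗ ... ⊗ x_i, reading g as 1 and s as 0."""
    return 1 if reduce(otimes, symbols[:i + 1]) == 'g' else 0

# Table rows (left operand s, p, g) and associativity, checked exhaustively.
print([[otimes(x, y) for y in 'spg'] for x in 'spg'])
# [['s', 's', 'g'], ['s', 'p', 'g'], ['s', 'g', 'g']]
print(all(otimes(otimes(x, y), z) == otimes(x, otimes(y, z))
          for x, y, z in product('spg', repeat=3)))  # True
```

The exhaustive associativity check over all 27 triples is what justifies handing ⊗ to a parallel prefix routine.
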
Application of Parallel Prefix: Binary Number Addition
- Algorithm for parallel addition:
  Step 1. Compute the symbol (s, p, or g) of bit i, in parallel for all i.
  Step 2. Perform a parallel prefix computation on the n symbols plus the 0-th symbol s, using the operator ⊗ defined in the previous table. Because the 0-th symbol is s, every prefix is s or g, never p.
  Step 3. Combine (exclusive OR) the carry bit from bit position i − 1 (interpret g as 1 and s as 0) with the exclusive OR of the two bits in position i to obtain the i-th bit of the sum.
- Steps 1 and 3 require constant time. Step 2, on a complete binary tree with n leaves, requires 2 lg n steps (one sweep up the tree and one sweep down).
- In total, T = 1 + 2 lg n steps, using P = 2n − 1 = O(n) processors (the nodes of the tree).

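Putting the three steps together gives the following sketch of the adder in Python. Step 2 uses an up-sweep/down-sweep scan over an implicit complete binary tree (lg N levels each way, in line with the 2 lg n count above), with p as the identity of ⊗ for padding. It produces exclusive prefixes, which are then combined with each symbol. The tree is simulated sequentially, and all names are hypothetical.

```python
def otimes(x, y):
    """⊗ from the table: the right operand unless it is 'p'; 'p' is the identity."""
    return x if y == 'p' else y

def tree_scan(xs, op, identity):
    """Exclusive prefix by an up-sweep and a down-sweep over an implicit complete
    binary tree: lg N levels each way, N = len(xs) rounded up to a power of 2."""
    size = 1
    while size < len(xs):
        size *= 2
    t = list(xs) + [identity] * (size - len(xs))  # pad with the identity
    d = 1
    while d < size:  # up-sweep: combine subtrees level by level
        for i in range(2 * d - 1, size, 2 * d):  # all i at this level in parallel
            t[i] = op(t[i - d], t[i])
        d *= 2
    t[size - 1] = identity
    d = size // 2
    while d >= 1:  # down-sweep: push prefixes toward the leaves
        for i in range(2 * d - 1, size, 2 * d):
            left = t[i - d]
            t[i - d] = t[i]          # left child: prefix before the subtree
            t[i] = op(t[i], left)    # right child: prefix plus left subtree
        d //= 2
    return t[:len(xs)]

def parallel_add(a, b, width):
    """Add two width-bit non-negative integers with the three-step algorithm."""
    a_bits = [(a >> k) & 1 for k in range(width)]
    b_bits = [(b >> k) & 1 for k in range(width)]
    # Step 1: symbol of every bit; index 0 holds the extra 0-th symbol s,
    # and bit k's symbol sits at index k + 1.
    xs = ['s'] + ['g' if x and y else 'p' if x != y else 's'
                  for x, y in zip(a_bits, b_bits)]
    # Step 2: inclusive prefixes x_0 ⊗ ... ⊗ x_j (never 'p', since x_0 = s).
    excl = tree_scan(xs, otimes, 'p')
    prefixes = [otimes(e, x) for e, x in zip(excl, xs)]
    # Step 3: prefixes[k] is the carry into bit k (g = 1, s = 0); XOR it in.
    sum_bits = [a_bits[k] ^ b_bits[k] ^ int(prefixes[k] == 'g') for k in range(width)]
    sum_bits.append(int(prefixes[width] == 'g'))  # carry out of the top bit
    return sum(bit << k for k, bit in enumerate(sum_bits))

print(parallel_add(13, 11, 4))  # 24
```

The down-sweep keeps the operand order (prefix first, then left subtree), so the scan stays correct for ⊗, which is associative but not commutative.
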