CMPUT 680 Winter 2006 Topic C Loop Fusion
- Slides: 50
CMPUT 680 - Winter 2006
Topic C: Loop Fusion
Kit Barton
www.cs.ualberta.ca/~cbarton
March 14, 2002

Outline
• Definition of loop fusion
• Basic concepts
• Prerequisites of loop fusion
• A loop fusion algorithm
• Example

Loop Fusion
• Combine two or more loops into a single loop
• This cannot violate any dependencies between the loop bodies
• Several conditions must be met for fusion to occur
• Often these conditions are not initially satisfied
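To make the definition concrete, here is a minimal Python sketch of two conforming, adjacent loops and their fused form. The arrays and loop bodies are invented for illustration and do not come from the slides; the point is only that the fused loop keeps both bodies in their original order and produces the same results.

```python
def separate(c, n):
    """Two loops run back to back over the same iteration space."""
    a = [0] * n
    b = [0] * n
    for i in range(n):          # first loop writes a
        a[i] = c[i] + 10
    for j in range(n):          # second loop reads what the first wrote
        b[j] = a[j] * 2
    return a, b


def fused(c, n):
    """The same work in one loop: both bodies, in their original order."""
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = c[i] + 10
        b[i] = a[i] * 2         # a[i] is reused while it is still "hot"
    return a, b
```

The fused version pays for one increment and branch per iteration instead of two, and `a[i]` is read immediately after it is written.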
Advantages of Loop Fusion
• Saves increment and branch instructions
• Creates opportunities for data reuse
• Provides more instructions to the instruction scheduler to balance the use of functional units

Disadvantages of Loop Fusion
• Increases code size, affecting instruction cache performance
• Increases register pressure within a loop
• Could cause the formation of loops with more complex control flow

Background
• There has been extensive work done on loop fusion
• Most has focused on weighted loop fusion (Gao et al., Kennedy and McKinley, Megiddo and Sarkar)
• Extensive work has also been done on performing loop fusion to increase parallelism

Weighted Loop Fusion
• Associates non-negative weights with each pair of loop nests
• Weights are a measurement of the expected gain if the two loops are fused
• Gains include potential for array contraction, data reuse and improved local register allocation

Optimal Loop Fusion
• Fuse loops to optimize data reuse, taking into consideration resource constraints and register usage
• This problem is NP-hard

Maximal Loop Fusion
• Our approach is to perform maximal loop fusion
• Fuse as many loops as possible, without considering resource constraints
• Fuse loops as soon as possible, not considering the consequences

Dominators and Post Dominators
• A node x in a directed graph G with a single entry node dominates node y in G if every path from the entry node of G to y must pass through x
• A node x in a directed graph G with a single exit node post-dominates node y in G if every path from y to the exit node of G must pass through x
Allen & Kennedy, p. 150, 353
Requirements for Loop Fusion
i. Loops must have identical iteration counts (be conforming)
ii. Loops must be control-flow equivalent
iii. Loops must be adjacent
iv. There cannot be any negative distance dependencies between the loops

Non-conforming Loops
• If iteration counts are different, one loop must be manipulated to make the iteration counts the same
  1. Loop peeling
  2. Introduce a guard into one of the loops

Loop Peeling
• Find the difference between the iteration counts of the two loops (n)
• Duplicate the body of the loop with the higher iteration count n times
• Update the iteration count of the peeled loop

Loop Peeling Example
Before:
    while (i < 10) {
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 12) {
        b[j] = b[j - 1] - 2;
        j++;
    }
After peeling (the iteration counts differ by 2, so the body is duplicated twice):
    while (i < 10) {
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 10) {
        b[j] = b[j - 1] - 2;
        j++;
    }
    b[j] = b[j - 1] - 2;
    j++;
    b[j] = b[j - 1] - 2;
    j++;
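The peeling step can be checked with a small Python sketch of the second loop from the example. The starting index `j = 1` and the initial contents of `b` are assumptions, since the slide does not show them; the sketch only demonstrates that running 2 fewer iterations in the loop and then 2 straight-line copies of the body gives the same result.

```python
def original_loop(b):
    """while (j < 12) { b[j] = b[j-1] - 2; j++; }, assuming j starts at 1."""
    b = list(b)
    j = 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return b


def peeled_loop(b):
    """Same loop cut down to the bound of the shorter loop, plus two peeled copies."""
    b = list(b)
    j = 1
    while j < 10:               # now conforms with the loop over i
        b[j] = b[j - 1] - 2
        j += 1
    b[j] = b[j - 1] - 2         # peeled iteration j = 10
    j += 1
    b[j] = b[j - 1] - 2         # peeled iteration j = 11
    j += 1
    return b
```

After peeling, both loops run the same number of iterations, at the cost of two extra statements placed after the loop.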
Guarding Iterations
• Increase the iteration count of the loop with fewer iterations
• Insert a guard branch around statements that would not normally be executed

Guarding Iterations Example
Before:
    while (i < 10) {
        a[i] = a[i - 1] * 2;
        i++;
    }
    while (j < 12) {
        b[j] = b[j - 1] - 2;
        j++;
    }
After guarding (the increment stays outside the guard so that the loop still terminates):
    while (i < 12) {
        if (i < 10) {
            a[i] = a[i - 1] * 2;
        }
        i++;
    }
    while (j < 12) {
        b[j] = b[j - 1] - 2;
        j++;
    }
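As a rough check of the guarded form, the following Python sketch runs the two original loops and a guarded, fused version side by side. The fusion step, the starting index of 1, and the initial array contents are assumptions added for illustration; the slide itself only shows the guard being introduced.

```python
def separate_loops(a, b):
    """The two non-conforming loops from the example, assuming i = j = 1."""
    a, b = list(a), list(b)
    i = 1
    while i < 10:
        a[i] = a[i - 1] * 2
        i += 1
    j = 1
    while j < 12:
        b[j] = b[j - 1] - 2
        j += 1
    return a, b


def guarded_and_fused(a, b):
    """First loop widened to 12 iterations with a guard, then fused."""
    a, b = list(a), list(b)
    i = 1
    while i < 12:
        if i < 10:              # guard: skip the two extra iterations
            a[i] = a[i - 1] * 2
        b[i] = b[i - 1] - 2
        i += 1                  # increment is outside the guard
    return a, b
```

The guard costs a branch on every iteration, which is the control-flow disadvantage listed for this technique.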
Loop Peeling
• Advantage:
  o Does not generate control flow within a loop body
• Disadvantage:
  o Generates additional code outside of loops, which could possibly intervene with other loops

Guarding Iterations
• Advantages:
  o Does not introduce intervening code
  o Can be "undone" later
• Disadvantage:
  o Generates control flow within a loop

Control Flow Equivalence
• Two loops are control-flow equivalent if, whenever one executes, the other also executes
[Figure: CFG containing Loop 1, a basic block BB, Loop 2, and Loop 3]

Determining Control Flow Equivalence
• Use the concepts of dominators and post dominators. Two loops L1 and L2 are control-flow equivalent if the following two conditions are true:
  o L1 dominates L2; and
  o L2 post dominates L1.
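The test above can be sketched in Python with a simple iterative dominator computation. Everything here is illustrative: each loop is collapsed to a single CFG node, the graph shape is an assumption (the figure is not recoverable from the transcript), and the fixed-point algorithm is a textbook one rather than necessarily what the authors used.

```python
def dominators(succ, entry):
    """Map each node to the set of nodes that dominate it (fixed-point iteration)."""
    nodes = set(succ)
    pred = {n: {p for p in nodes if n in succ[p]} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if pred[n]:
                new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def control_flow_equivalent(succ, entry, exit_node, l1, l2):
    """L1 dominates L2 and L2 post-dominates L1."""
    reverse = {n: [p for p in succ if n in succ[p]] for n in succ}
    dom = dominators(succ, entry)
    pdom = dominators(reverse, exit_node)   # post-dominators = dominators of the reversed CFG
    return l1 in dom[l2] and l2 in pdom[l1]


# Hypothetical CFG: after loop1 control branches to loop2 or bb, then rejoins at loop3.
cfg = {
    "entry": ["loop1"],
    "loop1": ["loop2", "bb"],
    "loop2": ["loop3"],
    "bb": ["loop3"],
    "loop3": ["exit"],
    "exit": [],
}
```

In this assumed graph `loop1` and `loop3` are control-flow equivalent, while `loop1` and `loop2` are not, because `loop2` sits on only one side of the branch.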
Intervening Code
• Two loops are adjacent if there are no statements between the two loops
• Can be determined using the CFG:
  o If the immediate successor of the first loop is the second loop, the two loops are adjacent
• If two loops are not adjacent, there is intervening code between them.

Dealing with Non-Adjacent Loops
• If two loops are not adjacent, we attempt to make them adjacent by moving the intervening code
• Intervening code can be moved:
  o Above the first loop
  o Below the second loop
  o Both
  as long as no data dependencies are violated

Intervening Code Example
[Figure: CFG fragment in which node 6 is Loop 1, node 16 is Loop 2, and nodes 7-15 lie between them]
• Assume CFG has 20 nodes
• 0-5 are above Loop 1
• 17-19 are below Loop 2
• What algorithm should be used to determine which nodes are between Loop 1 and Loop 2?

Gathering Intervening Code
• Given two loops L1 and L2, a basic block B is intervening code between L1 and L2 if and only if:
  o B is strictly dominated by L1
  o B is not dominated by L2
• Once the dominance relations are known, the set subtraction can be efficiently computed using bit vectors

Intervening Code Example
[Figure: the same CFG fragment, annotated with dominance bit vectors]
    Loop 1:      0000 0011 1111 1
    Loop 2:      0000 1111 1
    Difference:  0000 0011 1111 0000 0
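The bit-vector subtraction can be sketched in Python, using an integer as the bit vector (bit k stands for CFG node k). The node numbering follows the 20-node example, but the dominance sets themselves are assumptions made for illustration: Loop 1 (node 6) is taken to strictly dominate nodes 7-19, and Loop 2 (node 16) to dominate nodes 16-19.

```python
def to_bits(nodes):
    """Pack a collection of node numbers into an integer bit vector."""
    bits = 0
    for n in nodes:
        bits |= 1 << n
    return bits


def from_bits(bits):
    """Unpack an integer bit vector into a sorted list of node numbers."""
    return [n for n in range(bits.bit_length()) if (bits >> n) & 1]


# Assumed dominance information for the 20-node example.
strictly_dominated_by_l1 = to_bits(range(7, 20))
dominated_by_l2 = to_bits(range(16, 20))      # a node dominates itself

# Set subtraction is a single AND-NOT on the bit vectors.
intervening = strictly_dominated_by_l1 & ~dominated_by_l2
```

Under these assumptions `from_bits(intervening)` yields nodes 7 through 15, the blocks between the two loops.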
Analyze Intervening Code
• Build a DDG of the intervening code
• Put all nodes with no predecessors into queue
• For each node in the queue:
  o If there are no dependencies between the node and the loop
    - Mark node as moveable
    - Add all of the node's immediate successors to the queue
• All nodes marked can be moved around the loop
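The queue-driven analysis can be sketched as follows. The representation is an assumption: each statement is described by the sets of variables it defines and uses, and a statement "depends on the loop" when those sets overlap the loop's in a way that would give a flow, anti, or output dependence. Like the slide, the sketch checks each node only against the loop; a production pass would also have to keep dependent statements in order relative to each other.

```python
from collections import deque


def moveable_nodes(stmts, ddg_succ, loop_defs, loop_uses):
    """stmts: {name: (defs, uses)}; ddg_succ: {name: [immediate successors]}."""
    has_pred = {s for succs in ddg_succ.values() for s in succs}
    queue = deque(n for n in stmts if n not in has_pred)   # DDG roots
    moveable = []
    while queue:
        node = queue.popleft()
        if node in moveable:
            continue
        defs, uses = stmts[node]
        conflict = (defs & loop_uses) or (uses & loop_defs) or (defs & loop_defs)
        if not conflict:
            moveable.append(node)
            queue.extend(ddg_succ.get(node, []))           # successors become candidates
    return moveable


# Statements shaped like: b := a*2; c := b+6; g := 0; h := g+10; if (c < 100) ...
stmts = {
    "b": ({"b"}, {"a"}),
    "c": ({"c"}, {"b"}),
    "g": ({"g"}, set()),
    "h": ({"h"}, {"g"}),
    "if": ({"d", "e"}, {"c"}),
}
ddg = {"b": ["c"], "c": ["if"], "g": ["h"]}

# A loop that reads g (like "f := g + 6") blocks g and, through it, h.
past_reader_of_g = moveable_nodes(stmts, ddg, {"f", "j"}, {"g", "j", "N"})
# A loop that writes a (like "a += i") blocks b and everything reached only through b.
past_writer_of_a = moveable_nodes(stmts, ddg, {"a", "i"}, {"a", "i", "N"})
```

With these inputs the first call marks `b`, `c`, and `if` as moveable, and the second marks `g` and `h`.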
Non-Adjacent loops example
Original code:
    while (i < N) {
        a += i;
        i++;
    }
    b := a * 2;
    c := b + 6;
    g := 0;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;
    while (j < N) {
        f := g + 6;
        j++;
    }
Intervening code:
    b := a * 2;
    g := 0;
    c := b + 6;
    h := g + 10;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;

Non-Adjacent loops example
After moving the intervening code:
    g := 0;
    h := g + 10;
    while (i < N) {
        a += i;
        i++;
    }
    while (j < N) {
        f := g + 6;
        j++;
    }
    b := a * 2;
    c := b + 6;
    if (c < 100)
        d := c/2;
    else
        e := c * 2;

Non-Adjacent loops example
[Figure: DDG of the intervening code, analyzed against Loop 2]
DDG nodes:
    b := a * 2;
    g := 0;
    c := b + 6;
    h := g + 10;
    if (c < 100) d := c/2; else e := c * 2;
Loop 2:
    while (j < N) {
        f := g + 6;
        j++;
    }
Node Queue:
    b := a * 2;
    g := 0;
    c := b + 6;
Moveable Nodes:
    b := a * 2;
    c := b + 6;
    if (c < 100) d := c/2; else e := c * 2;

Non-Adjacent loops example
[Figure: DDG of the intervening code, analyzed against Loop 1]
DDG nodes:
    b := a * 2;
    g := 0;
    c := b + 6;
    h := g + 10;
    if (c < 100) d := c/2; else e := c * 2;
Loop 1:
    while (i < N) {
        a += i;
        i++;
    }
Node Queue:
    b := a * 2;
    g := 0;
    h := g + 10;
Moveable Nodes:
    g := 0;
    h := g + 10;
Dependencies Preventing Fusion
Can the following loops be fused?
    i = j = 1;
    while (i < 10) {
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10) {
        b[j] = a[j+1] * 2;
        j++;
    }

Dependencies Preventing Fusion
• If we look at the array access patterns of a[], we see the following:
    a[i] = c[i] + 10;
    b[j] = a[j+1] * 2;
• Iteration j of the second loop reads a[j+1], which the first loop writes one iteration later (i = j+1). Fused directly, the read would come before the write, a dependence distance of -1.

Dependencies Preventing Fusion
• By aligning the array access patterns, we get the following:
    a[i] = c[i] + 10;
    b[j] = a[j+1] * 2;

Loop Alignment
Before:
    i = j = 1;
    while (i < 10) {
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10) {
        b[j] = a[j+1] * 2;
        j++;
    }
After:
    j = 1;
    i = 2;
    a[1] = c[1] + 10;
    while (i < 10) {
        a[i] = c[i] + 10;
        i++;
    }
    while (j < 10) {
        b[j] = a[j+1] * 2;
        j++;
    }

Loop Alignment
• Loop alignment can be used to remove dependencies between loop bodies
• Easy to do when all dependencies have the same distance
• Gets tricky when there are multiple dependencies with different distances
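The effect of alignment can be seen in a small Python sketch of the example. It compares the original pair of loops, a naive fusion that ignores the dependence, and a fusion after alignment. The final fused form, including the guard for the missing last iteration of the first loop, is an assumption added here; the slides stop at the aligned but still separate loops.

```python
def original(a, c):
    """The two loops as written; a needs indices 0..10, c indices 0..9."""
    a = list(a)
    b = [0] * 10
    i = 1
    while i < 10:
        a[i] = c[i] + 10
        i += 1
    j = 1
    while j < 10:
        b[j] = a[j + 1] * 2
        j += 1
    return a, b


def naive_fused(a, c):
    """Fusing without alignment reads a[i+1] before it has been written."""
    a = list(a)
    b = [0] * 10
    i = 1
    while i < 10:
        a[i] = c[i] + 10
        b[i] = a[i + 1] * 2     # stale value: violates the dependence
        i += 1
    return a, b


def aligned_fused(a, c):
    """Peel the first write, shift the first body by one, then fuse."""
    a = list(a)
    b = [0] * 10
    a[1] = c[1] + 10            # peeled first iteration
    j = 1
    while j < 10:
        if j + 1 < 10:          # guard (assumed): the first loop stops at i = 9
            a[j + 1] = c[j + 1] + 10
        b[j] = a[j + 1] * 2     # now reads the value written just above
        j += 1
    return a, b
```

Only the aligned version matches the original; the naive fusion computes `b` from the old contents of `a`.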
Putting it all together
• We’ve seen ways to deal with each of the preconditions of loop fusion
• If the conditions are not met, we apply transformations to try and modify the code
• If the transformations are successful, loop fusion can occur
• But in what order should these transformations be applied?

Loop Fusion Algorithm
    For each Ni from outermost to innermost:
        Gather control equivalent loops in Ni into LoopSets
        For each set Si in LoopSets
            remove non-eligible loops from Si
            FusedLoops = true
            Direction = forward
            while FusedLoops == true
                if |Si| < 2 break
                Compute Dominance Relation
                FusedLoops = LoopFusionPass(Si, Direction)
                Reverse Direction

Loop Fusion Algorithm
    LoopFusionPass(S, Direction)
        FusedLoops = false
        For each pair of loops Lj and Lk in S such that Lj dominates Lk in Direction
            if (DependenceDistance(Lj, Lk) < 0) continue
            if (InterveningCode(Lj, Lk) == true and
                IsInterveningCodeMoveable(Lj, Lk) == false) continue
            d = | IterationCount(Lj) - IterationCount(Lk) |
            if (Lj and Lk are non-conforming and
                (d cannot be determined at compile time or d > MAXPEEL)) continue
            if (Lj and Lk are non-conforming) Peel iterations
            MoveInterveningCode(Lj, Lk)
            if InterveningCode(Lj, Lk) == false
                FuseLoops(Lj, Lk)
                FusedLoops = true
        Return FusedLoops
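The order of the checks in the pass can be sketched in Python for a single pair of loops. This is only a decision function, not a transformation: the `LoopInfo` record, the symbolic trip-count encoding (`symbol + offset`), and the `MAXPEEL` value of 4 are all hypothetical, and the dependence and intervening-code facts are passed in rather than computed.

```python
from dataclasses import dataclass

MAXPEEL = 4  # hypothetical limit; the slides do not give a value


@dataclass
class LoopInfo:
    name: str
    symbol: str   # symbolic upper bound, e.g. "n"
    offset: int   # trip count is symbol + offset, e.g. n - 1


def iteration_difference(lj, lk):
    """Difference in trip counts, or None when it is unknown at compile time."""
    if lj.symbol != lk.symbol:
        return None
    return abs(lj.offset - lk.offset)


def fusion_decision(lj, lk, dep_distance, intervening, intervening_moveable):
    """Apply the checks in the same order as the pseudocode for one pair."""
    if dep_distance < 0:
        return "skip: negative dependence distance"
    if intervening and not intervening_moveable:
        return "skip: intervening code cannot move"
    d = iteration_difference(lj, lk)
    if d is None:
        return "skip: iteration-count difference unknown"
    if d > MAXPEEL:
        return "skip: too many iterations to peel"
    if d > 0:
        return f"peel {d}, then fuse"
    return "fuse"


l1 = LoopInfo("L1", "n", 0)
l2 = LoopInfo("L2", "n", -1)
l3 = LoopInfo("L3", "m", 0)
```

For example, `fusion_decision(l1, l2, 0, False, False)` reports that one iteration must be peeled, while any pairing with `l3` is skipped because `n` and `m` cannot be compared at compile time.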
Example
    L1: do i1 = 1, n
          a(i1) = a(i1) * k1
        end do
    L2: do i2 = 1, n-1
          d(i2) = a(i2) - b(i2+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
Loop Set: L1 L2 L3 L4

Peeling Loop 1
    S7: a(1) = a(1) * k1
    L1: do i1 = 1, n-1
          a(i1+1) = a(i1+1) * k1
        end do
    L2: do i2 = 1, n-1
          d(i2) = a(i2) - b(i2+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do

Fuse L1 and L2
    S7: a(1) = a(1) * k1
    L5: do i5 = 1, n-1
          a(i5+1) = a(i5+1) * k1
          d(i5) = a(i5) - b(i5+1) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do

Compare L5 and L3
• We now compare loops L5 and L3
• They are not adjacent, but the intervening code can move
• Difference in iteration count is not known, so fusion fails

Compare L5 and L4
Intervening Code:
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m

Peel L5
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    S1: ds = 0.0
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do

Move Intervening Code
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do

Reverse Pass
Loop Set: L5 L3 L4
Sorted in Reverse Dominance Direction: L4 L3 L5

Compare L4 and L3
• Compare L4 and L3
• No dependencies to prevent fusion
• Iteration count cannot be determined at compile time
• Fusion fails

Compare L4 and L5
Intervening Code:
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do

Move Intervening Code
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L5: do i5 = 1, n-2
          a(i5+2) = a(i5+2) * k1
          d(i5+1) = a(i5+1) - b(i5+2) * k2
        end do
    L4: do i4 = 1, n-2
          b(i4) = a(i4) + b(i4) / c(i4)
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do

Fuse L4 and L5
    S7: a(1) = a(1) * k1
    S8: a(2) = a(2) * k1
    S9: d(1) = a(1) - b(2) * k2
    S1: ds = 0.0
    S2: if (n<m)
    S3:   c(n-2) = n
    S4: else
    S5:   c(n-2) = m
    L6: do i6 = 1, n-2
          a(i6+2) = a(i6+2) * k1
          d(i6+1) = a(i6+1) - b(i6+2) * k2
          b(i6) = a(i6) + b(i6) / c(i6)
        end do
    L3: do i3 = 1, m
          ds = ds + d(i3)
        end do
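As a sanity check on the worked example, the following Python sketch transcribes the starting program and the final fused program and compares their results. The transcription is an assumption-laden convenience: arrays are padded so that index 0 is unused (to mimic the 1-based indexing), `d` is allocated locally and zero-filled, and the input values are arbitrary. It assumes n is at least 3 so that c(n-2) and the peeled statements are in range.

```python
def original(a, b, c, n, m, k1, k2):
    a, b, c = list(a), list(b), list(c)
    d = [0.0] * (max(n, m) + 1)
    for i in range(1, n + 1):                 # L1
        a[i] = a[i] * k1
    for i in range(1, n):                     # L2
        d[i] = a[i] - b[i + 1] * k2
    ds = 0.0                                  # S1
    for i in range(1, m + 1):                 # L3
        ds = ds + d[i]
    c[n - 2] = n if n < m else m              # S2-S5
    for i in range(1, n - 1):                 # L4
        b[i] = a[i] + b[i] / c[i]
    return a, b, c, d, ds


def transformed(a, b, c, n, m, k1, k2):
    a, b, c = list(a), list(b), list(c)
    d = [0.0] * (max(n, m) + 1)
    a[1] = a[1] * k1                          # S7
    a[2] = a[2] * k1                          # S8
    d[1] = a[1] - b[2] * k2                   # S9
    ds = 0.0                                  # S1
    c[n - 2] = n if n < m else m              # S2-S5
    for i in range(1, n - 1):                 # L6: fused loop
        a[i + 2] = a[i + 2] * k1
        d[i + 1] = a[i + 1] - b[i + 2] * k2
        b[i] = a[i] + b[i] / c[i]
    for i in range(1, m + 1):                 # L3
        ds = ds + d[i]
    return a, b, c, d, ds


# Arbitrary sample inputs; index 0 is padding.
a0 = [0, 1, 2, 3, 4, 5, 6]
b0 = [0, 2, 4, 6, 8, 10, 12]
c0 = [0, 1, 2, 4, 5, 1, 1]
```

Both versions perform the same arithmetic on each element in the same order, so for these inputs the results agree exactly, for example with `n = 6, m = 4` and with `n = 4, m = 6`.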