Efficient and Accurate Array Access Analysis

Yunheung Paek (백윤흥), Department of Electrical Engineering and Computer Science, KAIST
SO&R Lab., EECS, KAIST

Overview
• Array access analysis
• LMAD (Linear Memory Access Descriptor)
• Uses of LMADs in compiler optimizations
• Experiments
• Conclusion
• For more details, refer to:
  – the web site at http://polaris.cs.uiuc.edu, or
  – ACM SIGPLAN PLDI '98, pp. 60-71

Arrays
• An array is a collection of memory locations that usually stores related information.
• In numeric applications (scientific computing, DSP, …), almost all computation is performed on arrays.
• The success of many compiler optimization techniques depends heavily on information about how arrays are accessed during program execution:
  – data flow analysis (array SSA, value-range propagation)
  – array privatization
  – dependence analysis (parallel loop detection, ILP exploitation)
  – locality analysis (remote-latency or cache optimizations)
  – message generation (communication optimization in NUMA architectures, hardware synthesis, …)
  – …

Array Privatization
• False (anti and output) dependences can be eliminated by renaming:
    P = Q*4; P = R+1; P = P/2    becomes    P = Q*4; P1 = R+1; P1 = P1/2
• Privatization eliminates loop-carried false dependences from a loop by renaming, giving each iteration its own copy of the array.

    do K = 1, N
      do I = 0, 10
        do J = 1, 2**I
          C(2**I+J) = …
        enddo
      enddo
      do J = 1, 2**10
        … = C(2*J)
      enddo
    enddo

• Aggregated write region for C: W_C = C(2), C(3), C(4), …, C(2048)
• Aggregated read region for C: R_C = C(2), C(4), C(6), …, C(2048)
• W_C ⊇ R_C, and every element read in an iteration of K is written earlier in that same iteration, so C is privatizable and loop K is parallel.
• How can such array accesses be represented and simplified?
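The region claims for C can be checked by brute force for this one loop. The sketch below enumerates the subscripts written and read in a single iteration of K, keeping the Fortran 1-based subscripts. Enumeration is only a sanity check here; avoiding it is the whole point of a compact region representation.

```python
def write_region():
    """Subscripts of C written by the I/J nest in one iteration of K."""
    region = set()
    for i in range(0, 11):              # do I = 0, 10
        for j in range(1, 2**i + 1):    # do J = 1, 2**I
            region.add(2**i + j)        # C(2**I+J) = ...
    return region

def read_region():
    """Subscripts of C read by the second J loop: ... = C(2*J)."""
    return {2 * j for j in range(1, 2**10 + 1)}

W, R = write_region(), read_region()
print(min(W), max(W), len(W))  # 2 2048 2047
# The reads are covered by writes that precede them in the same K iteration,
# which is what makes a private copy of C per iteration legal.
print(R <= W)                  # True
```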

Dependence Analysis
• Dependences are detected by region-to-region comparison, i.e., by intersecting two access regions.
• The array reshaping problem: a formal array parameter may be declared with a different shape than the actual array passed to it.
• How can array accesses be represented efficiently and precisely across procedure boundaries?

Message Generation
• The overhead of initiating a message is almost the same regardless of the message size.
• Therefore, transferring a large quantity of data as many small fragments is very costly.
• How can the array accesses in a given program section be aggregated accurately?
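A toy cost model shows why aggregation matters. The model assumes each message pays a fixed start-up cost plus a per-byte cost. Both constants below are hypothetical and chosen only to show the shape of the trade-off; they are not measurements.

```python
STARTUP_COST = 1000  # hypothetical cost units paid once per message
PER_BYTE_COST = 1    # hypothetical cost units per byte transferred

def transfer_cost(num_messages, total_bytes):
    """Linear model: fixed start-up cost per message plus a size-proportional term."""
    return num_messages * STARTUP_COST + total_bytes * PER_BYTE_COST

fragments, fragment_bytes = 1000, 8
one_per_fragment = transfer_cost(fragments, fragments * fragment_bytes)
one_aggregated = transfer_cost(1, fragments * fragment_bytes)
print(one_per_fragment, one_aggregated)  # 1008000 9000
```

The same bytes move in both cases. Only the number of start-up costs changes, and that is what accurate aggregation of the accessed region buys.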

Array Access Analysis
• Identify the access region: the set of all elements of an array accessed during the execution of a given section of code.
• Summarize access regions and manipulate them to produce the output used by the application.
(Figure: program section 1 accesses A(3*i), giving region R1; program section 2 accesses A(i+2), giving region R2; a region processor applies operations such as intersection, union, and subtraction to R1 and R2, and its output drives the decision made by the compiler.)
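The three region operations can be illustrated with explicit sets. The sketch assumes, hypothetically, that i runs from 1 to 10 in both program sections, since the figure gives no bounds. A real region processor works on compact representations rather than enumerated sets.

```python
N = 10                                   # hypothetical loop bound: i = 1..N
R1 = {3 * i for i in range(1, N + 1)}    # region of A(3*i)
R2 = {i + 2 for i in range(1, N + 1)}    # region of A(i+2)

print(sorted(R1 & R2))   # [3, 6, 9, 12]  intersection: elements both sections touch
print(len(R1 | R2))      # 16             union
print(sorted(R2 - R1))   # [4, 5, 7, 8, 10, 11]  subtraction
```

If, say, R1 were written and R2 read, the non-empty intersection is what would signal a possible dependence.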

Representations of Access Regions
• Region representation (figure: access information encoded as representations R1 and R2; complex operation vs. simple operation)
• Conventional region representations:
  – triplet notation
  – linear-constraint-based notation

Triplet Notation
• General form for an access to an m-dimensional array A: A(l1:u1:s1, l2:u2:s2, …, lm:um:sm)
  Ex) X(N-M+1:N:1, 1), X(1:M:1, 1), Y(2:3N-1:3), Y(3:3N:3)
• Easy to operate on: the operations are based on interval analysis.
  Ex) X(N-M+1:N:1, 1) ∪ X(1:M:1, 1) = X(1:N:1, 1) if N ≤ 2M
• Often too restricted to express common access information accurately, which forces approximate regions.
  Ex) X(1:M:1, 2:M+1:1); Y(2:3N-1:3) ∪ Y(3:3N:3) ≈ Y(2:3N:1)
• Loses accuracy in the presence of array reshaping.
• Many compilers originally used triplet notation for array access analysis (AAA).
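The two examples can be replayed on one dimension with explicit sets. For X, only the first subscript is modeled, because the second is the constant 1. The sample sizes N = 10 and M = 6 are arbitrary values satisfying N ≤ 2M. The union for X is exact, while the best single triplet for the Y union includes elements that are never accessed.

```python
def section(lo, hi, step):
    """Elements of the 1-D Fortran section lo:hi:step (hi inclusive, step > 0)."""
    return set(range(lo, hi + 1, step))

def interval_hull(region):
    """Smallest stride-1 section covering a region: the interval approximation."""
    return section(min(region), max(region), 1)

N, M = 10, 6  # sample sizes with N <= 2M

# X(N-M+1:N:1, 1) and X(1:M:1, 1): the union is exactly X(1:N:1, 1).
x_union = section(N - M + 1, N, 1) | section(1, M, 1)
print(x_union == section(1, N, 1))      # True

# Y(2:3N-1:3) and Y(3:3N:3): the union is not an arithmetic progression,
# so no single triplet describes it; Y(2:3N:1) over-approximates it.
y_union = section(2, 3 * N - 1, 3) | section(3, 3 * N, 3)
y_approx = interval_hull(y_union)
print(sorted(y_approx - y_union)[:3])   # [4, 7, 10] included but never accessed
```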

Linear Constraint-Based Notation
• General form: systems of linear constraints that summarize the array accesses.
• Flexible enough to represent convex regions generated by linear expressions.
• Region manipulation operations are based on integer programming, so they rely on algorithms with worst-case exponential time.
• Not accurate for non-linear (non-affine) expressions.
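As an illustration (this example is ours, not the slide's), the lower-triangular region touched by X(I, J) in a nest `do J = 1, M; do I = J, M` is not a single triplet, but it is a convex region described by linear constraints. A subscript such as C(2**I+J) from the privatization example has no such linear description.

```latex
R_X = \{\, (i, j) \in \mathbb{Z}^2 \mid 1 \le j \le M,\; j \le i \le M \,\}
```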

Non-affine Expressions
• Array access patterns typically found in FFT code

Observations on Traditional Approaches
• Traditional approaches are sensitive to the complexity of array subscripts:
  – usually accurate and efficient for simple array subscripts
  – lose accuracy or require potentially expensive algorithms for complex subscripts
  – fail to capture array accesses with non-affine expressions
• Often, the actual accesses are simpler than they look.
• The simplicity of the actual access is hidden inside the subscript expressions.
• Previous techniques fail to recognize these simple patterns and therefore lose accuracy in their access analysis.
• How can this simplicity be exposed?

Problem Formulation
• An m-dimensional array reference occurring in a d-nested loop
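One standard way to write this setting down is sketched below. It assumes each loop is normalized to start at 0 with unit step and that A is declared A(l_1:u_1, …, l_m:u_m) in column-major (Fortran) order; the symbols are ours. The reference is mapped to a linear offset f. When f is linear in the loop indices, the coefficient of each index gives its stride and the index range gives its span.

```latex
\begin{aligned}
&\texttt{do } i_1 = 0, U_1 \;\ldots\; \texttt{do } i_d = 0, U_d : \quad
  A\bigl(s_1(i_1,\ldots,i_d),\, \ldots,\, s_m(i_1,\ldots,i_d)\bigr) \\
&f(i_1,\ldots,i_d) = \sum_{j=1}^{m} \bigl(s_j(i_1,\ldots,i_d) - l_j\bigr) \prod_{k=1}^{j-1} n_k,
  \qquad n_k = u_k - l_k + 1 \\
&f = \tau + \sum_{k=1}^{d} \delta_k\, i_k \;\Longrightarrow\;
  \text{stride}_k = \delta_k, \qquad \text{span}_k = \sigma_k = \delta_k\, U_k
\end{aligned}
```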

A New Look at Array Access Patterns
• Every array access can be mapped onto a linear memory address space.
• Every access on this linear space can be viewed as driven by independent iterations of a set of loop indices.
• The memory movement caused by each index can be characterized by two quantities: stride and span.
• These quantities are captured in a descriptor that we call a linear memory access descriptor (LMAD).
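A minimal sketch of computing stride/span pairs for an affine reference follows. It assumes column-major layout, normalized loops (index k runs 0 .. trip_count − 1), and non-negative coefficients. The LMAD class and the lmad_of helper are hypothetical names for illustration, and span is taken as stride × (trip count − 1), i.e. the distance from the first to the last address an index reaches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LMAD:
    strides: tuple  # delta_k: address distance moved by one step of loop index k
    spans: tuple    # sigma_k: distance from first to last address reached by index k
    offset: int     # tau: 0-based address of the first element accessed

def lmad_of(extents, lower, subscripts, trip_counts):
    """LMAD of an affine reference inside a d-nested loop.

    extents[j], lower[j]: size and lower bound of array dimension j.
    subscripts[j] = (c0, [c1, ..., cd]) encodes s_j = c0 + sum_k ck * i_k.
    trip_counts[k]: loop index i_k runs over 0 .. trip_counts[k] - 1.
    """
    d = len(trip_counts)
    strides, offset, multiplier = [0] * d, 0, 1
    for (c0, coeffs), n, lo in zip(subscripts, extents, lower):
        offset += (c0 - lo) * multiplier
        for k in range(d):
            strides[k] += coeffs[k] * multiplier
        multiplier *= n  # column-major: later subscripts are scaled by earlier extents
    spans = [s * (t - 1) for s, t in zip(strides, trip_counts)]
    return LMAD(tuple(strides), tuple(spans), offset)

# Hypothetical example: REAL A(10, 20); I = 0..9 (inner), J = 0..4 (outer); A(I+1, 2*J+1)
print(lmad_of((10, 20), (1, 1), [(1, [1, 0]), (1, [0, 2])], (10, 5)))
# LMAD(strides=(1, 20), spans=(9, 80), offset=0)
```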

LMADs for Simple Accesses

More Complex Accesses

Array Accesses on a Linear Memory Space

Accesses with Non-affine Subscripts
• Two LMADs for FFT code
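The X reads in the TFFT2 routine FFTZ2 show how a non-affine subscript still yields a plain stride/span description. Inside one call, L and M are fixed, so 2**(L-1) is just a symbolic stride. The sketch checks, for sample values L = 3 and M = 6, that the LMAD of each of the three X reads in the fragment describes exactly the addresses the loop touches. Offsets are 0-based (X(1) maps to 0), positive strides are assumed, and the helper names are hypothetical.

```python
from itertools import product

def fftz2_x_lmads(L, M):
    """(strides, spans, offset) of the three X reads in FFTZ2.
    K = 0..2**(L-1)-1 is the inner index, I = 0..2**(M-L)-1 the outer one."""
    strides = (1, 2**(L - 1))                              # per step of K, of I
    spans = (2**(L - 1) - 1, 2**(L - 1) * (2**(M - L) - 1))
    offsets = (0, 2**M, 2**(M - 1) + 2**M)                 # constant parts of the subscripts
    return [(strides, spans, tau) for tau in offsets]

def expand(strides, spans, offset):
    """All addresses described by an LMAD (assumes positive strides)."""
    ranges = [range(0, sp + 1, st) for st, sp in zip(strides, spans)]
    return {offset + sum(p) for p in product(*ranges)}

L, M = 3, 6
for strides, spans, tau in fftz2_x_lmads(L, M):
    actual = {tau + K + I * 2**(L - 1)
              for I in range(2**(M - L)) for K in range(2**(L - 1))}
    assert expand(strides, spans, tau) == actual
print("LMADs match the enumerated X accesses")
```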

Efficient, Accurate AAA
• Representing an array access accurately is not enough.
• The original representations of accesses are often too bulky and require complex algorithms to operate on.
• Solution: simplify the representations without losing accuracy, which allows more efficient, yet still accurate, operations on them.
(Figure: access information encoded as representations R1 and R2; complex operation vs. simple operation.)

Simplification of LMADs
• The stride/span structures of most array accesses are regular and similar, regardless of the complexity of the access patterns.
• This similarity and regularity of array accesses:
  – is easily exposed by the LMAD;
  – allows two important region operations on LMADs: aggregation and coalescing.
• These region operations on LMADs are used to simplify array access patterns with complex subscript expressions.

Coalescing
• The coalescing operation combines two stride/span pairs into one if the LMAD has a coalesceable access.
  Ex) We can show that these two LMADs are equivalent.
• In the same way, the accesses to C in the original example loop can be simplified, after which it is easy to privatize C.
• Coalescing often eliminates non-affine expressions from the array access patterns.

The Coalescing Algorithm
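The slide's algorithm is not reproduced here. The sketch below is a simplified version of the coalescing test, stated as an assumption. Sort the stride/span pairs by stride. An inner pair (δi, σi) absorbs the next pair (δj, σj) when δj = σi + δi, i.e. the outer index resumes exactly one inner stride past the end of the inner sweep; the merged pair is (δi, σi + σj). Pairs with span 0 (one iteration) are dropped first.

For the FFTZ2 read X(1+K+I*2**(L-1)), the I stride is 2^(L-1) = (2^(L-1) − 1) + 1. The two pairs therefore merge into stride 1 and span 2^(M-1) − 1 for every L, and the non-affine term disappears.

```python
def coalesce(strides, spans):
    """Merge stride/span pairs that together walk one arithmetic progression."""
    dims = sorted((d, s) for d, s in zip(strides, spans) if s != 0)
    merged = []
    for d, s in dims:
        # (d, s) continues the progression if it starts one inner stride past its end.
        if merged and d == merged[-1][0] + merged[-1][1]:
            merged[-1] = (merged[-1][0], merged[-1][1] + s)
        else:
            merged.append((d, s))
    return tuple(d for d, _ in merged), tuple(s for _, s in merged)

# X(1+K+I*2**(L-1)) in FFTZ2 for sample sizes L = 3, M = 6
L, M = 3, 6
print(coalesce((1, 2**(L - 1)), (2**(L - 1) - 1, 2**(L - 1) * (2**(M - L) - 1))))
# ((1,), (31,)): one stride-1 dimension of span 2**(M-1)-1
print(coalesce((1, 10), (3, 20)))  # ((1, 10), (3, 20)): 10 != 3 + 1, not coalesceable
```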

Similar Accesses
• Interleaved access
• Contiguous access

Aggregation
• Aggregation combines two similar (contiguous or interleaved) LMADs into one.
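One way to phrase aggregation (our formulation, not necessarily the paper's exact procedure) reuses coalescing. Two LMADs with identical stride/span lists differ only in their offsets. A new stride/span pair equal to the offset difference describes their union exactly. When the two are contiguous or interleaved, coalescing then folds that new pair away. LMADs are written as (strides, spans, offset) tuples with 0-based offsets.

```python
def coalesce(strides, spans):
    """Merge stride/span pairs that together walk one arithmetic progression."""
    dims = sorted((d, s) for d, s in zip(strides, spans) if s != 0)
    merged = []
    for d, s in dims:
        if merged and d == merged[-1][0] + merged[-1][1]:
            merged[-1] = (merged[-1][0], merged[-1][1] + s)
        else:
            merged.append((d, s))
    return tuple(d for d, _ in merged), tuple(s for _, s in merged)

def aggregate(a, b):
    """Aggregate two LMADs that differ only in offset.
    Returns None unless both have the same stride/span lists."""
    (st_a, sp_a, off_a), (st_b, sp_b, off_b) = a, b
    if (st_a, sp_a) != (st_b, sp_b) or off_a == off_b:
        return None
    gap = abs(off_b - off_a)
    # The extra pair (gap, gap) reaches exactly the two offsets, so the union is exact.
    strides, spans = coalesce(st_a + (gap,), sp_a + (gap,))
    return strides, spans, min(off_a, off_b)

# Interleaved: A(2*i) and A(2*i+1), i = 0..4
print(aggregate(((2,), (8,), 0), ((2,), (8,), 1)))  # ((1,), (9,), 0)
# Contiguous: addresses 0..4 and 5..9
print(aggregate(((1,), (4,), 0), ((1,), (4,), 5)))  # ((1,), (9,), 0)
```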

Breaking the Interprocedural Barrier to Analysis
• Array reshaping is irrelevant: add the offset of the actual argument to the LMAD of the formal parameter.
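A minimal sketch of the translation uses the call in CFFTZ from the TFFT2 fragment. There, Y(1+I*2**(1+M-M/2)) is passed as CFFTZWORK's formal X, which is a 0-based offset of I*2**(1+M-M/2) into Y. Because an LMAD lives in linear address space, the formal's declared shape never enters the computation. The formal's LMAD (one contiguous span of 2**(M'+1) − 1, with M' the M passed to CFFTZWORK) is an assumption made for illustration, and the helper names are hypothetical.

```python
def to_caller(formal_lmad, actual_offset):
    """Translate an LMAD of a formal array into the actual array's address space:
    only the actual argument's 0-based start offset is added."""
    strides, spans, offset = formal_lmad
    return strides, spans, offset + actual_offset

def summarize_over_loop(lmad, stride, trip_count):
    """Account for the caller's loop index as one more stride/span pair."""
    strides, spans, offset = lmad
    return strides + (stride,), spans + (stride * (trip_count - 1),), offset

M = 6
inner_m = M - M // 2                          # value passed as M to CFFTZWORK
formal = ((1,), (2**(inner_m + 1) - 1,), 0)   # assumed LMAD of CFFTZWORK's formal X
step = 2**(1 + M - M // 2)                    # Y(1+I*2**(1+M-M/2)) -> offset I*step
summary = summarize_over_loop(to_caller(formal, 0), step, 2**(M // 2))
print(summary)  # ((1, 16), (15, 112), 0)
```

Here 16 = 15 + 1, so the coalescing condition merges the two pairs into one contiguous stride-1 span of 127.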

Code Fragment of TFFT2

      SUBROUTINE CFFTZ (IS, M, U, X, Y)
      DIMENSION U(1), X(1), Y(1)
      DO I = 0, 2**(M/2)-1
        CALL CFFTZWORK (IS, M-M/2, U(1+3*2**(1+M)/2), Y(1+I*2**(1+M-M/2)), X)
      END DO
      END

      SUBROUTINE CFFTZWORK (IS, M, U, X, Y)
      DIMENSION U(1), X(1), Y(1)
      DO L0 = 1, (M+1)/2
        CALL FFTZ2 (IS, 2*L0-1, M, U, X, Y)
        CALL FFTZ2 (IS, 2*L0, M, U, Y, X)
      END DO
      END

      SUBROUTINE FFTZ2 (IS, L, M, U, X, Y)
      DIMENSION U(*), X(*)
      DIMENSION Y(0:2**(L-1)-1, 0:1, 0:2**(M-L)-1, 0:1)
      DO I = 0, 2**(M-L)-1
        DO K = 0, 2**(L-1)-1
          … = X(1+K+I*2**(L-1))
          … = X(1+K+I*2**(L-1)+2**M)
          … = X(1+K+I*2**(L-1)+2**(M-1)+2**M)
          Y(K, 0, I, 0) = …
          Y(K, 0, I, 1) = …
          Y(K, 1, I, 0) = …
          Y(K, 1, I, 1) = …
        END DO
      END DO
      END

(Call-graph figure: MAIN, CFFTZWORK, FFTZ2)

Simplification of Accesses to X

Simplification of Accesses to Y

Simplification Results
1. Calculated LMADs for all array accesses and summarized them over all loops.
2. Calculated LMADs for all array accesses, summarized them over all loops, and applied coalescing and aggregation.
3. Counted the total number of access dimensions represented in both steps.
4. Plotted the reduction in the number of dimensions due to simplification.

LMAD Simplification for Compiler Optimizations
• What does this simplification imply? Improved compiler optimizations.
• Array privatization:
  1. Two write accesses to Y are aggregated into a single LMAD.
  2. That LMAD is coalesced into a simpler one.
  3. In the same way, the read accesses to Y can be simplified.
  4. From the simplified forms, it is trivial to show that the write accesses and the read accesses are identical.
  5. Therefore, Y can be privatized.
• Put/Get generation, dependence tests, …
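The write side of these steps can be checked by brute force for sample sizes. The sketch assumes FFTZ2 declares Y(0:2**(L-1)-1, 0:1, 0:2**(M-L)-1, 0:1), as the four subscripts in the stores imply. It computes the 0-based column-major offsets written by the four Y stores and confirms that together they cover every element of Y exactly once. That is what the single simplified LMAD (stride 1, span 2**(M+1) − 1) expresses. The read side is not reproduced here, and the function name is hypothetical.

```python
def fftz2_y_write_offsets(L, M):
    """0-based column-major offsets written by the four Y stores in FFTZ2,
    assuming Y is declared Y(0:2**(L-1)-1, 0:1, 0:2**(M-L)-1, 0:1)."""
    n1, n2, n3 = 2**(L - 1), 2, 2**(M - L)

    def offset(k, a, i, b):
        # Column-major: k varies fastest, then a, then i, then b.
        return k + n1 * (a + n2 * (i + n3 * b))

    written = set()
    for i in range(n3):              # DO I = 0, 2**(M-L)-1
        for k in range(n1):          # DO K = 0, 2**(L-1)-1
            for a in (0, 1):
                for b in (0, 1):     # Y(K,0,I,0), Y(K,0,I,1), Y(K,1,I,0), Y(K,1,I,1)
                    written.add(offset(k, a, i, b))
    return written

L, M = 3, 6
W = fftz2_y_write_offsets(L, M)
print(W == set(range(2**(M + 1))))  # True: one LMAD, stride 1, span 2**(M+1)-1
```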

Current Research Using LMADs
• University of Illinois at Urbana-Champaign
  – The Polaris compiler
  – Communication message generation for NUMA machines
  – The Access Region Test
• Texas A&M University
  – Run-time techniques for parallelization
• University of Málaga
  – Locality analysis
  – Data distribution
• KAIST
  – Put/Get generation for a PC-cluster system
  – An optimizing compiler for embedded processors
  – C compilation for the Virtual Chip project (?)

The Polaris Compiler

Code Restructuring for Cray MPP

Parallel Speedups on Cray T3D

Conclusion
• What you see is not always what you get…
  – complex subscripts do not necessarily mean a complex access pattern
• The LMAD form exposes access-pattern characteristics.
• The LMAD form gives an intuitive feel for the access pattern.
• The LMAD form enables simplification.
• The LMAD form enables interprocedural analysis (IPA) without the array reshaping problem.
• LMADs enable compiler optimizations in codes with complex subscripts.