A Dimension Abstraction Approach to Vectorization in Matlab
A Dimension Abstraction Approach to Vectorization in Matlab Neil Birkbeck Jonathan Levesque Jose Nelson Amaral Computing Science University of Alberta Edmonton, Alberta, Canada
Problem n Problem Statement: ¨ Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.
Motivation n Loop-based code is slower than vector code in Matlab. ¨ Why? interpretive overhead (type/shape checking, …) n resizing of arrays in loops n=1000; for i=1: n, A(i)=B(i)+C(i); end n n 5 x faster! n=1000; A(1: n)=B(1: n)+C(1: n); Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted.
Related Work n Data dependence vectorization ¨ Allen n & Kennedy’s Codegen algorithm Build data dependence graph Topological visit strongly connected components Abstract Matrix Form (AMF) [Menon & Pingali] ¨ axioms used to transform array code ¨ take advantage of matrix multiplication ¨ Not clear if it is easily extensible or allows for vectorization of irregular access (e. g. , access to the diagonal)
Incorrect Vectorization n Example 1: for i=1: n, Pull out of loop. Index variable a(i)=b(i)+c(i); substitution (i 1: n) end n a(1: n)=b(1: n)+c(1: n) Vectorization correct if a, b, and c are row vectors or column vectors If this is not true the vectorized code will introduce an error!
Incorrect Vectorization n Example 2: for i=1: n, x(i)=y(i, h)*z(h, i); end Matlab is untyped n Vectorization depends on whether h is a vector or scalar. n ¨ If h is a scalar: ¨ Otherwise: x(1: n)=y(1: n, h). *z(h, 1: n)’; x(1: n)=sum(y(1: n, h). *z(h, 1: n)’, 2);
Overview of Solution Data dependence-based vectorizer Vectorizable statement Knowledge of Shape of variables Propagate dimensionality up parse tree Dimensions Agree? Yes Perform Transformations No Leave statement in loop Output Vector statement
More Specifically Examples: Type n Represent dimensionality of expressions as list of symbols 1 or “*” (>1) ¨ Assume known for variables. ¨ Propagate n dim scalar (1) 1 xn vector (1, *) nx 1 vector (*, 1), (*) mxn matrix (*, *) up parse tree according to Matlab rules Compatibility: ¨ dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s)
Vectorized Dimensionality n Vectorized dimensionality: ¨ representation of dimensions after vectorization of a loop ¨ denoted dimi for loop with index variable i ¨ Introduce new symbol ri for index variable i for i=1: n, a(i)=10+i; end exp dim(exp) vectorized dimi(exp) 10 (1) i (1) 1: n (1, ri) a (*) a(i) (1) a(1: n) (ri)
Vectorized Dimensionality Expressions with incompatible vectorized dimensionality should not be vectorized. n When do dimensionalities agree? n ¨ Assignment n expressions: elhs=erhs dimi(elhs)≈dimi(erhs) || erhs≈(1) ¨ Element-wise n Θ in {+, -, . *, …} binary operators: e=elhsΘerhs dimi(elhs) ≈(1)||dimi(erhs)≈(1)||dimi(elhs)≈dim(erhs)
Vectorized Dimensionality dimi, j(B)=(rj, ri) dimi, j(C)=(ri, rj) for i=1: 100, for j=1: 100 A(i, j)=B(j, i)+C(i, j); end nend Rules very restrictive: ¨ Assume Vectorization fails because (ri, rj) is not compatible with (rj, ri) dim(A)=dim(B)=dim(C)=(*, *)
Transpose Transformation n Extension to utilize transpose when necessary is straightforward: ¨ For n assignment: if dimi(A)≈reverse(dimi(B)) then A=BT is allowable for i=1: m, for j=1: n A(i, j)=B(j, i); end dimi, j(A)=reverse(dimi, j(B))=(ri, rj) A(1: m, 1: n)=(B(1: n, 1: m))’
Transpose Transformation n Extension to utilize transpose when necessary is straightforward: ¨ Similar for pointwise operations: if dimi(A)≈reverse(dimi(B)) then AΘBT is allowable, propagate dimi(AΘBT)=dimi(A) n if dimi(reverse(A))≈dimi(A) then ATΘB is allowable, propagate dimi(ATΘB)=dimi(B) n
Pattern Database n n Dimensionality disagreement at binary operators inhibits vectorization. Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization. Pattern: lhs operation rhs output (ri, rj) Θ (ri, 1) (ri, rj) for i=1: m, for j=1: n, A(i, j)=B(i, j)+C(i); end Transformed Result B(1: m, 1: n)+repmat(C(1: m), 1, n);
Pattern Database n Diagonal access pattern: Pattern: lhs (ri, ri) for i=1: n, a(i)=A(i, i)*b(i); end operation (index) rhs nil output (1, ri) a(1: n)=A((1: n)+size(A, 1)*((1: n)-1)). *b(1: n); Column major indexing of A
Additive Reduction Statements n Additive-reduction statements use a loop variable to perform an accumulation. ¨ Not all loop nest index variables appear in output dimensionality for i 1=…, Loop nest variables for i 2=…, I={i 1, i 2, …, ik} … for i=1: m, I={i, j} J={i} J is a subset of E for ik=… for j=1: n, A(J)=A(J)+E; a(i)=a(i)+B(i, j); … end end end
Additive Reduction (Solution) n Maintain/propagate dimensionality and reduced variables for an expression. ¨ n ρ(E) denotes the reduced variables for expression E When checking statement A(J)=A(J)+E ¨ ensure dimi 1, i 2, …, ik. A(J)≈dimi 1, i 2, …, ik(E) and ρ(E)=I-J ¨ any variable ri in I-J but not in ρ(E) must be reduced for i=1: m a=a+b(i); a=a+10; end I={i}, J={} I-J={i} ρ(b(i))={} ρ(10)={} rirnot in dim i(b(i))=(ri, 1) i(10) Reduce: 10 m*10, b(i) sum(b(i), 1); ρ(m*10)={ri} Vectorize: a=a+m*10; a=a+sum(b(1: m));
Additive Reduction via Matrix Multiplication for i=1: m for j=1: n a(i)=a(i)+B(i, j)*x(j); end • j is used for reduction • dimi, j(B(i, j))=(ri, rj) • dimi, j (x(j))=(rj) a(1: m)=a(1: m)+… B(1: m, 1: n)*x(1: n); n Matrix multiplication can be used to perform reductions on e=elhs*erhs , provided: 1. 2. 3. ¨ dimi 1, …, ik(elhs)=(Sl, rk) dimi 1, …, ik(erhs)=(rk, Sr) rk is a reduction variable. Implies: n n dimi 1, …, ik(e)=(Sl, Sr) ρ(e)=union(ρ(elhs), ρ(erhs), {rk})
Additive Reduction Example n Additive reduction example: for i=1: m, for j=1: n, d(i)=d(i)+a(i, j)*b(j)+c(i, j) end ρ(a(i, j))={}, dimi, j(a(i, j))=(ri, rj) ρ(b(j))={}, dimi, j(b(j))=(rj) rj is reduction variable Use matrix multiplication to reduce rj ρ(a(i, j)*b(j))={rj}, dimi, j(a(i, j)*b(j))=(ri) Need to reduce rj: c(i, j) sum(c(i, j), 2); Dimensionality and reduced variables agree, now replace index variables: ρ(c(i, j))={}, dimi, j(c(i, j))={ri, rj} ρ(a(i, j)*b(j)+sum(c(i, j), 2))={rj}, dimi, j(a(i, j)*b(j)+sum(c(i, j), 2)=(ri, rj) d(1: m)=d(1: m)+a(1: m, 1: n)*b(1: n)+sum(c(1: m, 1: n), 2);
Implementation Prototype Original Loop Octave Parser Vectorizer Embedded Control Statements no Create DDG n Vectorized Loop Code Generator yes Dimension Check Success yes no Vectorize Statement Pattern database and corresponding transformations are specified in modular end-user extensible manner.
Results Source-to-source transformation n Timing results averaged over 100 runs: n Platform: n ¨ Matlab 7. 2. 0. 283 ¨ 3. 0 GHz Pentium D Processor
Results n Histogram Equalization: Input source h=hist(im(: ), [0: 255]); %histogram heq=255*cumsum(h(: ))/sum(h(: )); for i=1: size(im, 1), for j=1: size(im, 2), im 2(i, j)=heq(im(i, j)+1); end n Vectorized Result h=hist(im(: ), [(0: 255)]); heq=255*cumsum(h(: ))/sum(h(: )); im 2(1: size(im, 1), 1: size(im, 2))=. . . heq(im(1: size(im, 1), 1: size(im, 2))+1); For monochrome 8 -bit 800 x 600 image: ¨ original/vectorized: n n Entire routine: 0. 178 s/0. 114 s (speedup: 1. 56) Loop Portion only: 0. 0814 s/0. 0176 s (speedup: 4. 6)
Results (Menon & Pingali Examples) for k=1: p, for j=1: (i-1), X(i, k)=X(i, k)-L(i, j)*X(j, k); end X(i, 1: p)=X(i, 1: p)-L(i, 1: i-1)*X(1: i-1, 1: p); for i=1: N, for j=1: N phi(k)=phi(k)+a(i, j)*x_se(i)*f(j); end phi(k)=phi(k)+sum(a(1: N, 1: N)’* x_se(1: N). *f(1: N), 1); for i=1: n, for j=1: n, for k=1: n, for l=1: n y(i)=y(i)+x(j)*A(i, k)* B(l, k)*C(l, j); end end Settings y(1: n)=y(1: n)+x(1: n)’*. . . (A(1: n, 1: n)*B(1: n, 1: n)’*C(1: n, 1: n))’; Input time (s) Output time(s) speedup i=500, p=5000 0. 536 s 0. 030 s 17 N=1000 0. 174 s 0. 012 s 14 n=40 0. 622 s 0. 0001 s 5000
Remaining Issues/Future Work n Each pattern transformation is local; no optimization over entire statement. ¨ e. g. , n n we do not optimize and distribute transposes Control flow within loop Function calls ¨ functions are treated as pointwise operators (correct for many predefined arithmetic functions) n Incorporate our analysis directly with shape analysis
Summary n Contributions: ¨A simple method to prevent incorrect vectorization in Matlab ¨ A user extensible operator/dimensionality pattern database can be used to improve vectorization n These patterns can make use of higher level semantics (e. g. , matrix multiplication) or diagonal accesses in vectorization.
Acknowledgements Funding provided by NSERC n Grateful for reviewers comments and suggestions n
Thank You Questions?
- Slides: 27