A Practical Stride Prefetching Implementation in Global Optimizer
Hucheng Zhou, Xing Zhou
Tsinghua University
9/25/2020
Outline
- Introduction
- Motivation
- Algorithm
- Phase Ordering
- Prefetching Scheduling
- Experiments
- Future Work
Introduction
- What is data prefetching?
  - Brings data into the cache ahead of its use
- Compiler-controlled prefetching
  - Prefetching candidate identification
  - Prefetching timing determination
  - Unnecessary prefetching elimination
  - Other prefetching tuning
Introduction
- Stride data prefetching
  - Massive consecutive memory references
  - These cause too many cache misses, and thus poor performance
- Our focus
  - Compiler-based stride data prefetching
Motivation
- Dominant stride prefetching algorithm
  - Loop Nest Optimizer (LNO) based
- LNO based algorithm
  - Locality analysis (reuse analysis, localized iteration space, prefetching predicates)
  - Loop splitting (loop peeling and unrolling)
  - Scheduling prefetches (iterations ahead of use)
- Limitations of the LNO based approach
- Observations
LNO based algorithm
- Example: [figure]
Limitations
- Only effective for affine array references
  - Handles only DO loop nests
  - Due to the vector space model
- Focuses only on numerical applications that operate on dense matrices
- However, not all strided references are affine array references, for example C++ STL vector traversal and other wrap-around data structures
Necessity
- Four common ways of traversing an STL vector: [figure]
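The four forms shown on the original slide are not recoverable from this transcript. As an illustration only, the sketch below assumes four traversal styles that are common in C++ code (`operator[]`, `at()`, iterators, and a raw pointer walk); the function names are hypothetical. In each of them the accessed address advances by `sizeof(int)` per iteration, so all four are stride references even though only the first looks like an array reference in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustrations; not the code from the original slide.

// 1. Index with operator[].
long sum_by_index(const std::vector<int>& v) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i];
    return sum;
}

// 2. Bounds-checked index with at().
long sum_by_at(const std::vector<int>& v) {
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v.at(i);
    return sum;
}

// 3. Explicit iterators.
long sum_by_iterator(const std::vector<int>& v) {
    long sum = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        sum += *it;
    return sum;
}

// 4. Raw pointer walk over the contiguous storage.
long sum_by_pointer(const std::vector<int>& v) {
    long sum = 0;
    const int* p = v.data();
    const int* end = p + v.size();
    for (; p != end; ++p) sum += *p;
    return sum;
}
```

All four functions return the same sum for the same input; they differ only in how the address computation is expressed to the compiler.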
The component flow of Open64: [figure]
IR after PRE-OPT
- For ACCESS 1 and ACCESS 2: [figure]
Comparison with array references: [figure]
Comparison
- The LNO based approach relies on its tight coupling with locality analysis and the vector space model to identify prefetching candidates that suffer from cache misses
- However, this coupling restricts it to affine array references; it cannot handle STL-style stride references
- From another angle, stride prefetching candidates can be identified through induction variable recognition, with phase ordering exploited to avoid unnecessary prefetches
Definitions and Observations
- A linear inductive variable (expression) is an expression whose value is incremented by a nonzero integer loop invariant on every iteration
- Lemma 1: a linear inductive expression can be defined recursively:
  - If v is a linear induction variable with stride s, then v is a linear inductive expression with the same stride s;
  - If expr is a linear inductive expression with stride s, then -expr is a linear inductive expression with stride -s;
  - If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr + invar and invar + expr are both linear inductive expressions with stride s;
  - If expr1 and expr2 are linear inductive expressions with strides s1 and s2 respectively, then expr1 + expr2 is a linear inductive expression with stride s1 + s2;
  - If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr * invar and invar * expr are both linear inductive expressions with stride invar * s;
  - If expr is a linear inductive expression with stride s and invar is a loop invariant, then expr / invar is a linear inductive expression with stride s / invar.
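The recursive definition in Lemma 1 maps naturally onto a recursive walk over an expression tree. The following is a minimal sketch under stated assumptions: the `Expr` tree, `Kind`, `Stride`, and `stride_of` are hypothetical names and not Open64 IR; loop invariants are modelled as integer constants; and multiplication and division are accepted only when the invariant operand is a leaf and, for division, only when it divides the stride exactly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical expression tree; names are illustrative, not Open64 IR.
enum class Kind { IndVar, Invariant, Neg, Add, Mul, Div };

struct Expr {
    Kind kind;
    long value;                      // IndVar: its stride; Invariant: its constant value
    std::shared_ptr<Expr> lhs, rhs;  // operands (rhs unused for Neg)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr make(Kind k, long v, ExprPtr l = nullptr, ExprPtr r = nullptr) {
    return std::make_shared<Expr>(Expr{k, v, l, r});
}

struct Stride {
    bool known;  // false if the expression is not covered by the rules below
    long value;  // stride per iteration; 0 means loop invariant
};

// Applies the rules of Lemma 1 bottom-up.
Stride stride_of(const ExprPtr& e) {
    switch (e->kind) {
    case Kind::IndVar:    return {true, e->value};
    case Kind::Invariant: return {true, 0};
    case Kind::Neg: {
        Stride s = stride_of(e->lhs);
        return {s.known, -s.value};
    }
    case Kind::Add: {
        Stride a = stride_of(e->lhs), b = stride_of(e->rhs);
        return {a.known && b.known, a.value + b.value};
    }
    case Kind::Mul: {
        // Simplification: one operand must be an invariant leaf.
        if (e->rhs->kind == Kind::Invariant) {
            Stride s = stride_of(e->lhs);
            return {s.known, s.value * e->rhs->value};
        }
        if (e->lhs->kind == Kind::Invariant) {
            Stride s = stride_of(e->rhs);
            return {s.known, s.value * e->lhs->value};
        }
        return {false, 0};
    }
    case Kind::Div: {
        // Conservative assumption: accept only exact division by a nonzero invariant leaf.
        if (e->rhs->kind == Kind::Invariant && e->rhs->value != 0) {
            Stride s = stride_of(e->lhs);
            if (s.known && s.value % e->rhs->value == 0)
                return {true, s.value / e->rhs->value};
        }
        return {false, 0};
    }
    }
    return {false, 0};
}

// A linear inductive expression has a known, nonzero stride.
bool is_linear_inductive(const ExprPtr& e) {
    Stride s = stride_of(e);
    return s.known && s.value != 0;
}
```

For example, for `4 * i + 8` with `i` of stride 1, `stride_of` yields a stride of 4, while a product of two induction variables is reported as unknown.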
Definitions and Observations
- Mathematically, a linear inductive expression is therefore a linear combination of linear induction variables and loop invariants, of the form:
  - E = c1*i1 + c2*i2 + … + cn*in + invar, where, by Lemma 1, the stride value is c1*s1 + c2*s2 + … + cn*sn, with sk the stride of ik
- A stride reference is a reference in a loop whose accessed memory address is incremented by an integer loop invariant on every iteration
- Lemma 2: if the accessed memory address of a reference in a loop can be represented as an inductive expression, then it is a stride reference
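A short derivation shows why the stride of E follows from Lemma 1. The stride symbols s_k and the superscript iteration index (t) are notation introduced here for illustration, assuming each induction variable i_k advances by s_k per iteration:

```latex
\begin{align*}
E^{(t)} &= c_1 i_1^{(t)} + c_2 i_2^{(t)} + \dots + c_n i_n^{(t)} + \mathit{invar},\\
E^{(t+1)} - E^{(t)} &= \sum_{k=1}^{n} c_k \bigl(i_k^{(t+1)} - i_k^{(t)}\bigr)
                     = \sum_{k=1}^{n} c_k s_k .
\end{align*}
```

For instance, with a single induction variable of stride 1 and c_1 = 4 (say, the size of a 4-byte element), the address expression advances by 4 bytes per iteration.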
Speculative Induction Variable Recognition for Stride Prefetching
- Thus stride reference identification is equivalent to inductive expression recognition
- We present an algorithm for demand-driven speculative recognition of inductive expressions
Speculative Induction Variable Recognition for Stride Prefetching
- Induction variables in SSA form must satisfy the following conditions:
  - There must be a live phi in the corresponding loop header BB
  - Of the two operands of the phi, the loop-invariant operand must point to the initialization of the induction variable outside the loop, while the other operand must be defined within the loop body. We call them init and increment respectively
  - After expanding the increment operand of the phi by copy propagation, the expanded result must contain the result of that phi, together with a loop-invariant expression that is the stride of the induction variable
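The three conditions above can be sketched on a toy SSA representation. This is a minimal sketch, not the algorithm from the slides: `Def`, `Op`, `expands_to_phi`, and `is_induction_phi` are hypothetical names, the stride is restricted to an integer constant, phi liveness is not modelled, and the speculative handling of aliased definitions is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy SSA: every name has exactly one definition of one of these forms.
enum class Op { Phi, Copy, AddConst, Other };

struct Def {
    Op op;
    std::string a;  // Phi: init operand; Copy/AddConst: source name
    std::string b;  // Phi: increment operand
    long c;         // AddConst: loop-invariant constant that is added
    bool in_loop;   // true if the definition is inside the loop
};
using Defs = std::map<std::string, Def>;

// Expands `name` through copies and constant additions (a stand-in for copy
// propagation) until it reaches `phi`. On success, `stride` holds the sum of
// the constants met on the way.
bool expands_to_phi(const Defs& defs, std::string name,
                    const std::string& phi, long& stride) {
    stride = 0;
    for (std::size_t steps = 0; steps <= defs.size(); ++steps) {
        if (name == phi) return true;
        Defs::const_iterator it = defs.find(name);
        if (it == defs.end() || !it->second.in_loop) return false;
        const Def& d = it->second;
        if (d.op == Op::Copy) {
            name = d.a;
        } else if (d.op == Op::AddConst) {
            stride += d.c;
            name = d.a;
        } else {
            return false;
        }
    }
    return false;  // chain too long: a cycle that never reaches the phi
}

// Checks the conditions: a phi in the loop header, an init operand defined
// outside the loop, and an increment operand that expands to phi + stride.
bool is_induction_phi(const Defs& defs, const std::string& phi, long& stride) {
    Defs::const_iterator it = defs.find(phi);
    if (it == defs.end() || it->second.op != Op::Phi || !it->second.in_loop)
        return false;
    Defs::const_iterator init = defs.find(it->second.a);
    if (init != defs.end() && init->second.in_loop) return false;
    return expands_to_phi(defs, it->second.b, phi, stride) && stride != 0;
}
```

For the definitions `i1 = phi(i0, i3)`, `i2 = i1 + 4`, `i3 = i2`, expanding the increment operand `i3` gives `i1 + 4`, so `i1` is recognized as an induction variable with stride 4.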
Our algorithm: [figure]
Comparison
- Traditional induction variable recognition
  - Equivalent to finding strongly connected components
  - Handles variables only
  - Conservative due to aliasing
  - Limited by copy propagation
- Our algorithm
  - Demand driven
  - Symbolic interpretation
  - Speculative determination
  - Requires only small modifications to the expansion process of the current implementation
Phase Ordering
- Implementing our algorithm after SSAPRE lets it benefit from strength reduction and PRE optimizations
Prefetching Scheduling
- Leading reference determination
- Prefetching information collection
  - Stride value, data/loop shape, target cache model
- Prefetching determination for the candidates
  - Based on heuristics, such as data and loop size as well as the number of prefetches in the loop
- Computation of prefetching distance
  - Division of memory latency by the estimated time per iteration
- Loop transformations based on locality information to further reduce the number of prefetches
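The prefetching distance computation can be sketched as follows. This is an illustration under assumptions, not the Open64 implementation: `prefetch_distance` and `sum_with_prefetch` are hypothetical names, the rounding up is an assumption (the slide only says the latency is divided by the estimated time per iteration), and the latency and per-iteration figures in the usage note are made up. `__builtin_prefetch` is the GCC/Clang builtin and is compiled out on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of iterations to prefetch ahead: memory latency divided by the
// estimated time per iteration, rounded up (rounding is an assumption).
std::size_t prefetch_distance(std::size_t latency_cycles,
                              std::size_t cycles_per_iter) {
    assert(cycles_per_iter > 0);
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

// Sums a vector while prefetching the element `distance` iterations ahead.
long sum_with_prefetch(const std::vector<int>& v, std::size_t distance) {
    long sum = 0;
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        // Guarded so that no out-of-range element address is formed.
        if (i + distance < n)
            __builtin_prefetch(&v[i + distance], 0 /* read */, 3 /* keep in cache */);
#endif
        sum += v[i];
    }
    return sum;
}
```

With an assumed latency of 200 cycles and 25 cycles per iteration, `prefetch_distance(200, 25)` gives 8 iterations. A real implementation would typically issue one prefetch per cache line rather than per element, which is one of the reductions the last bullet refers to.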
Experiments
- We conducted experiments with the SPEC 2006 benchmarks on IA-64
  - Itanium 2 Madison, 1.6 GHz, with 6 MB L3 cache and 8 GB memory
  - Quad-processor server running Red Hat Linux Advanced Server 4.0
  - The compiler is Open64 4.1
Normalized results of SPEC 2006 FP: [figure]
Normalized results of SPEC 2006 INT: [figure]
Conclusion and Future Work
- We propose an alternative inductive data prefetching algorithm, implemented in the global optimizer at the O2 level, which can in theory prefetch almost all stride references that can be determined statically at compile time
- Extend it to prefetch periodic, polynomial, geometric, monotonic and wrap-around variables
- Fully integrate the stride prefetching algorithm with the strength reduction optimization in SSAPRE
- Coordinate data prefetching with data layout optimization
- Further investigate the interaction between software and hardware prefetching on the x86 platform, based on static compiler analysis and feedback information
Thanks
- Thank you very much
- Any questions?