Characterization and Evaluation of Hardware Loop Unrolling
Marcos R. de Alba and David R. Kaeli
BARC 2003, Cambridge, MA, January 30, 2003
Motivation
• The high temporal locality of loops suggests applying more aggressive fetch techniques to supply a larger number of instructions for dispatch and issue
• Current aggressive fetch techniques (e.g., trace caches) are not tuned to exploit loop behavior
• We propose a fetch mechanism specifically tailored to loop bodies
DE ALBA, KAELI BARC 2003 2
Outline
• Introduction
• Loop characteristics
• Loop prediction hardware
• Loop caching and unrolling hardware
• Experimental approach
• Results
• Conclusions and current work
Introduction
• To exploit instruction-level parallelism, it is essential to have a large window of candidate instructions available to issue from
• The temporal locality present in loops provides a good opportunity for loop caching
• In general-purpose applications, 50% of the loops have variable-dependent trip counts and/or contain conditional branches in their bodies
• These characteristics suggest that a hardware-based loop caching approach should be investigated
Loop characteristics
• Internal control flow
• Number of loop visits
• Number of iterations per loop visit
• Dynamic loop body size
• Patterns leading up to the loop visit
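Several of these characteristics can be measured from a dynamic instruction-address trace. The sketch below is a minimal Python illustration under assumptions that are not from the slides. A loop is the address range [head, tail] closed by a taken backward branch, loops are properly nested, and there are no calls. A visit ends when control leaves the range. Iterations per visit are counted as taken back edges + 1. The function name `characterize_loops` is hypothetical.

```python
from collections import defaultdict


def characterize_loops(trace):
    """Per-loop statistics from a dynamic trace of executed instruction addresses.

    Illustrative assumptions: a loop is the address range [head, tail] closed by a
    taken backward branch tail -> head, loops are properly nested, there are no
    calls/returns, and a visit ends as soon as control leaves that range.
    Iterations per visit = taken back edges + 1 (the pass that exits is included).
    """
    visits = defaultdict(int)       # (head, tail) -> number of visits
    iterations = defaultdict(list)  # (head, tail) -> iterations in each visit
    body_sizes = defaultdict(list)  # (head, tail) -> dynamic instructions per completed iteration
    last_seen = {}                  # address -> trace index of its latest execution
    open_visits = []                # [loop, back_edges] per open visit, innermost last

    def close_innermost():
        loop, back_edges = open_visits.pop()
        iterations[loop].append(back_edges + 1)

    for i in range(len(trace) - 1):
        pc, next_pc = trace[i], trace[i + 1]
        last_seen[pc] = i
        if next_pc <= pc:  # taken backward branch: back edge of loop (next_pc, pc)
            loop = (next_pc, pc)
            start = last_seen.get(next_pc)
            if start is not None:  # dynamic body size, head through tail inclusive
                body_sizes[loop].append(i - start + 1)
            if open_visits and open_visits[-1][0] == loop:
                open_visits[-1][1] += 1
            else:  # first back edge of a new visit
                visits[loop] += 1
                open_visits.append([loop, 1])
        # End every open visit whose address range no longer contains next_pc.
        while open_visits and not open_visits[-1][0][0] <= next_pc <= open_visits[-1][0][1]:
            close_innermost()
    while open_visits:
        close_innermost()
    return visits, iterations, body_sizes
```

A visit whose loop runs only once never takes a back edge, so this sketch does not see it. The loop stack in the prediction hardware is what tracks visits precisely.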
Loop prediction hardware
• A path-to-loop register to detect loops in advance
• A stack to maintain nested, per-iteration loop information
• A table to maintain per-visit loop information and update the loop prediction state
• A path-in-iteration table to maintain the history of branches executed within individual iterations
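Two of these components can be sketched in Python. The slides do not spell out the assumptions behind this sketch. The path-to-loop register is modeled as an n-bit shift register of recent branch outcomes (1 = taken). Each loop-stack entry records, for one visit of a possibly nested loop, the branch path taken in every iteration. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


class PathToLoopRegister:
    """Shift register holding the outcomes of the last `bits` branches (1 = taken)."""

    def __init__(self, bits=3):
        self.bits = bits
        self.value = 0

    def record(self, taken):
        # Shift in the newest outcome and drop the oldest one.
        self.value = ((self.value << 1) | int(taken)) & ((1 << self.bits) - 1)

    def pattern(self):
        return format(self.value, f"0{self.bits}b")


@dataclass
class LoopVisit:
    head: int
    tail: int
    path_to_loop: str
    paths_in_iteration: list = field(default_factory=list)  # one string per finished iteration
    current: str = ""  # branch outcomes seen so far in the current iteration


class LoopStack:
    """Innermost loop on top; each entry tracks one visit of a loop."""

    def __init__(self):
        self.entries = []

    def enter(self, head, tail, path_to_loop):
        self.entries.append(LoopVisit(head, tail, path_to_loop))

    def branch(self, taken):
        # In-body branch outcomes are attributed to the innermost open loop only.
        if self.entries:
            self.entries[-1].current += "1" if taken else "0"

    def end_iteration(self):
        top = self.entries[-1]
        top.paths_in_iteration.append(top.current)
        top.current = ""

    def exit(self):
        # The pass that leaves the loop is recorded as the visit's last iteration.
        self.end_iteration()
        return self.entries.pop()
```

As a usage example, take branch outcomes taken, not taken, taken before a loop at 0x2d24 to 0x2d68. These leave path-to-loop `"101"`. Two iterations with in-body outcomes 0,0,1 and then 0,0,0 produce the paths `"001"` and `"000"`. The outcomes here are made up for illustration.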
Table: loop characteristics and hardware components*
* Based on a study of SPECint 2000, MiBench and MediaBench
Example contents of the loop prediction structures
• Loop stack entry: head address 0x2d24, tail address 0x2d68, path-to-loop 101, *pilt (pointer into the path-in-loop table)
• Path-in-loop table: columns path-in-loop, itns, next; example paths 001, 000 and 1__
• Loop prediction table entry: head address 0x2d24, tail address 0x2d68, path-to-loop 101, *pilt
• Path-in-loop prediction table: columns predicted path-in-loop, pred itns, conf ctr, next; example paths 001, 000 and 1__
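The prediction tables pair a predicted iteration count ("pred itns") with a confidence counter ("conf ctr"). The slides do not give the update rule, so the Python sketch below assumes a conventional one. A 2-bit saturating counter rises on a correct count and falls on a wrong one. The stored count is replaced only when confidence has dropped to zero. Entries are keyed by (loop head, path-to-loop pattern), and `IterationPredictor` is a hypothetical name.

```python
class IterationPredictor:
    """Hypothetical iteration-count predictor with a 2-bit saturating confidence counter.

    Entries are keyed by (loop head, path-to-loop pattern).
    """

    CONF_MAX = 3        # 2-bit counter saturates at 3
    CONF_THRESHOLD = 2  # predict only when the counter is at least this value

    def __init__(self):
        self.table = {}  # (head, path_to_loop) -> [pred_itns, conf_ctr]

    def predict(self, head, path_to_loop):
        entry = self.table.get((head, path_to_loop))
        if entry is None or entry[1] < self.CONF_THRESHOLD:
            return None  # no confident prediction
        return entry[0]

    def update(self, head, path_to_loop, actual_itns):
        """Train with the iteration count observed when the loop visit ends."""
        entry = self.table.setdefault((head, path_to_loop), [actual_itns, 0])
        if entry[0] == actual_itns:
            entry[1] = min(entry[1] + 1, self.CONF_MAX)
        elif entry[1] > 0:
            entry[1] -= 1  # lose confidence before replacing the prediction
        else:
            entry[0] = actual_itns
```

With this rule, a loop that runs 4 iterations on two consecutive visits becomes predictable. A single visit with a different count only lowers the confidence.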
Loop caching and unrolling hardware
• A loop cache to hold instructions that belong to loop bodies
• A loop cache control mechanism for indexing into the loop cache and for maintaining loop cache state (number of allocated loops, their indices and offsets)
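The loop cache state named above (allocated loops, their indices and offsets) can be pictured with the following Python sketch. The organization is an assumption. Bodies are stored back to back in a fixed number of instruction slots, and a directory maps each loop head to its (offset, length). When a new body does not fit, everything is flushed. The replacement policy actually used is not stated on the slides, and `LoopCache` is a hypothetical name.

```python
class LoopCache:
    """Hypothetical loop cache: loop bodies stored back to back in a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity  # total instruction slots
        self.slots = []           # stored instructions, in allocation order
        self.directory = {}       # loop head address -> (offset, length) in slots

    @property
    def allocated_loops(self):
        return len(self.directory)

    def lookup(self, head):
        """Return the cached body for the loop starting at `head`, or None on a miss."""
        if head not in self.directory:
            return None
        offset, length = self.directory[head]
        return self.slots[offset:offset + length]

    def fill(self, head, body):
        """Allocate space for a loop body; returns False if it can never fit."""
        if head in self.directory:
            return True
        if len(body) > self.capacity:
            return False
        if len(self.slots) + len(body) > self.capacity:
            # Simplest possible replacement policy (an assumption): flush everything.
            self.slots.clear()
            self.directory.clear()
        self.directory[head] = (len(self.slots), len(body))
        self.slots.extend(body)
        return True
```

A real design would more likely allocate whole cache lines and evict individual loops. Flushing keeps the offset bookkeeping trivial for illustration.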
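Loop prediction lookup
• The path-to-loop register (branch outcomes b(n-1), b(n-2), b(n-3), ..., b1, b0) is combined with the branch address, gshare style, to index the loop prediction table
• Loop prediction table entry (example): tag 50, head and tail addresses 2d24 and 2d68, and a pointer (*pilt) into the path-in-loop table
• Path-in-loop table (example): predicted iterations (preditns) 4; paths 001, 000 and 1__
• If the tag does not match the last branch, there is no information for this loop; proceed with normal fetching
• If the tag matches and preditns > 1, the information is used by the loop cache control to interrogate the loop cache for a hit, or to build dynamic traces in the case of a miss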
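The following Python sketch covers the lookup flow above. It assumes a direct-mapped table indexed by XOR-ing the branch's word address with the path-to-loop bits (gshare style). The address bits above the index serve as the tag. The slide leaves the case preditns ≤ 1 open, and the sketch assumes normal fetch there. `gshare_index`, `LoopPredictionTable` and the branch address 0x2d20 used below are hypothetical.

```python
def gshare_index(branch_pc, path_to_loop, index_bits):
    # XOR the word address of the branch with the path-to-loop history bits.
    return ((branch_pc >> 2) ^ path_to_loop) & ((1 << index_bits) - 1)


class LoopPredictionTable:
    """Hypothetical direct-mapped loop prediction table indexed gshare style."""

    def __init__(self, index_bits=6):
        self.index_bits = index_bits
        self.entries = {}  # index -> dict(tag, head, tail, pred_itns)

    def tag(self, branch_pc):
        # Address bits above those used for the index serve as the tag.
        return branch_pc >> (2 + self.index_bits)

    def train(self, branch_pc, path_to_loop, head, tail, pred_itns):
        idx = gshare_index(branch_pc, path_to_loop, self.index_bits)
        self.entries[idx] = {"tag": self.tag(branch_pc), "head": head,
                             "tail": tail, "pred_itns": pred_itns}

    def lookup(self, branch_pc, path_to_loop):
        """Return the entry to hand to loop cache control, or None for normal fetch."""
        idx = gshare_index(branch_pc, path_to_loop, self.index_bits)
        entry = self.entries.get(idx)
        if entry is None or entry["tag"] != self.tag(branch_pc):
            return None  # no information for this loop: proceed with normal fetching
        if entry["pred_itns"] > 1:
            return entry  # loop cache control checks for a hit or builds a trace
        return None  # assumption: a loop predicted to run once is fetched normally
```

Two branches 0x100 bytes apart map to the same index under the same history when `index_bits=6`. The tag comparison keeps the second one from reusing the first one's entry.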
Loop cache control mechanism
• The loop prediction table supplies the index and tag used to access the loop cache (example entry: index 2d24, tag 50, instructions 2d24 through 2d68)
• On a mismatch (N), the loop pattern is stored in the loop cache
• On a match (Y), instructions are issued from the loop cache
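Here is a small Python sketch of one control decision. It assumes the loop cache is a dictionary from loop head to (tag, instructions) and the I-cache is a dictionary from address to instruction. On a miss, the sketch stores the static body from head to tail, with 4-byte instructions as on the Alpha. The slide says "loop pattern", which may instead be the dynamic trace. `fetch_loop` is a hypothetical name.

```python
def fetch_loop(loop_cache, icache, head, tail, tag, inst_size=4):
    """One hypothetical loop cache control step for a predicted loop.

    loop_cache: dict mapping head address -> (tag, list of instructions)
    icache:     dict mapping instruction address -> instruction
    Returns (source, instructions).
    """
    entry = loop_cache.get(head)
    if entry is not None and entry[0] == tag:
        return "loop cache", entry[1]  # match: issue instructions from the loop cache
    # Mismatch: fetch the body from the I-cache and store it in the loop cache.
    body = [icache[pc] for pc in range(head, tail + inst_size, inst_size)]
    loop_cache[head] = (tag, body)
    return "i-cache", body
```

For the example loop (head 0x2d24, tail 0x2d68), the first call fills an 18-instruction entry. A second call with the same tag is served from the loop cache.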
Loop cache control
Loop cache entry: (2d24, 2d68, 4, 001, 000)
Loop body:
    2d24: ldl  t1, 16(sp)
    2d28: lda  t1, -31(t1)
    2d2c: bge  t1, 2d6c
    2d30: ldl  v0, 16(sp)
    2d34: lda  v0, -15(v0)
    2d38: bge  v0, 2d50
    2d3c: ldl  t2, 0(sp)
    2d40: ldl  t0, 32(sp)
    2d44: subl t0, t2, t0
    2d48: br   zero, 2d5c
    2d4c: ldl  v0, 32(sp)
    2d50: subl v0, 0x1, v0
    2d54: stl  v0, 32(sp)
    2d58: ldl  t0, 16(sp)
    2d5c: addl t0, 0x1, t0
    2d60: stl  t0, 16(sp)
    2d68: br   zero, 2d24
Figure: loop unrolled according to information from the loop predictor (assuming 4 instructions per line)
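The unrolling step can be pictured with the following Python sketch. For each predicted iteration, it walks the loop body from the head and follows one predicted outcome per conditional branch ('1' = taken). It concatenates the iterations and packs the stream into fetch lines of 4 instructions. The body encoding (address → (kind, target)) and the toy loop in the usage note are hypothetical and simpler than the Alpha example above.

```python
def trace_iteration(body, head, tail, path, inst_size=4):
    """Addresses executed in one iteration, following `path` at conditional branches.

    body maps address -> (kind, target), kind in {"op", "cbr", "br"}.
    `path` must supply one outcome ('1' = taken) per conditional branch reached.
    """
    pcs, pc, outcomes = [], head, iter(path)
    while True:
        pcs.append(pc)
        kind, target = body[pc]
        if pc == tail:  # the back edge closes the iteration
            return pcs
        if kind == "cbr" and next(outcomes) == "1":
            pc = target
        elif kind == "br":
            pc = target
        else:
            pc += inst_size
        if not head <= pc <= tail:  # predicted exit from the loop
            return pcs


def unroll(body, head, tail, paths, per_line=4):
    """Concatenate the predicted iterations and pack them into fetch lines."""
    stream = [pc for path in paths for pc in trace_iteration(body, head, tail, path)]
    return [stream[i:i + per_line] for i in range(0, len(stream), per_line)]
```

As a usage example, take a toy loop 0x00 to 0x14 with a conditional branch at 0x04 that skips to 0x10. Unrolling two iterations with paths `"0"` and `"1"` gives three lines: 0x00 to 0x0c, then 0x10, 0x14, 0x00, 0x04, then 0x10, 0x14. Lines therefore span iteration boundaries, which is the point of unrolling into the loop cache.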
Experimental approach
• Modified the SimpleScalar 3.0c Alpha EV6 pipeline to model the following features:
  – loop detection
  – loop prediction
  – loop cache filling
  – loop cache/I-cache multiplexing
  – loop termination detection
  – loop stack operations
  – loop table operations
Modifications to SimpleScalar
Figure: pipeline with stages Fetch, Dispatch, Register scheduler, Memory scheduler, Exec, Mem, Write back and Commit; added components Loop predictor and Loop cache; memory hierarchy of I-Cache (IL1), ITLB, D-Cache (DL1), DTLB, I-Cache (IL2), D-Cache (DL2) and virtual memory
Figure: frequency of loop iterations
Figure: frequency of dynamic loop body size
Results: metric definitions
• path-to-loop: prediction rate for entering the loop, using the path-to-loop register
• iterations: prediction rate for the number of iterations per loop visit
• path-in-itn: prediction rate for the path taken in each loop iteration
• speedup: relative CPI gain compared with no loop prediction
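The metrics above reduce to simple ratios, shown in the Python sketch below. A prediction rate is taken as the fraction of events whose prediction matched the outcome. The slides give no formula for "relative CPI gain", so the sketch assumes it is the CPI reduction as a fraction of the baseline CPI without loop prediction. The function names are hypothetical, and the numbers in the usage note are illustrative rather than results from the study.

```python
def prediction_rate(predicted, actual):
    """Fraction of events whose prediction matched the observed outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def relative_cpi_gain(cpi_without, cpi_with):
    """Assumed definition: CPI reduction as a fraction of the CPI without loop prediction."""
    return (cpi_without - cpi_with) / cpi_without
```

For instance, predicting 4, 4, 5 iterations when the loop actually ran 4, 5, 5 gives an iteration prediction rate of 2/3. A baseline CPI of 1.25 reduced to 1.0 gives a relative gain of 0.2 under this definition.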
Conclusions and current work
• More than 50% of loops have properties that make them highly predictable and attractive targets for aggressive fetching
• Compare the efficiency of the loop cache against that of a trace cache
• Propose a hybrid fetch approach that uses the loop cache for loop bodies and the trace cache for all instructions outside loops