DATA ADDRESS PREDICTION
Zohair Hyder, Armando Solar-Lezama
CS 252 – Fall 2003

Motivation
• Large and increasing gap between CPU and memory speeds
• Miss penalty on today’s processors over 600 cycles
• Load latency is bottleneck on performance
Solution: Prefetch
• Static: Compilers may insert prefetch instructions. Limited because of lack of run-time information
• Dynamic: High adaptability

Metrics
• Coverage: Fraction of DL1 misses that hit in prefetch buffer
  – Higher implies lower load latency
• Accuracy: Fraction of prefetches that are actually used by CPU
  – Higher implies less memory bandwidth needed
• Tradeoffs between coverage and accuracy
  – For given memory bandwidth, coverage is probably more important
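The two metrics reduce to simple ratios over event counters. The sketch below is only an illustration: the function names and the counts fed to them are hypothetical, not measurements from the slides.

```python
def coverage(dl1_misses: int, prefetch_buffer_hits: int) -> float:
    """Fraction of DL1 misses that hit in the prefetch buffer."""
    return prefetch_buffer_hits / dl1_misses if dl1_misses else 0.0


def accuracy(prefetches_issued: int, prefetches_used: int) -> float:
    """Fraction of issued prefetches that the CPU actually used."""
    return prefetches_used / prefetches_issued if prefetches_issued else 0.0


# Hypothetical counters: 1000 DL1 misses, 1600 prefetches issued,
# 400 of which were later hit in the prefetch buffer.
cov = coverage(dl1_misses=1000, prefetch_buffer_hits=400)
acc = accuracy(prefetches_issued=1600, prefetches_used=400)
print(f"coverage={cov:.2f} accuracy={acc:.2f}")
```

With these made-up counts the predictor removes 40% of the miss latency events but wastes three quarters of its memory traffic, which is the kind of tradeoff the last bullet refers to.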

Architecture
• Prefetch buffer acts as Level 1.5 cache
• Hit time of prefetch buffer is same as DL1 because of small size and same associativity
• Demand fetches always get priority over prefetches
• Predictor uses DL1 miss information to determine prefetches
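The flow on a DL1 miss can be sketched roughly as follows. Everything here is an assumption made for illustration: the `PrefetchBuffer` class, its FIFO replacement, its capacity, and the next-line stand-in predictor are hypothetical, and the priority of demand fetches over prefetches is not modeled.

```python
from collections import OrderedDict


class PrefetchBuffer:
    """Small buffer of prefetched block addresses (FIFO replacement assumed)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> True, oldest first

    def insert(self, addr):
        self.blocks[addr] = True
        self.blocks.move_to_end(addr)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest block

    def take(self, addr):
        """Return True and remove the block if it was prefetched."""
        return self.blocks.pop(addr, None) is not None


def handle_dl1_miss(addr, buffer, predict):
    """Check the buffer for the missing block, then let the predictor prefetch."""
    hit = buffer.take(addr)
    for target in predict(addr):  # the predictor only sees DL1 misses
        buffer.insert(target)
    return hit


# Stand-in predictor (hypothetical): always prefetch the next 64-byte block.
next_line = lambda addr: [addr + 64]
buf = PrefetchBuffer()
hits = [handle_dl1_miss(a, buf, next_line) for a in (0, 64, 128)]
print(hits)
```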

Previous Approaches
• Stream buffers
  – Introduced by Jouppi in 1990
  – Kessler and Palacharla augmented them in 1994 to allow filtering and prefetching for non-unit strides
• Reference Prediction Table
  – Introduced by Baer and Chen in 1992 to detect arbitrary strides
• Markov Predictor
  – Introduced by Joseph and Grunwald in 1999

Reference Prediction Table
• RPT indexed by PC of load instruction
• RPT holds last effective address, and offset with second to last effective address
• If current effective address results in same offset, then prefetch
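A minimal sketch of the table behavior described above, assuming an unbounded dictionary in place of a finite hardware table and leaving out the state machine of the original Baer and Chen design. The class and method names are hypothetical, and skipping zero strides is an added assumption.

```python
class ReferencePredictionTable:
    """Per-PC stride detector: prefetch when the same offset is seen twice."""

    def __init__(self):
        self.table = {}  # load PC -> (last address, last offset or None)

    def access(self, pc, addr):
        """Record one load; return the address to prefetch, or None."""
        prediction = None
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (addr, None)
            return None
        last_addr, last_offset = entry
        offset = addr - last_addr
        # Assumption: a zero offset stays in the same place, so nothing to prefetch.
        if last_offset is not None and offset == last_offset and offset != 0:
            prediction = addr + offset
        self.table[pc] = (addr, offset)
        return prediction


# Two loads (hypothetical PCs 0x10 and 0x20) with interleaved strided accesses.
rpt = ReferencePredictionTable()
accesses = [(0x10, 1000), (0x20, 5000), (0x10, 1256),
            (0x20, 5256), (0x10, 1512), (0x20, 5512)]
prefetches = [rpt.access(pc, addr) for pc, addr in accesses]
print(prefetches)
```

Because each load has its own entry, interleaving the two streams does not disturb stride detection: both loads start prefetching on their third access.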

Markov Predictor
• Index by current address: table holds 4 possible next addresses
• Issue all 4 into prefetch request queue
• If queue is full, replace an element with lower priority
  – LRU prioritization: more recently used has higher priority
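The table side of this predictor might look like the sketch below. It assumes an unbounded dictionary of entries and keeps each entry's successors in most-recently-used order; the prefetch request queue and its replacement policy are left out, and the class name is hypothetical.

```python
class MarkovPredictor:
    """Address-indexed table of up to `ways` recently seen next addresses."""

    def __init__(self, ways=4):
        self.ways = ways
        self.table = {}   # miss address -> next miss addresses, MRU first
        self.prev = None  # previous miss address

    def miss(self, addr):
        """Record a miss; return predicted next addresses, highest priority first."""
        if self.prev is not None:
            successors = self.table.setdefault(self.prev, [])
            if addr in successors:
                successors.remove(addr)
            successors.insert(0, addr)   # most recently used goes first
            del successors[self.ways:]   # keep at most `ways` successors
        self.prev = addr
        return list(self.table.get(addr, []))


# A repeating pattern is learned ...
A, B, C = 0x100, 0x200, 0x300
markov = MarkovPredictor()
predictions = [markov.miss(a) for a in (A, B, A, C, A)]

# ... but a strided stream never revisits an address, so nothing is predicted.
strided = MarkovPredictor()
stride_predictions = [strided.miss(a) for a in range(0, 1024, 256)]
print(predictions, stride_predictions)
```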

Strides
• Consider the following statements in a loop:
    n += k; u += x[n]; v += y[n];
  where k is larger than the block size. The miss address stream will be: A, B, A+k, B+k, A+2k, B+2k
• Stream buffers perform poorly in interleaved access streams
• RPT works great.
• Markov predictor is incapable of detecting ANY strides.
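To make the miss stream concrete, the sketch below generates it for hypothetical base addresses `A` and `B` and a hypothetical stride `K`. It shows that no address ever repeats, which is why an address-indexed table has nothing to look up, while the differences between consecutive misses take only two values.

```python
def interleaved_miss_stream(a, b, k, iterations):
    """Miss addresses A, B, A+k, B+k, A+2k, B+2k, ... for the loop above."""
    stream = []
    for i in range(iterations):
        stream.append(a + i * k)
        stream.append(b + i * k)
    return stream


# Hypothetical values chosen only for illustration.
A, B, K = 0x1000, 0x8000, 256
stream = interleaved_miss_stream(A, B, K, 3)
diffs = [cur - prev for prev, cur in zip(stream, stream[1:])]

print([hex(addr) for addr in stream])
print(diffs)
```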

Our Contributions
• Difference Markov predictor:
  – Use similar Markov implementation
  – Predict differences rather than addresses
  – Input to predictor is current difference, output is predicted difference
• Bayes predictor:
  – Use 3 inputs: current difference, current PC, and current address
  – Output is predicted difference

Difference Markov Predictor
• Use difference coding
• Index by current difference = current address – last address
• Predict next difference
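A rough sketch of the idea, assuming the same table organization as an address-indexed Markov table (up to 4 successors per entry, most recently used first) and an unbounded dictionary rather than a fixed-size hardware table. The class name and the example addresses are hypothetical.

```python
class DifferenceMarkovPredictor:
    """Markov table indexed by the difference between consecutive miss addresses."""

    def __init__(self, ways=4):
        self.ways = ways
        self.table = {}        # difference -> next differences, MRU first
        self.last_addr = None
        self.last_diff = None

    def miss(self, addr):
        """Record a miss; return the addresses to prefetch."""
        if self.last_addr is None:
            self.last_addr = addr
            return []
        diff = addr - self.last_addr
        if self.last_diff is not None:
            successors = self.table.setdefault(self.last_diff, [])
            if diff in successors:
                successors.remove(diff)
            successors.insert(0, diff)
            del successors[self.ways:]
        self.last_addr, self.last_diff = addr, diff
        # Predicted next difference, added to the current address.
        return [addr + d for d in self.table.get(diff, [])]


# Interleaved strided streams: A, B, A+k, B+k, ... (hypothetical addresses).
dm = DifferenceMarkovPredictor()
stream = [0x1000, 0x8000, 0x1100, 0x8100, 0x1200, 0x8200]
prefetches = [dm.miss(a) for a in stream]

# A unit-stride stream needs only one table entry.
unit = DifferenceMarkovPredictor()
unit_prefetches = [unit.miss(a) for a in (0, 64, 128, 192)]
print(prefetches, unit_prefetches, len(unit.table))
```

After a short warm-up, each miss in the interleaved stream predicts the address of the next miss, even though none of those addresses has been seen before.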

Difference Markov - Advantages
• Works well with small table size
• Detects strides, even in interleaved access streams
• More compact than RPT, e.g. stride of 1 needs a single entry
• Performs especially well on floating point applications that are stride-intensive
• The Joseph-Grunwald Markov predictor is incapable of predicting any address it has not yet seen
• Performs only slightly worse than Joseph-Grunwald Markov on integer applications: difference correlation information can contain address correlation information too

Bayes Predictor
• Predicts based on current PC, current address and current difference
• Use Naïve Bayes method to combine information from all 3
• Predict next difference

Bayes Predictor - Details
• Idea:
  – For every possible $\Delta_{n+1}$, calculate $P(\Delta_{n+1} \mid \Delta_n, \mathrm{PC}, \mathrm{Addr})$
  – Predict the $\Delta_{n+1}$ with highest probability
  – If missing data, use the conditional probabilities given the data we have.
• Implementation
  – Assume Independence!
    $$P(\Delta_{n+1} \mid \Delta_n, \mathrm{PC}, \mathrm{Addr}) = \frac{P(\Delta_n \mid \Delta_{n+1})\, P(\mathrm{PC} \mid \Delta_{n+1})\, P(\mathrm{Addr} \mid \Delta_{n+1})\, P(\Delta_{n+1})}{P(\Delta_n, \mathrm{PC}, \mathrm{Addr})}$$
  – Keep a limited number of the Ps in a table.
  – Integer representation
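One way to picture the computation is with plain counters, as in the sketch below. It is not the hardware design from the slides: it uses unbounded tables and floating point instead of a limited table of integer values, and it interprets "missing data" as dropping the factor for any input value that has never been observed. The denominator is the same for every candidate, so it is omitted when comparing scores. All names are hypothetical.

```python
from collections import Counter


class NaiveBayesPredictor:
    """Pick the next difference maximizing P(next) * product of P(input | next)."""

    def __init__(self):
        self.next_counts = Counter()                  # count of each next difference
        self.joint = [Counter() for _ in range(3)]    # (input value, next diff) counts
        self.seen = [Counter() for _ in range(3)]     # input value counts

    def train(self, diff, pc, addr, next_diff):
        """Record that inputs (diff, pc, addr) were followed by next_diff."""
        self.next_counts[next_diff] += 1
        for i, value in enumerate((diff, pc, addr)):
            self.joint[i][(value, next_diff)] += 1
            self.seen[i][value] += 1

    def predict(self, diff, pc, addr):
        """Return the most probable next difference, or None if untrained."""
        total = sum(self.next_counts.values())
        best, best_score = None, 0.0
        for cand, n in self.next_counts.items():
            score = n / total                          # prior P(cand)
            for i, value in enumerate((diff, pc, addr)):
                if self.seen[i][value] == 0:
                    continue                           # missing data: drop this factor
                score *= self.joint[i][(value, cand)] / n
            if score > best_score:
                best, best_score = cand, score
        return best


# Hypothetical training history.
nb = NaiveBayesPredictor()
for _ in range(3):
    nb.train(diff=64, pc=0x10, addr=1000, next_diff=64)
nb.train(diff=-8, pc=0x20, addr=2000, next_diff=128)

print(nb.predict(64, 0x10, 1000), nb.predict(-8, 0x20, 2000))
```

When an input value is new, for example an address not seen before, the prediction falls back on the remaining inputs, and with no known inputs at all it falls back on the most frequent difference.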

Bayes - Advantages
• Works well for small table size
• Performs well on both Floating Point and Integer applications
• Detects most forms of regularity that we have observed in applications
• Has good accuracy across applications

Performance For SPEC 2000

Performance With Table Size

Conclusion
• Both our predictors have high coverage: for most applications higher than any other predictor
• Bayes predictor generally has best accuracy across applications
• Difference Markov has fairly good accuracy too
• Difference Markov predictor has great performance even with small tables, and requires very simple hardware
• Bayes predictor needs more complex hardware