FAT predictor Sabareesh Ganapathy, Prasanna Venkatesh Srinivasan, Maribel Monica
What is FAT?
• FAT is a frequency-analysis-based branch predictor, integrated with TAGE.
• Frequency analysis studies the frequency-transform characteristics of a branch to predict the branch outcome as Taken / Not taken.
• Historical context: frequency analysis using the Fourier transform was explored in FAB [1], which statically profiled branch frequency characteristics across different workloads.
• FAT is a dynamic branch predictor.
[1] M. Kampe, P. Stenstrom and M. Dubois, “The FAB predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches”, Proceedings of the Eighth International Symposium on High Performance Computer Architecture (HPCA), 2002.
UNIVERSITY OF WISCONSIN-MADISON
Underlying philosophy
• History repeats itself!
[Figure: two panels labelled December '15 and May '16.]
Frequency Analysis
• LHR table (local history table): 256 entries, 128 bits each; PC-based address.
• Ftable (frequency table): 256-entry, 4-way set associative; address based on a hash of PC and global history (ghist).
• FAB entry fields: TAG 1, TAG 2, IFFT, Time Count, TH, Confid.
TAGE predictor
• TAGE: 1 bimodal predictor and 12 TAGE tables are used.
• Minimum history length = 4 and maximum global history length = 640.
• Folded history and PC are hashed to compute the index and the tag for each TAGE entry.
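The history-length and indexing bullets above can be illustrated with a minimal Python sketch. It assumes the 12 tagged tables use the geometric series of history lengths from the TAGE paper [2], running from 4 to 640, and a simple XOR fold of the history. The function names (`geometric_history_lengths`, `fold_history`, `table_index`) and the exact hash are hypothetical and are not taken from the slides.

```python
def geometric_history_lengths(l_min=4, l_max=640, n_tables=12):
    """History length per tagged table, as a rounded geometric series."""
    ratio = (l_max / l_min) ** (1.0 / (n_tables - 1))
    return [int(l_min * ratio ** i + 0.5) for i in range(n_tables)]


def fold_history(history_bits, out_bits):
    """XOR-fold a list of 0/1 history bits into an out_bits-wide integer."""
    folded = 0
    for start in range(0, len(history_bits), out_bits):
        chunk = 0
        for bit in history_bits[start:start + out_bits]:
            chunk = (chunk << 1) | bit
        folded ^= chunk
    return folded


def table_index(pc, history_bits, index_bits):
    """Hypothetical index hash: PC bits XORed with the folded history."""
    mask = (1 << index_bits) - 1
    return (pc ^ (pc >> index_bits) ^ fold_history(history_bits, index_bits)) & mask


lengths = geometric_history_lengths()
print(lengths[0], lengths[-1], len(lengths))
```

The tag would be computed the same way with a different fold width, so that index and tag do not alias on the same history bits.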
FAB + TAGE algorithm
[Flowchart with Yes/No decision branches; only the branch labels survive.]
Update FAB
• Time_count == 3?
  • No: increment time_count.
  • Yes: calculate FFT from LHR, then:
    • Remove DC component
    • Normalize FFT
    • Filter top Nf frequency components
    • Compute IFFT from filtered array and store
    • Compute TH (update threshold) and store
• Threshold = <Scaling factor> * Freq_sum / Nf, where Nf is the number of frequency components and Freq_sum is the sum of absolute values of the filtered array.
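The transform path of the update can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' C++/FFTW code: the slides do not say how the FFT is normalized, so division by the peak magnitude is assumed, and the function name `update_fab_entry` and its parameters `n_f` and `scaling` are hypothetical.

```python
import numpy as np


def update_fab_entry(local_history, n_f=10, scaling=1.0):
    """Return (filtered waveform, threshold) for one local history register."""
    x = np.asarray(local_history, dtype=float)
    spectrum = np.fft.fft(x)

    # Remove the DC component (bin 0).
    spectrum[0] = 0.0

    # Normalize; dividing by the peak magnitude is an assumption.
    peak = np.max(np.abs(spectrum))
    if peak > 0:
        spectrum = spectrum / peak

    # Keep only the n_f largest-magnitude frequency components.
    keep = np.argsort(np.abs(spectrum))[-n_f:]
    filtered = np.zeros_like(spectrum)
    filtered[keep] = spectrum[keep]

    # Inverse transform of the filtered array; store the real part.
    waveform = np.fft.ifft(filtered).real

    # Threshold = scaling * Freq_sum / Nf, as defined on the slide.
    freq_sum = np.sum(np.abs(filtered))
    threshold = scaling * freq_sum / n_f
    return waveform, threshold


history = np.array([1, 1, 0, 0] * 32)  # 128-bit history with period 4
waveform, threshold = update_fab_entry(history, n_f=2)
print(waveform[:4], threshold)
```

For a strictly periodic history such as the one above, the filtered waveform is positive exactly where the history bit is 1, which is the property the predictor relies on.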
Infrastructure
• The CBP-2016 infrastructure was used. Branch traces for server and mobile benchmarks were provided.
• The main program decodes the instructions and passes only conditional branches to the predictor.
• The predictor function was written for our custom predictor.
• The FFTW library was used to compute FFT and DCT transforms in the C++ program.
MATLAB Analysis
• A Perl script was written to scan the edge sequence in the trace files.
• Dependencies between edges were determined, and the local history was found for all branches allocated in the FAB table.
• The local history determined by the script was used in MATLAB.
• The FAB predictor was modelled in MATLAB, and analysis was performed to determine the threshold and the number of frequency components required for correct prediction.
Regression analysis in CBP infrastructure
• Parameters such as the number of local history register bits and the number of frequency components were varied to observe the effect on misprediction rate.
• The FAB predictor was modified to use the Discrete Cosine Transform instead of the Discrete Fourier Transform.
• The DC component was used in prediction when the local history register was all 1's.
[Four bar charts of misprediction rate vs. technique (1 to 10), one per trace: SHORT_MOBILE-29, SHORT_MOBILE-30, LONG_SERVER-1, SHORT_SERVER-102.]
X axis label (techniques 1 to 10):
• Technique: FFT; DCT - No opt; DCT - No opt; DCT - opt
• Time count: 4, 64, 4, 4, 0, 8, 32, 64, 4, 4
• LHR length: 512, 512, 256, 128
• LHR table entries: 2^14, 2^20, 2^14, 2^14, 2^10
# Frequency Components
[Bar chart: number of frequency components (5, 10, 15, 30) vs. misprediction rate, one series per trace.]
• SHORT_MOBILE-29: data labels from 4.802 to 4.81
• SHORT_SERVER-102: data labels from 1.064 to 1.08
• LONG_SERVER-1: data labels from 0.622 to 0.638
• SHORT_MOBILE-30: data labels from 0.335 to 0.348
HW Budget and Implementation
• FAB entry: TAG 1 | TAG 2 | IFFT/IDCT | Time Count | Threshold | Confidence
Configuration | Storage
1 Bimodal + 12 TAGE tables | 250 Kbits
1 Bimodal + 11 TAGE + 1 LHR table + 1 FAB table | 370 Kbits
• Implementation of the frequency transform (DCT/FFT) in HW is complex.
• A stochastic implementation of the DCT was explored.
Stochastic Logic
[Figure: bit streams a = 6/8 (1, 1, 0, 1, 1, 1), b = 4/8 and c = 3/8 (1, 1, 0, 0, 1, 0).]
• A real value x in the range 0 to 1 is represented by a sequence of random bits.
• Simple logic and fault-tolerant characteristics.
• Applicable to frequency transforms and image processing.
Slide content derived from Marc Riedel's circuits course at the University of Minnesota.
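The values on this slide (a = 6/8, b = 4/8, c = 3/8) are consistent with stochastic multiplication, where a bitwise AND of two independent streams yields a stream whose fraction of 1s approximates the product. The Python sketch below assumes that the figure shows an AND gate. The helper names `to_stream`, `stream_value` and `stochastic_multiply` are hypothetical.

```python
import random


def to_stream(p, n_bits, rng):
    """Encode a value p in [0, 1] as n_bits random bits with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def stream_value(bits):
    """Decode a stream: the fraction of bits that are 1."""
    return sum(bits) / len(bits)


def stochastic_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams approximates the product."""
    return [x & y for x, y in zip(a_bits, b_bits)]


rng = random.Random(0)
a = to_stream(6 / 8, 100_000, rng)
b = to_stream(4 / 8, 100_000, rng)
c = stochastic_multiply(a, b)
print(stream_value(c))  # close to 3/8 for long, independent streams
```

Accuracy improves with stream length, which is the usual trade-off between latency and precision in stochastic logic.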
Stochastic DCT
• Xc(k) = (1/N) Σ x(n) cos(2πkn/N), k = 0 ... N-1.
[Block diagram: Branch History, SNG, Angle Mapper (0 to pi/4), Cos(x), Cos(2x), Cos(4x), Cos(8x), Multiplier, Adder, DCT.]
• Steps were taken for finding the top frequency components, thresholding and IDCT.
• Results are comparable to the DCT computed with the FFTW library (the result degrades by 5%).
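As a deterministic reference for the formula above (not the stochastic circuit itself), the transform can be evaluated directly in Python with NumPy. The function name `cosine_transform` is hypothetical. As written on the slide, the expression equals the real part of the DFT scaled by 1/N, and the sketch checks that.

```python
import numpy as np


def cosine_transform(x):
    """Xc(k) = (1/N) * sum_n x(n) * cos(2*pi*k*n/N), for k = 0 ... N-1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n)
    basis = np.cos(2 * np.pi * np.outer(idx, idx) / n)  # basis[k, n]
    return basis @ x / n


history = [1, 1, 0, 0, 1, 1, 0, 0]
xc = cosine_transform(history)
# The same values, obtained from the real part of the FFT.
print(np.allclose(xc, np.fft.fft(history).real / len(history)))
```

A stochastic implementation would replace the multiply and add steps with bit-stream logic, using this direct evaluation as the accuracy baseline.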
FTAGE
• Uses filtered local history as tag and index for a pattern history table.
[Block diagram. FILTER stage: Local History, FFT, Choose top 10, IFFT + Thresholding, producing a 128-bit filtered history (FTABLE). The filtered history passes through history folding and, together with the PC, addresses the FTAGE-TABLE (fields: TAG, CTR, Confid), which is used for prediction.]
Results
[Two bar charts of misprediction rate for Pure TAGE, TAGE + FFT, TAGE + DCT and FTAGE: one for short traces, one for long traces.]
• Short traces, data labels in descending order: 4.3221, 4.2228, 4.1205, 4.1001
• Long traces, data labels in descending order: 2.572, 2.5698, 2.5474, 2.5308
“The best way to predict the future is to invent it.” (Alan Kay)
FTAGE - Future Work
• An IIR filter can be used for filtering.
• The top ten frequency components were measured for a number of traces.
[Chart: digital frequencies present over various traces; axis from 0 to 1.]
References
[1] M. Kampe, P. Stenstrom and M. Dubois, “The FAB predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches”, Proceedings of the Eighth International Symposium on High Performance Computer Architecture (HPCA), 2002.
[2] A. Seznec and P. Michaud, “A case for (partially) Tagged Geometric history length branch prediction”, Journal of Instruction Level Parallelism, Feb. 2006.
[3] Weikang Qian, Xin Li, Marc D. Riedel, Kia Bazargan and David J. Lilja, “An Architecture for Fault-Tolerant Computation with Stochastic Logic”, IEEE Transactions on Computers, Vol. 60, pp. 93-105.
[4] Xiaowei Qin, Shenglong Shang and Adong Fan, “Low-complexity FPGA Implementation of Sine/Cosine Generator Based on Stochastic Computation”.