Branch Prediction Strategies Simulation and Comparison Xiaozheng He

Branch Prediction Strategies Simulation and Comparison Xiaozheng He Yi Li Shengchao Huangfu Spring 2016

Outline
Part 1
● Overall Design -- UML
● Trace file -- Simplify
Part 2
● Implementation -- Always NT -- Bimodal -- GShare -- Tournament -- Perceptron
Part 3
● Comparison

Work
• We have implemented a trace file parser to simplify a standard trace file.
  – 7 trace files from SPEC 2000 have been simplified for simulation tests.
• We have implemented five branch predictors:
  – Always T / Always NT
  – Bimodal 1-bit and 2-bit
  – GShare
  – Tournament
  – Perceptron
• We have done several comparisons between different predictors.

Design
Trace File → Trace Parser → Simple Trace → Simulator → Predict Result → Data Analysis
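The simulator stage of this pipeline can be sketched roughly as below. This is only an illustration under stated assumptions: the simplified trace is taken to be a sequence of (branch address, taken) pairs, and the names `AlwaysNotTaken`, `simulate`, and `demo_trace` are hypothetical, not taken from the authors' code.

```python
from typing import Iterable, Tuple

# Assumed record shape for a simplified trace: (branch_address, taken).
Trace = Iterable[Tuple[int, bool]]


class AlwaysNotTaken:
    """Static predictor that predicts 'not taken' for every branch."""

    def predict(self, address: int) -> bool:
        return False

    def update(self, address: int, taken: bool) -> None:
        pass  # a static predictor keeps no state to train


def simulate(predictor, trace: Trace) -> float:
    """Feed every branch to the predictor and return the fraction predicted correctly."""
    total = 0
    correct = 0
    for address, taken in trace:
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)  # train only after the outcome is known
        total += 1
    return correct / total if total else 0.0


# Made-up four-branch trace, for illustration only (not real trace data).
demo_trace = [(0x48D237, False), (0x48D244, False), (0x48BE4B, True), (0x48BEC6, False)]
accuracy = simulate(AlwaysNotTaken(), demo_trace)
```

Any predictor exposing the same `predict`/`update` pair could be passed to `simulate`, which keeps the data analysis step independent of the predictor under test.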

Trace File
• Standard Trace File: Instruction Address, Branch Result

Trace Parser

Simplified Trace

Simplified Trace
File | Original Size | Simplified Size
gcc-1K.trace.gz | 89 KB | 1 KB
gcc-10M.trace.gz | 888.1 MB | 13.3 MB
gcc-50M.trace.gz (The GCC compiler) | 4.44 GB | 63.3 MB
hmmer-100M.trace.gz (Biosequence analysis using profile hidden Markov models) | 8.75 GB | 107.6 MB
libquantum-100M.trace.gz (Simulation of quantum mechanics) | 8.74 GB | 148.7 MB
art-100M.trace.gz (Image recognition using neural networks) | 8.97 GB | 129.4 MB
mcf-100M.trace.gz (Vehicle scheduling) | 8.79 GB | 155.4 MB

Gshare Predictor
• Two parameters:
  -- Predictor Size (Branch History Table)
  -- History Length
• XOR the branch address with the branch history to index the Branch History Table
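A minimal sketch of the indexing described above. It assumes the table holds 2-bit saturating counters and that the history register is shifted after every branch; the class and method names are hypothetical and are not taken from the authors' simulator.

```python
class GShare:
    """Gshare sketch: table index = (address XOR global history) mod table size."""

    def __init__(self, table_bits: int, history_bits: int):
        self.table_size = 1 << table_bits            # predictor size, e.g. 2^3
        self.history_mask = (1 << history_bits) - 1  # keeps only the last history_bits outcomes
        self.history = 0                             # global history register, newest bit lowest
        # Assumption: 2-bit counters, 0..1 predict not taken, 2..3 predict taken.
        self.table = [0] * self.table_size

    def index(self, address: int) -> int:
        return (address ^ self.history) % self.table_size

    def predict(self, address: int) -> bool:
        return self.table[self.index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        i = self.index(address)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


demo = GShare(table_bits=3, history_bits=3)
demo_index = demo.index(0x48D237)
```

With a table of 2^3 entries and an all-zero 3-bit history, address 0x48d237 maps to index 7, because its low three bits are 111 and XOR with 000 leaves them unchanged.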

Gshare Predictor

Gshare Predictor Example 1
• predictor size: 2^3
• history length: 3
• trace file: gcc-1K
Values shown at each step:
1. Branch history table
2. History shift register
3. Branch address
4. Branch outcome from input trace
5. Prediction made
6. Prediction result (correct/incorrect)
7. Running total of mis-predictions thus far

Gshare Predictor Example 1, step 1
• Branch: 48d237
• Address: 10010001101001000110111
• Branch History: 0 0 0
• XOR Result: 1 1 1
• MOD 8 = 7
• Prediction: N
• Branch History Table (entries 0 to 7): all N

Gshare Predictor Example 1, step 2
• Branch: 48d244
• Address: 10010001101001001000100
• Branch History: 0 0 0
• XOR Result: 1 0 0
• MOD 8 = 4
• Prediction: N
• Branch History Table (entries 0 to 7): all N

Gshare Predictor Example 1, step 3
• Branch: 48be4b
• Address: 10010001011111001001011
• Branch History: 0 0 0
• XOR Result: 0 1 1
• MOD 8 = 3
• Prediction: N

Gshare Predictor Example 1, step 4
• Branch: 48bec6
• Address: 10010001011111011000110
• Branch History: 0 0 1
• XOR Result: 1 1 1
• MOD 8 = 7
• Prediction: N

Gshare Comparison
• Predictor Size: 2^2 ~ 2^20
• History Length: 0 ~ 19

Gshare Comparison

With two predictor sizes: 2^8 and 2^16

Gshare Comparison: libquantum-100M

Comparison with different predictor sizes, with history length 13

Comparison with different trace files, with history length 13 and predictor size 2^16

Creating Patterns
• Branch address
• (NT)* pattern

Creating Patterns
• Predictor Size: 2^3
• Branch History Table (entries 0 to 7): all N

Tournament
• Candidates: {Always T, Always NT, Bimodal 1-bit, Bimodal 2-bit, Gshare, Perceptron}
• We chose Bimodal 2-bit and Gshare as the two competing predictors in the tournament predictor.

Tournament
Diagram blocks: Address, hash(address), Choose Table, Bimodal, GShare, Predict Result

Tournament

Tournament
● Bimodal 2-bit with 2^20 Bytes table size
● GShare with 2^16 Bytes table size and 13 bits global register
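The way a choose table can arbitrate between the Bimodal 2-bit and GShare components might be sketched as follows. Several details are assumptions made for illustration: the hash is a plain modulo of the address, the choose table uses 2-bit counters trained only when the two components disagree, and the sizes are passed as entry counts (powers of two) rather than bytes. None of the names come from the authors' implementation.

```python
class Counter2Table:
    """Table of 2-bit saturating counters (0..3); a value >= 2 reads as 'yes/taken'."""

    def __init__(self, size: int):
        self.size = size
        self.counters = [0] * size

    def high(self, key: int) -> bool:
        return self.counters[key % self.size] >= 2

    def train(self, key: int, up: bool) -> None:
        i = key % self.size
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if up else max(0, c - 1)


class Tournament:
    """Choose table selects, per branch, between a bimodal and a gshare component."""

    def __init__(self, bimodal_bits: int, gshare_bits: int, history_bits: int, choose_bits: int):
        self.bimodal = Counter2Table(1 << bimodal_bits)
        self.gshare = Counter2Table(1 << gshare_bits)
        self.choose = Counter2Table(1 << choose_bits)  # high means "trust GShare"
        self.history = 0
        self.history_mask = (1 << history_bits) - 1

    def _components(self, address: int):
        b = self.bimodal.high(address)                # indexed by address only
        g = self.gshare.high(address ^ self.history)  # indexed by address XOR history
        return b, g

    def predict(self, address: int) -> bool:
        b, g = self._components(address)
        return g if self.choose.high(address) else b

    def update(self, address: int, taken: bool) -> None:
        b, g = self._components(address)
        if b != g:
            # Move the chooser toward whichever component was right.
            self.choose.train(address, g == taken)
        self.bimodal.train(address, taken)
        self.gshare.train(address ^ self.history, taken)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


demo = Tournament(bimodal_bits=4, gshare_bits=4, history_bits=2, choose_bits=4)
before = demo.predict(5)
for _ in range(3):
    demo.update(5, True)
after = demo.predict(5)
```

In this small run the chooser stays with the bimodal component, whose counter for the branch saturates toward taken after repeated taken outcomes.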

Perceptron

Low Accuracy Problem
Training Trace File → Training → Updated Perceptron Table → Predict Result
Prediction accuracy is below 50% no matter what size is set for the perceptron table or the global register.

Perceptron Process
1. The branch address is hashed to produce an index i in 0...N-1 into the table of perceptrons.
2. The i-th perceptron is fetched from the table into a vector register P[0...n] of weights.
3. The value of y is computed as the dot product of P and the global history register.
4. The branch is predicted not taken when y is negative, or taken otherwise.
5. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of y to update the weights in P.
6. P is written back to the i-th entry in the table.
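The six steps above can be sketched as below. The slides do not spell out the training rule, so this sketch assumes the commonly used one from Jiménez and Lin's perceptron predictor: train when the prediction was wrong or |y| is at most a threshold theta = floor(1.93 * h + 14) for history length h. It also assumes a modulo hash, a bias weight at position 0, and history bits stored as +1 (taken) or -1 (not taken). All names are hypothetical.

```python
class PerceptronPredictor:
    """Perceptron predictor sketch following the six steps listed above."""

    def __init__(self, rows: int, history_length: int):
        self.rows = rows
        self.n = history_length
        # weights[i][0] is an assumed bias weight; weights[i][1..n] pair with history bits.
        self.weights = [[0] * (history_length + 1) for _ in range(rows)]
        # Global history as +1 (taken) / -1 (not taken), oldest first.
        self.history = [-1] * history_length
        # Assumed training threshold (Jimenez and Lin's suggested value).
        self.theta = int(1.93 * history_length + 14)

    def _output(self, address: int):
        i = address % self.rows                                          # step 1: hash to index i
        w = self.weights[i]                                              # step 2: fetch weights
        y = w[0] + sum(wj * xj for wj, xj in zip(w[1:], self.history))   # step 3: dot product
        return i, y

    def predict(self, address: int) -> bool:
        _, y = self._output(address)
        return y >= 0                                                    # step 4: negative => not taken

    def update(self, address: int, taken: bool) -> None:
        i, y = self._output(address)
        t = 1 if taken else -1
        # Step 5: train on a misprediction or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[i]   # step 6: updated in place, so the table entry is written back
            w[0] += t
            for j, xj in enumerate(self.history, start=1):
                w[j] += t * xj
        if self.n:
            self.history = self.history[1:] + [t]                        # shift in the new outcome


demo = PerceptronPredictor(rows=8, history_length=4)
first = demo.predict(3)      # all weights are zero, so y = 0
demo.update(3, False)        # train once with a not-taken outcome
second = demo.predict(3)
```

Because y = 0 counts as taken in step 4, an untrained perceptron predicts taken; a single not-taken outcome is enough to push y negative for the same history in this small example.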

Perceptron
Prediction: address → Perceptron Table → weights w0, w1, w2, ..., wn → Predict Result
Training: Predict Result and Actual Result → updated weights w0', w1', w2', ..., wn' → Perceptron Table

Perceptron

Perceptron: 940 Rows, 62 Register Size

Put All Together
• Always NT
• Bimodal 2-bit
  – 2^20 Bytes table size
• GShare
  – 2^16 Bytes table size
  – 13 bits global register size
• Tournament
  – 2^10 Bytes Choose Table
  – Bimodal 2-bit
  – GShare
• Perceptron
  – 940 Rows
  – 62 Global register size
• Benchmark
  – gcc-1K-simple.trace
  – gcc-10M-simple.trace
  – gcc-50M-simple.trace
  – hmmer-100M-simple.trace
  – libquantum-100M-simple.trace
  – art-100M-simple.trace
  – mcf-100M-simple.trace

Comparison

Conclusion
• The size of the trace file has a dramatic influence on prediction accuracy.
  – When the size is less than 1K, no predictor has good accuracy.
• Normally, predictor accuracy increases as the history table size and the global register length increase.
• Gshare achieves stable performance when the table size is 2^16 Bytes and the global register length is 13 bits.
• The size of the choose table does not have much influence on the performance of the Tournament predictor.
• Perceptron has the best prediction accuracy in most cases.

Thank You
Questions?