Techniques for Improving CPI: Branch Prediction


Techniques for Improving CPI
• Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
• Techniques: superscalar execution, register renaming, dynamic execution, loop unrolling, static scheduling, software pipelining, forwarding, speculative execution, delayed branches, branch scheduling, branch prediction
cslab@ntua 2019-2020

How to use Static Branch Prediction?
• In MIPS, move instructions from the taken/not-taken path into the branch delay slot
• MIPS “branch-likely” instruction:
  – The delay slot is executed only if the branch is taken; otherwise it is converted to a NOP
  – Encodes predict-taken into the instruction: static prediction!
  – It is valid to move instructions from the taken path into the delay slot even if they clobber (overwrite) live register values. If the branch is not taken, they do not exist!
  – beq => NT, beql => T
• ARM encodes similar information into instructions (using condition codes)

Dynamic Branch Prediction
• Why does prediction work?
  – The underlying algorithm has regularities
  – The data being operated on has regularities
  – The instruction sequence has redundancies that are artifacts of the way humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
  – Seems to be
  – There are a small number of important branches in programs which have dynamic behavior
  – Loops, loops…

Dynamic Branch Prediction
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table (BHT): use the low-order bits of the PC address to index a table of 1-bit values
  – Records whether or not the branch was taken last time
  – No address check (i.e. not a cache, why?)
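The "no address check" point can be illustrated with a minimal sketch of a 1-bit BHT. The class name `OneBitBHT`, the 4-bit index width and the 4-byte instruction size are assumptions made for illustration and are not taken from the slides. Because no tag is stored, two branches whose PCs share the same index bits silently share one entry; a wrong hint only costs a misprediction, so nothing has to be checked.

```python
class OneBitBHT:
    """1-bit branch history table indexed by low-order PC bits (no tags)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # 0 = not taken, 1 = taken

    def index(self, pc):
        # Drop the 2 byte-offset bits (assumes 4-byte instructions).
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self.index(pc)]

    def update(self, pc, taken):
        # Remember only the most recent outcome for this entry.
        self.table[self.index(pc)] = 1 if taken else 0


bht = OneBitBHT(index_bits=4)
bht.update(0x108, taken=True)

# 0x148 differs from 0x108 only above the index bits, so both map to one entry.
aliased = bht.index(0x108) == bht.index(0x148)
print(aliased, bht.predict(0x148), bht.predict(0x144))
```

Here the branch at 0x148 inherits the prediction trained by 0x108, while 0x144 maps to a different entry and still reads the initial value.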

Abstract view

Example: 1-bit predictor
    0x108: for (i = 0; i < 100000; i++) {
               ...
    0x144:     if ((i % 100) == 0) callA();
    0x150:     if ((i & 1) == 1) callB();
           }
• Branch at 0x108, initial state 0 (predict not taken), 100000 outcomes per execution of the loop:
    Prediction (108): 0 TTTT....TTTT N TTTT....TTTT
    Outcome (108):    T TTTT....TTTN T TTTT....TTTN
• Misprediction = 2/100000
• Prediction Rate = 99.998%

Example: 1-bit predictor
    0x108: for (i = 0; i < 100000; i++) {
               ...
    0x144:     if ((i % 100) == 0) callA();
    0x150:     if ((i & 1) == 1) callB();
           }
• Code at 0x144:
    DIV   R2, #100
    MFHI
    BNEZ  R1, 0x150
    JMP   FUNA
• Branch at 0x144, initial state 0 (predict not taken), pattern repeats every 100 iterations:
    Prediction (144): 0 N TTTT....TTTT N TTTT....TTTT
    Outcome (144):    N T TTTT....TTTN T TTTT....TTTN
• Misprediction = 2/100
• Prediction Rate = 98%

Example: 1-bit predictor
    0x108: for (i = 0; i < 100000; i++) {
               ...
    0x144:     if ((i % 100) == 0) callA();
    0x150:     if ((i & 1) == 1) callB();
           }
• Code at 0x150:
    AND   R1, R2, #1
    SUB
    BNEZ  R1, ENDLOOP
    JMP   FUNB
• Branch at 0x150, initial state 0 (predict not taken):
    Prediction (150): 0 T N T N T N
    Outcome (150):    T N T N T N T
• Misprediction = 1/1
• Prediction Rate = 0%
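The three 1-bit traces above can be reproduced with a short simulation. This is a sketch under stated assumptions: the helper name `mispredictions_1bit` is made up here, each branch gets its own predictor bit starting at "not taken", the loop branch is modeled as 99999 taken outcomes followed by one not-taken outcome (as in the slide's trace), and the branches at 0x144 and 0x150 are taken when the call is skipped.

```python
def mispredictions_1bit(outcomes, state=False):
    """Count misses of a 1-bit predictor that always predicts the last outcome."""
    misses = 0
    for taken in outcomes:
        if taken != state:
            misses += 1
        state = taken  # remember only the most recent outcome
    return misses


N = 100_000
outcomes = {
    "0x108": [True] * (N - 1) + [False],           # loop branch: T ... T N
    "0x144": [(i % 100) != 0 for i in range(N)],   # taken when callA is skipped
    "0x150": [(i & 1) == 0 for i in range(N)],     # taken when callB is skipped
}

results = {pc: mispredictions_1bit(seq) for pc, seq in outcomes.items()}
for pc, misses in results.items():
    print(pc, misses, f"{100 * (1 - misses / N):.3f}%")
```

Over one execution of the loop this gives 2 misses at 0x108, 1999 at 0x144 (about 2 per 100 iterations) and 100000 at 0x150, which matches the 99.998%, 98% and 0% figures on the slides.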

Dynamic Prediction Techniques
• 2-bit predictor
  – S-T (11): Predict Taken
  – W-T (10): Predict Taken
  – W-NT (01): Predict Not Taken
  – S-NT (00): Predict Not Taken
  – [State diagram: T and NT transitions between the four states]
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to the decision-making process
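A minimal sketch of the hysteresis idea follows. It assumes the state diagram is the usual 2-bit saturating counter (0 = S-NT, 1 = W-NT, 2 = W-T, 3 = S-T); the transition arrows did not survive in this transcript, so that reading is an assumption. The function names are illustrative.

```python
STATE_NAMES = {0: "S-NT", 1: "W-NT", 2: "W-T", 3: "S-T"}


def predict(counter):
    """Counters 2 and 3 predict taken; 0 and 1 predict not taken."""
    return counter >= 2


def update(counter, taken):
    """Saturating increment on taken, saturating decrement on not taken."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


counter = 3  # start in S-T
trace = []
for taken in [False, True, False, False, True]:
    trace.append((STATE_NAMES[counter], predict(counter), taken))
    counter = update(counter, taken)

for state, predicted, actual in trace:
    print(state, "predict T" if predicted else "predict NT",
          "actual T" if actual else "actual NT")
```

A single not-taken outcome moves S-T to W-T without changing the prediction; it takes two consecutive not-taken outcomes to flip it, which is the hysteresis the slide refers to.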

Example: 2-bit predictor
    0x108: for (i = 0; i < 100000; i++) {
               ...
    0x144:     if ((i % 100) == 0) callA();
    0x150:     if ((i & 1) == 1) callB();
           }
• Counter values 0, 1: Predict Not Taken; 2, 3: Predict Taken
• Branch at 0x108, initial counter 1, 100000 outcomes per execution of the loop:
    Prediction (108): 1 23333333....3333 2 3333....3333
    Outcome (108):    T TTTTTTTT....TTTN T TTTT....TTTN
• Misprediction ~= 1 per N branches
• 0x108 Prediction Rate = 99.999%
• 0x144 Prediction Rate = 99%
• 0x150 Prediction Rate = 50%
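The rates quoted above can be checked with a small simulation. This sketch assumes a saturating 2-bit counter per branch and the same outcome modeling as the 1-bit example (branches at 0x144 and 0x150 taken when the call is skipped); `mispredictions_2bit` is a hypothetical helper name. The initial counter value is a parameter because, for the alternating branch at 0x150, the result depends on it.

```python
def mispredictions_2bit(outcomes, counter):
    """Count misses of a saturating 2-bit counter (0, 1 = NT; 2, 3 = T)."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


N = 100_000
loop_108 = [True] * (N - 1) + [False]
skip_a_144 = [(i % 100) != 0 for i in range(N)]
skip_b_150 = [(i & 1) == 0 for i in range(N)]

print("0x108, two loop executions:", mispredictions_2bit(loop_108 * 2, 1))
print("0x144:", mispredictions_2bit(skip_a_144, 1))
for start in range(4):
    print("0x150 from counter", start, ":", mispredictions_2bit(skip_b_150, start))
```

In steady state the loop branch misses once per execution and 0x144 once per 100 iterations, in line with 99.999% and 99%. For 0x150 the 50% figure holds when the counter starts at 0, 2 or 3; starting at 1 with a taken outcome first, the counter oscillates between 1 and 2 and every prediction is wrong.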

Dynamic Prediction Techniques
• Another 2-bit predictor
  – S-T (11): Predict Taken
  – W-T (10): Predict Taken
  – W-NT (00): Predict Not Taken
  – S-NT (01): Predict Not Taken
  – [State diagram: T and NT transitions between the four states]
• 2^20 possible FSMs → 5248 “interesting” ones [Nair, 1992]

Prediction Accuracy of a 2-bit Predictor
[Figure: frequency of mispredictions (y-axis, 0% to 20%) for a 4096-entry 2-bit BHT on nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott and li. Annotations from the top of the axis down: “Terrible!”, “Hmm... OK”, “SUPER!”]

How Much Does the Table Size Matter?
• 4K-entry 2-bit BHT ~= unlimited-entry 2-bit BHT
[Figure: frequency of mispredictions (y-axis, 0% to 20%) on nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott and li for two series: 4,096 entries, 2 bits per entry; unlimited entries, 2 bits per entry]

Example
    if (aa == 2) aa = 0;
    if (bb == 2) bb = 0;
    if (aa != bb) { ... }

        DADDIU R3, R1, #-2
        BNEZ   R3, L1        ; branch b1 (aa != 2)
        DADD   R1, R0        ; aa = 0
    L1: DADDIU R3, R2, #-2
        BNEZ   R3, L2        ; branch b2 (bb != 2)
        DADD   R2, R0        ; bb = 0
    L2: DSUBU  R3, R1, R2    ; R3 = aa - bb
        BEQZ   R3, L3        ; branch b3 (aa == bb)

[Figure: decision tree over the outcomes of b1 and b2 (1 = T, 0 = NT), with paths 1-1, 1-0, 0-1 and 0-0, each leading to b3 with the resulting aa and bb values]
• If b1 and b2 are both NT (Not Taken), then b3 is T (Taken)!
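The correlation claim can be verified by brute force. The sketch below mirrors the three branches as they are written in the assembly (b1 and b2 taken when the value is not 2, b3 taken when aa equals bb); the function name and the small value range 0..3 are illustrative choices, not part of the slides.

```python
from itertools import product


def branch_outcomes(aa, bb):
    """Return (b1, b2, b3) as taken/not-taken flags for the code above."""
    b1 = aa != 2          # BNEZ R3, L1: taken when aa != 2
    if not b1:
        aa = 0
    b2 = bb != 2          # BNEZ R3, L2: taken when bb != 2
    if not b2:
        bb = 0
    b3 = aa == bb         # BEQZ R3, L3: taken when aa == bb
    return b1, b2, b3


# Collect every b3 outcome seen on the path where b1 and b2 are both not taken.
b3_when_both_nt = {
    branch_outcomes(aa, bb)[2]
    for aa, bb in product(range(4), repeat=2)
    if branch_outcomes(aa, bb)[:2] == (False, False)
}
print(b3_when_both_nt)
```

On the NT, NT path both variables have just been set to 0, so b3 is always taken. A predictor that only looks at b3's own history cannot exploit this, which motivates using the outcomes of earlier branches as part of the index.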

Global-History Two-Level Predictor
• (2, 2) predictor
• 64 entries
• 4 low-order bits of the PC
• 2 bits of global history
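One way to read these parameters is that the 4 PC bits and the 2 global-history bits are concatenated into a 6-bit index over 64 two-bit counters. The sketch below assumes that layout, 4-byte instructions and counters initialized to 0; the class name `GlobalTwoLevel` is made up for illustration. Only one branch is simulated, so here the global history happens to equal that branch's own history.

```python
class GlobalTwoLevel:
    """(m, n) = (2, 2) predictor sketch: 2 history bits select a 2-bit counter."""

    def __init__(self, pc_bits=4, history_bits=2):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history_bits = history_bits
        self.history = 0                                   # shared by all branches
        self.counters = [0] * (1 << (pc_bits + history_bits))

    def _index(self, pc):
        # Concatenate low-order PC bits with the global history register.
        return (((pc >> 2) & self.pc_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


predictor = GlobalTwoLevel()
misses = 0
for i in range(1000):
    taken = (i & 1) == 0          # alternating T, N, T, N, ...
    if predictor.predict(0x150) != taken:
        misses += 1
    predictor.update(0x150, taken)
print(len(predictor.counters), misses)
```

After a short warm-up the alternating pattern that defeats a plain 2-bit counter is predicted correctly, because each history value selects its own counter.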

Comparison
• With equal total table size
[Figure: frequency of mispredictions (y-axis, 0% to 20%) on nasa7, matrix300, tomcatv, doduc, spice, fpppp, gcc, espresso, eqntott and li for three series: 4,096 entries, 2 bits per entry; unlimited entries, 2 bits per entry; 1,024 entries (2, 2)]

Local-History Two-Level Predictor
• BHT
  – 8 entries (shift registers)
  – 3-bit history
• PHT
  – 128 entries
  – 2-bit predictors

Local-History Two-Level Predictor
• Or, put differently: a 2-bits-per-branch predictor
[Figure labels: Branch address; Multiple, per-branch history shift regs; Prediction]
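A local-history predictor with the sizes given above (8 three-bit shift registers, 128 two-bit counters) might be sketched as follows. How the 128-entry PHT is indexed is not recoverable from this transcript; the sketch assumes 4 low-order PC bits concatenated with the 3 history bits, and a BHT indexed by 3 low-order PC bits. Those choices, and the class name `LocalTwoLevel`, are assumptions.

```python
class LocalTwoLevel:
    """Per-branch history (BHT of shift registers) selecting 2-bit counters (PHT)."""

    def __init__(self, bht_bits=3, history_bits=3, pht_pc_bits=4):
        self.bht_mask = (1 << bht_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.pht_pc_mask = (1 << pht_pc_bits) - 1
        self.history_bits = history_bits
        self.bht = [0] * (1 << bht_bits)                        # 8 shift registers
        self.pht = [0] * (1 << (pht_pc_bits + history_bits))    # 128 counters

    def _pht_index(self, pc):
        word = pc >> 2                                          # 4-byte instructions
        history = self.bht[word & self.bht_mask]
        return ((word & self.pht_pc_mask) << self.history_bits) | history

    def predict(self, pc):
        return self.pht[self._pht_index(pc)] >= 2

    def update(self, pc, taken):
        i = self._pht_index(pc)
        c = self.pht[i]
        self.pht[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        slot = (pc >> 2) & self.bht_mask
        self.bht[slot] = ((self.bht[slot] << 1) | int(taken)) & self.hist_mask


local = LocalTwoLevel()
misses = 0
for k in range(300):
    taken = (k % 3) != 2          # repeating T, T, N pattern (a 3-iteration loop)
    if local.predict(0x108) != taken:
        misses += 1
    local.update(0x108, taken)
print(len(local.bht), len(local.pht), misses)
```

With three bits of its own history the branch's T, T, N pattern becomes fully predictable after warm-up, whereas a single 2-bit counter would keep missing the not-taken outcome.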

Tournament Hybrid Predictor
• MetaPredictor: essentially the same as a 2-bit BHT
[Figure labels: Branch PC; MetaPredictor; Pred 1; Final Prediction]
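The selection logic can be sketched with a single 2-bit chooser. The update rule used here (move toward whichever component was right, only when the two disagree) is the common textbook formulation and is an assumption, since the slide's figure is not available; `tournament_step` is a hypothetical name, and the component predictions are passed in as plain booleans.

```python
def tournament_step(chooser, pred0, pred1, taken):
    """Pick a component with a 2-bit chooser (0, 1 -> Pred0; 2, 3 -> Pred1)."""
    final = pred1 if chooser >= 2 else pred0
    if pred0 != pred1:
        # Train the chooser only when the components disagree.
        if pred1 == taken:
            chooser = min(chooser + 1, 3)
        else:
            chooser = max(chooser - 1, 0)
    return final, chooser


chooser = 0          # initially trusts Pred0
finals = []
for _ in range(4):
    # Stand-in components: Pred0 always says NT, Pred1 always says T.
    final, chooser = tournament_step(chooser, pred0=False, pred1=True, taken=True)
    finals.append(final)
print(finals, chooser)
```

After two disagreements in which Pred1 was right, the chooser crosses into the upper half and the final prediction follows Pred1.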

Example: Alpha 21264
• Meta-predictor
  – 4K entries
  – each entry is a 2-bit predictor
  – indexed by the PC of the branch instruction
• Pred0: Local-history two-level predictor
  – BHT: 1K 10-bit entries
  – PHT: 1K 3-bit predictors
• Pred1: Global-history two-level predictor
  – PHT: 4K 2-bit predictors
• Total: 29K bits
• SPECfp95: misprediction = 1/1000 instructions
• SPECint95: misprediction = 11.5/1000 instructions
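The 29K-bit total follows directly from the table sizes listed above. A quick check, with K taken as 1024 and dictionary keys that are just labels for this sketch:

```python
K = 1024

# Entries times bits per entry for each table listed on the slide.
budget_bits = {
    "meta-predictor (4K x 2-bit)": 4 * K * 2,
    "Pred0 BHT (1K x 10-bit)": 1 * K * 10,
    "Pred0 PHT (1K x 3-bit)": 1 * K * 3,
    "Pred1 PHT (4K x 2-bit)": 4 * K * 2,
}

total_bits = sum(budget_bits.values())
for name, bits in budget_bits.items():
    print(f"{name}: {bits // K}K bits")
print(f"total: {total_bits // K}K bits")
```

That is 8K + 10K + 3K + 8K = 29K bits, counting only the tables and not any history registers.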

Example: Alpha 21264
• Choice predictor indexed by the history of the last 12 branches
• No PC bits used (!!!)

Branch-Target Buffer

Using the BTB

Return Address Stack (RAS)