- Slides: 57
Diverge-Merge Processor (DMP)
Hyesoon Kim, José A. Joao, Onur Mutlu*, Yale N. Patt
HPS Research Group, University of Texas at Austin
*Microsoft Research

Outline
- Predicated Execution
- Diverge-Merge Processor (DMP)
- Implementation of DMP
- Experimental Evaluation
- Conclusion
10/31/2021

Predicated Execution
Source code:
    if (cond) { b = 0; } else { b = 1; }
Normal branch code:
    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov b, 1
        jmp JOIN
    C:  TARGET: mov b, 0
Predicated code:
    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0
- Predication converts a control-flow dependence into a data dependence.

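The conversion on this slide can be mimicked in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and conditional expressions stand in for predicated moves, which Python does not have.

```python
def branch_version(cond):
    """Normal branch code: control flow decides which move executes."""
    if cond:
        b = 0
    else:
        b = 1
    return b


def predicated_version(cond):
    """Predicated code: both moves are processed; the predicate guards the write."""
    p1 = bool(cond)
    b = None                  # value of b before the region (unknown here)
    b = 1 if not p1 else b    # (!p1) mov b, 1
    b = 0 if p1 else b        # (p1)  mov b, 0
    return b


for cond in (True, False):
    print(cond, branch_version(cond), predicated_version(cond))
```

Both versions compute the same value of b, but the second has no branch to mispredict: the outcome depends only on the data value p1.
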
Benefit of Predicated Execution
- Predicated execution can deliver high performance and energy efficiency.
[Figure: pipeline occupancy (Fetch, Decode, Rename, Schedule, Register Read, Execute) for blocks A to F under predicated execution and under branch prediction; the branch-prediction case ends in a pipeline flush.]

Limitations/Problems of Predication
- ISA: Predicate registers and predicated instructions
  - Dynamic-hammock predication [Klauser'98] can solve this problem, but it is applicable only to simple hammocks.
- Adaptivity: Static predication is not adaptive to run-time branch behavior.
  - Branch behavior changes based on input set, phase, and control-flow path.
  - Wish Branches [Kim'05]
- Complex CFG: A large subset of control-flow graphs is not converted to predicated code.
  - Function calls, loops, many instructions inside a region, and complex CFGs
  - Hyperblock [Mahlke'92] cannot adapt to frequently-executed paths dynamically.

Diverge-Merge Processor (DMP)
- DMP can dynamically predicate complex branches (in addition to simple hammocks).
- The compiler identifies
  - Diverge branches
  - Control-flow merge (CFM) points
- The microarchitecture decides when and what to predicate dynamically.

Dynamic Predication
Original code (the branch in A is low-confidence):
    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov R1, 1
        jmp JOIN
    C:  TARGET: mov R1, 0
    H:  JOIN: add R5, R1, 1
Dynamically predicated:
    B:  (mov R1, 1)    PR10 = 1
    C:  (mov R1, 0)    PR11 = 0
    H:  select-µop (φ-node in SSA): PR12 = (cond) ? PR11 : PR10
- Klauser et al. [PACT'98]: Dynamic-hammock predication

Diverge-Merge Processor
[Figure: control-flow graph with blocks A to H. A ends in the diverge branch, H is the CFM point, and select-µops are inserted at H. Edges are marked as frequently executed or not frequently executed paths.]

Diverge-Merge Processor
[Figure: the same control-flow graph (blocks A to H), marking the diverge branch, the CFM point, the executed blocks, and the frequently and not frequently executed paths.]

Control-Flow Graphs
[Table: CFG types (simple hammock, nested hammock, frequently-hammock, loop, non-merging) against the mechanisms that can handle them (DMP, dynamic-hammock, SW predication, wish branches, dual-path). Cell marks were not preserved.]

Dual-path Execution vs. DMP
[Figure: a low-confidence branch in A leads to blocks B and C, followed by D, E, F. Dual-path execution fetches D, E, F on both path 1 and path 2; in DMP, path 1 and path 2 merge at the CFM point D.]

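Control-Flow Graphs
[Table: CFG types (simple hammock, nested hammock, frequently-hammock, loop, non-merging) against the mechanisms that can handle them (DMP, dynamic-hammock, SW predication, wish branches, dual-path). Cell marks were not preserved, apart from "sometimes" entries in the SW predication and wish branch rows.]
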
Distribution of Mispredicted Branches
- 66% of mispredicted branches can be dynamically predicated in DMP.

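Fetch Mechanism
[Figure: control-flow graph with blocks A to H. A ends in a low-confidence diverge branch and H is the CFM point. After A, the front end fetches round-robin from the two paths; the predicted path is highlighted.]
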
Dynamic Predication
Original code:
    A:  branch r0, C
    B:  add r1 ← r3, #1
    C:  add r1 ← r2, #-1
    H:  add r4 ← r1, r3
After renaming:
    A:  branch pr10, C             p1 = pr10
    B:  add pr21 ← pr13, #1        (p1)
    C:  add pr31 ← pr12, #-1       (!p1)
        select-µop pr41 = p1 ? pr21 : pr31
    H:  add pr24 ← pr41, pr13
Register alias tables (Arch. → Phy., M):
    RAT 1:  R1 → PR11, then PR21, then PR41 (M = 1);  R2 → PR12;  R3 → PR13
    RAT 2:  R1 → PR11, then PR31 (M = 1);  R2 → PR12;  R3 → PR13
- The processor forks the RAT, RAS, and GHR.

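The renaming example on this slide can be sketched in Python as follows. This is a simplified model under stated assumptions, not the hardware design: rename_path and insert_select_uops are hypothetical names, the RAT is a plain dictionary, and instead of tracking the M bit the sketch simply compares the two forked RATs at the CFM point.

```python
def rename_path(rat, path, alloc):
    """Rename one path on a private (forked) copy of the RAT.

    path is a list of (dest_arch_reg, [source_arch_regs]) tuples.
    Returns the forked RAT and the renamed (dest_phys, [src_phys]) ops.
    """
    rat = dict(rat)  # fork: the caller's RAT is left untouched
    renamed = []
    for dest, srcs in path:
        phys_srcs = [rat[s] for s in srcs]  # read sources before remapping dest
        rat[dest] = alloc()
        renamed.append((rat[dest], phys_srcs))
    return rat, renamed


def insert_select_uops(rat1, rat2, pred, alloc):
    """At the CFM point, emit one select-uop per register whose mapping differs."""
    merged = dict(rat1)
    selects = []
    for arch in sorted(rat1):
        if rat1[arch] != rat2[arch]:
            dst = alloc()
            # dst = pred ? (value from path 1) : (value from path 2)
            selects.append((dst, pred, rat1[arch], rat2[arch]))
            merged[arch] = dst
    return merged, selects


# Physical register names chosen to match the slide; the allocator is hypothetical.
names = iter(["PR21", "PR31", "PR41"])
alloc = lambda: next(names)

rat = {"R1": "PR11", "R2": "PR12", "R3": "PR13"}
rat1, ops1 = rename_path(rat, [("R1", ["R3"])], alloc)  # block B: add r1 <- r3, #1
rat2, ops2 = rename_path(rat, [("R1", ["R2"])], alloc)  # block C: add r1 <- r2, #-1
merged, selects = insert_select_uops(rat1, rat2, "p1", alloc)
print(selects)        # select-uop for R1
print(merged["R1"])   # mapping that block H reads for r1
```

After the merge, block H reads r1 through the destination of the select-µop, which is why its renamed form uses pr41 as a source.
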
DMP Support
- ISA Support
  - Mark diverge branches/CFM points.
- Compiler Support [CGO'07]
  - The compiler identifies diverge branches and the corresponding CFM points.
- Hardware Support
  - Confidence estimator
  - Fetch mechanisms
  - Load/store processing
  - Instruction retirement
  - Dynamic predication

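Hardware Complexity Analysis
[Table: hardware support required by each mechanism. Columns: DMP, dynamic-hammock, dual-path, multipath, SW predication, wish branches. Rows: front-end, confidence estimator, rename support, predicate registers, select-µop generation, ST-LD forwarding check, flush/no flush. Cell values were not preserved.]
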
Simulation Methodology
- 12 SPEC 2000 INT and 5 SPEC 95 INT benchmarks
  - Different input sets for profiling and evaluation
- Alpha ISA execution-driven simulator
- Baseline processor configuration
  - 64 KB perceptron predictor/O-GEHL (paper)
  - Minimum 30-cycle branch misprediction penalty
  - 8-wide, 512-entry instruction window
  - 2 KB, 12-bit history enhanced JRS confidence estimator
- Less aggressive processor (paper)
- Power model using Wattch

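The slide gives only the size (2 KB) and history length (12 bits) of the confidence estimator. As a rough illustration of how a JRS-style estimator works, here is a minimal sketch of a table of resetting counters. The 4-bit counter width, the PC-xor-history index, and the threshold of 15 are assumptions made for this example, and the "enhanced" variant used in the evaluation is not modeled.

```python
class ResettingCounterEstimator:
    """Table of saturating counters that reset to zero on a misprediction."""

    def __init__(self, index_bits=12, counter_bits=4, threshold=15):
        # 2**12 entries of 4 bits each is 2 KB (assumed organization).
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = threshold
        self.table = [0] * (1 << index_bits)

    def _index(self, pc, history):
        return (pc ^ history) & self.mask

    def is_low_confidence(self, pc, history):
        """A branch is low-confidence until enough correct predictions accumulate."""
        return self.table[self._index(pc, history)] < self.threshold

    def update(self, pc, history, prediction_correct):
        i = self._index(pc, history)
        if prediction_correct:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = 0  # a misprediction resets the counter


est = ResettingCounterEstimator()
pc, hist = 0x400, 0b1010
print(est.is_low_confidence(pc, hist))   # counters start at zero
for _ in range(15):
    est.update(pc, hist, True)
print(est.is_low_confidence(pc, hist))   # after 15 correct predictions
```

In DMP, a low-confidence result for a diverge branch is what triggers entry into dynamic predication mode.
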
Different CFG types
Performance Improvement
Energy Consumption
Conclusion
- DMP introduces the concept of frequently-hammocks and dynamically predicates complex CFGs.
- DMP can overcome three major limitations of software predication: ISA support, adaptivity, and complex CFGs.
- DMP reduces branch mispredictions energy-efficiently.
  - 19% performance improvement, 9% less energy
- DMP divides the work between the compiler and the microarchitecture:
  - The compiler analyzes the control-flow graphs.
  - The microarchitecture decides when and what to predicate dynamically.

Thank You!!
Questions?
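Handling Mispredictions
[Figure: control-flow graph with blocks A to H, the diverge branch in A, and the CFM point H. A misprediction is marked inside the dynamically predicated region, and the affected instructions on the predicted path are flushed.]
Renamed instructions, in the order shown on the slide:
    A:  branch pr10, C             p1 = pr10
    B:  add pr21 ← pr13, #1        (p1)
    C:  add pr31 ← pr12, #-1       (!p1)
    E:  add pr44 ← pr34, #-1       (!p1)
        select-µop pr41 = p1 ? pr21 : pr31
    D:  add pr34 ← pr31, pr13
    H:  add pr24 ← pr41, pr13
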
Loop Branches
- Exit condition
  - The loop branch is predicted to exit the loop.
- Benefit
  - Reduced pipeline flushes when the predicated loop is iterated more times than it should be.
    - Instructions in the extra iterations of the loop become NOPs.
    - Instructions after the loop exit can still be executed.
- Negative effects
  - Increased execution delay of loop-carried dependencies
  - The overhead of select-µops

Loop Branches
- Predicate each loop iteration separately.
Original code:
    A:  add r1 ← r1, #1
        r0 = (cond1)
        branch A, r0
    B:  add r7 ← r1, #10
Dynamically predicated (after renaming):
    A:  branch A, pr10                       p1 = pr10
    A:  add pr21 ← pr11, #1                  (p1)
        pr20 = (cond1)                       (p1)
        branch A, pr20                       p2 = pr20
        select-µop pr22 = p1 ? pr21 : pr11
        select-µop pr23 = p1 ? pr20 : pr10
    A:  add pr31 ← pr22, #1                  (p2)
        pr30 = (cond1)                       (p2)
        branch A, pr30                       (loop branch is predicted to exit the loop)
        select-µop pr32 = p2 ? pr31 : pr22
        select-µop pr33 = p2 ? pr30 : pr23
    B:  add pr7 ← pr32, #10

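The effect of predicating each iteration separately can be checked with a small functional model. This is a sketch under simplifying assumptions: run_predicated_loop and reference_loop are hypothetical names, the loop body is the single add from the slide, the predicate of each iteration is the merged loop condition of the previous one, and the number of extra iterations fetched is passed in directly rather than decided by a branch predictor.

```python
def reference_loop(r1, cond):
    """do { r1 += 1 } while (cond(r1)), executed with normal branches."""
    while True:
        r1 += 1
        if not cond(r1):
            return r1


def run_predicated_loop(r1, cond, extra_iters):
    """First iteration runs normally; later iterations are predicated."""
    r1 = r1 + 1
    p = cond(r1)                   # loop branch outcome: True means iterate again
    for _ in range(extra_iters):
        new_r1 = r1 + 1            # predicated body of the next iteration
        new_p = cond(new_r1)
        r1 = new_r1 if p else r1   # select-uop for r1
        p = new_p if p else p      # select-uop for the loop condition
    return r1, p


cond = lambda x: x < 3
print(reference_loop(0, cond))
print(run_predicated_loop(0, cond, extra_iters=4))
```

Once the predicate becomes false, later iterations no longer change r1, which corresponds to the extra iterations becoming NOPs. If the predicate is still true after the fetched iterations, the loop has not finished and the sketch does not model what happens next.
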
Enhanced Mechanisms
- Multiple CFM points
  - The hardware chooses one CFM point for each instance of dynamic predication.
- Exit optimizations
  - Counter policy: What if one path does not reach the CFM point?
    - Number of fetched instructions > threshold
  - Yield policy: What if another low-confidence diverge branch is encountered in dynamic predication mode?
    - The later low-confidence branch is more likely to be mispredicted.
[Figure: control-flow graph with blocks A to H.]

Detailed DMP Support
- 32 predicate register IDs
- Fetch mechanism
  - High-performance I-cache
  - Fetch two cache lines
  - Predict 3 branches
  - Fetch stops at the first taken branch

Diverge and Merge?
Useful Dynamic Predication Mode
Perfect Branch Prediction
Maximum Power
Branch Predictor Effects
Confidence Estimator Effects
Results in Less Aggressive Processors
DMP vs. Perfect Conditional BP
Enhanced DMP Mechanisms
DMP vs. Other Mechanisms
Comparisons with Predication/Wish Branches
[Figure: chart; the only surviving label is "non-predicated".]
Reduction in Pipeline Flushes
- Average overhead:
  - Dynamic-hammock: 4 instructions/entry
  - Dual-path: 150 instructions/entry
  - Multipath: 200 instructions/entry
  - DMP: 20 instructions/entry

Handling Nested Diverge Branches
- Basic DMP
  - Ignore other low-confidence diverge branches.
- Enhanced DMP
  - Exit dynamic predication mode and re-enter from the younger low-confidence branch on the predicted path (Yield policy).
[Figure: control-flow graph with blocks A to H, marking the diverge branch and the CFM point.]

Compiler Support [CGO'07]
- The compiler analyzes the control flow and the profile data.
  - Step 1: Identify diverge branch candidates and CFM points.
  - Step 2: Select diverge branches based on (1) the number of instructions between a branch and the CFM point and (2) the probability of merging at the CFM point.
    - Heuristics or a cost-benefit model
  - Step 3: Mark the selected branches/CFM points.

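Step 2 can be pictured as a simple filter over profiled candidates. The sketch below shows a heuristic version only; the Candidate fields, the function name, and both threshold values are placeholders invented for illustration and are not taken from the slides or from the CGO'07 algorithm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pc: int              # address of the candidate diverge branch
    cfm: int             # address of its candidate CFM point
    insts_to_cfm: int    # instructions between the branch and the CFM point
    merge_prob: float    # profiled probability that both paths reach the CFM point


def select_diverge_branches(candidates, max_insts=50, min_merge_prob=0.5):
    """Keep candidates that are short enough and merge often enough.

    max_insts and min_merge_prob are hypothetical thresholds.
    """
    return [
        c for c in candidates
        if c.insts_to_cfm <= max_insts and c.merge_prob >= min_merge_prob
    ]


candidates = [
    Candidate(pc=0x1000, cfm=0x1040, insts_to_cfm=12, merge_prob=0.95),
    Candidate(pc=0x2000, cfm=0x2400, insts_to_cfm=300, merge_prob=0.90),
    Candidate(pc=0x3000, cfm=0x3020, insts_to_cfm=8, merge_prob=0.10),
]
for c in select_diverge_branches(candidates):
    print(hex(c.pc), hex(c.cfm))
```

A cost-benefit model, the alternative named on the slide, would replace the two fixed thresholds with an estimate of cycles saved versus predication overhead.
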
Future Research
- Hardware Support
  - Better confidence estimators
  - Efficient hardware mechanism to detect diverge branches and CFM points
    - Increases hardware complexity but eliminates the need for ISA/compiler support
- Compiler Support
  - Better compiler algorithms [CGO'07]

Power Measurement Configurations
- 100 nm technology
- Baseline processor
  - 4 GHz
- Less aggressive processor
  - 1.5 GHz
- CC3 clock-gating model in Wattch: unused units dissipate only 10% of their maximum power
- DMP: one more RAT/RAS/GHR, select-µop generation module, additional fields in BTB, predicate registers, CFM registers, load-store forwarding, instruction retirement

Fetched wrong-path instructions per entry into dynamic-predication/dual-path mode
Fetched/Executed Instructions
ISA Support
- Example of diverge branch and CFM markers
  - Instruction fields shown: OPCODE, TARGET, CFM rel address
  - 00: normal branch
  - 10: diverge forward branch
  - 11: diverge loop branch
  - CFM = CFM rel address + PC

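A decoder for these markers could look like the sketch below. The function name and argument layout are hypothetical, and because the slide does not say what the encoding 01 means, the sketch rejects it. Computing a CFM address for both diverge branch types is also an assumption.

```python
BRANCH_KINDS = {
    0b00: "normal",
    0b10: "diverge_forward",
    0b11: "diverge_loop",
}


def decode_marker(pc, marker_bits, cfm_rel):
    """Return (branch kind, absolute CFM address or None)."""
    if marker_bits not in BRANCH_KINDS:
        raise ValueError(f"encoding {marker_bits:02b} is not defined on the slide")
    kind = BRANCH_KINDS[marker_bits]
    # CFM = CFM rel address + PC; a normal branch carries no CFM point.
    cfm = pc + cfm_rel if kind != "normal" else None
    return kind, cfm


print(decode_marker(0x1000, 0b10, 0x40))
print(decode_marker(0x1000, 0b00, 0))
```
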
Entering Dynamic Predication Mode
- Entry condition
  - When a diverge branch has low confidence.
- The front end
  - Stores the address of the CFM point in the CFM register.
  - Forks the RAS, GHR, and RAT.
  - Allocates a predicate register.
- Fetch mechanisms
  - Round-robin fetch from two paths
  - The processor follows the branch predictor until it reaches the corresponding CFM point.

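The round-robin fetch policy can be sketched at basic-block granularity. This is a simplified illustration: round_robin_fetch is a hypothetical name, each path is given as a precomputed list of block labels (standing in for what the branch predictor would supply), and real hardware would alternate per fetch cycle rather than per block.

```python
def round_robin_fetch(path_a, path_b, cfm):
    """Alternate between two paths; each path stops when it reaches cfm.

    Returns the fetch order as (path_index, block) pairs. A path that runs
    out of blocks without reaching cfm is treated as finished, which glosses
    over the case where one path never merges.
    """
    order = []
    iters = [iter(path_a), iter(path_b)]
    done = [False, False]
    turn = 0
    while not all(done):
        if not done[turn]:
            block = next(iters[turn], cfm)
            if block == cfm:
                done[turn] = True      # this path has reached the CFM point
            else:
                order.append((turn, block))
        turn ^= 1                      # switch to the other path
    return order


# Hypothetical paths after a diverge branch in A, with CFM point H.
print(round_robin_fetch(["B", "E", "H"], ["C", "H"], cfm="H"))
```

When both paths have reached the CFM point, the processor leaves dynamic predication mode and fetches the blocks from H onward once.
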
Exiting Dynamic Predication Mode
- Exit condition
  - Both paths of a diverge branch have reached the corresponding CFM point.
  - A diverge branch is resolved.
- Select-µop mechanism
  - Similar to a φ-node in SSA
  - Merges register values from two paths.

Multipath Execution
[Figure: control-flow graph with blocks A to I containing two low-confidence branches; multipath execution forks into paths 1 to 4, and blocks H and I appear on several paths.]
- Instructions after the control-flow merge point are fetched multiple times, which wastes resources and energy.

Modeling Software Predication
- Mark using a binary instrumentation tool.
- All simple and nested hammocks can be predicated.
- All instructions between a branch and the control-flow merge point are fetched.
- All nested branches are predicated.