Control-Flow Decoupling
Rami Sheikh, James Tuck, Eric Rotenberg
North Carolina State University

Motivation
• Single-thread performance is important for single- and multi-threaded applications.
• Per-core energy consumption is at a premium.
• Better branch handling is a BIG win: it improves performance, reduces energy, and enables memory latency tolerance.
[Figure: Instructions per Cycle (IPC) versus Window Size (96 to 512; Conroe, Nehalem, Sandy Bridge, Haswell, Future Generations), baseline versus baseline + perfect prediction. Data labels: 63%, 68%, 67%, 65%, 67%, 69%.]

Interesting Observation
A third of mispredictions come from separable branches:
• The branch has a large control-dependent (CD) region (if-conversion not profitable).
• The branch does not depend on its own CD instructions via a loop-carried data dependence.

Control-Flow Decoupling (CFD)
Key idea: separate the loop into two loops:
• The first contains only the branch's predicate computation.
• The second contains the branch and its control-dependent instructions.
[Figure: Original Loop (branch-slice, branch, control-dependent region) versus CFD Loops (first loop: branch-slice, Push_BQ; second loop: Branch_on_BQ, control-dependent region), connected by the branch queue (BQ).]

Original
• Problem #1: No fetch separation: need branch prediction.
• Problem #2: No mechanism to communicate predicates to the Fetch Unit.
CFD provides:
• Fetch separation
• Mechanism to communicate predicates to the Fetch Unit
[Figure: IF ... EX pipeline timing of the slice and the branch. Original: independent work in between. CFD: BQ drives fetch.]

Other interesting aspects of CFD:
• Supports partially separable branches
• Supports nested branches through multi-level decoupling
• Overheads can be significantly reduced through value communication (called CFD+ in the paper)

ISA Support
• BQ specification
• New push/pop instructions

Software Side
• BQ size is finite + loops with high trip counts = loop strip-mining

Hardware Side
• BQ microarch., length and recovery
• Interaction with pipelining and OoO execution

Execution Scenarios
• Common Case: BQ hit
• Uncommon Case: BQ miss (Speculate or Stall)

CFD Compiler Implementation in GCC
IDENTIFY
• Branch slice
• Control-dependent region
CLONE LOOP
• Connect loop exits to the clone's pre-header (provide in-order fetch)
INSERT
• PUSH in loop: after branch slice
• POP in clone: to replace the branch
CLEAN-UP
• Dead and redundant code elimination

Results
• Applying CFD manually
• Applying CFD automatically (compiler)
[Figure: Speedup and Normalized Energy (Energy Reduction), Manual versus Automated, for eclat, jpeg-compr, mcf, soplex (pds), soplex (ref), and tiff-2-bw.]

Conclusion
• A third of mispredictions come from separable branches.
• CFD is a software/hardware collaboration for exploiting separability with low complexity and high efficacy.
• CFD is comparable to if-conversion in terms of number …
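The key idea (one loop that only computes and pushes predicates, a second loop that pops them to steer the control-dependent region, and strip-mining because the BQ is finite) can be sketched in plain C. This is a software analogy only: in CFD the BQ is a hardware structure accessed through new push/pop instructions, and the popped predicate drives fetch directly. The names `bq_t`, `push_bq`, `pop_bq`, `BQ_SIZE`, and the example loop body are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BQ_SIZE 8 /* hypothetical BQ capacity; the real length is an ISA/hardware parameter */

/* Software stand-in for the hardware branch queue (BQ). */
typedef struct {
    bool   pred[BQ_SIZE];
    size_t head; /* next entry to pop  */
    size_t tail; /* next entry to push */
} bq_t;

static void push_bq(bq_t *q, bool p)
{
    assert(q->tail < BQ_SIZE); /* strip-mining guarantees no overflow */
    q->pred[q->tail++] = p;
}

static bool pop_bq(bq_t *q)
{
    assert(q->head < q->tail); /* every pop has a matching earlier push */
    return q->pred[q->head++];
}

/* Original loop: a data-dependent branch guards a control-dependent region.
 * The branch is separable: its predicate does not depend on `sum`. */
long original_loop(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > threshold) {        /* branch slice + branch */
            sum += (long)a[i] * 3 + 1; /* control-dependent region */
        }
    }
    return sum;
}

/* Decoupled version: strip-mined into chunks of at most BQ_SIZE iterations. */
long cfd_loop(const int *a, size_t n, int threshold)
{
    long sum = 0;
    for (size_t base = 0; base < n; base += BQ_SIZE) {
        size_t end = (n - base < BQ_SIZE) ? n : base + BQ_SIZE;
        bq_t q = { .head = 0, .tail = 0 };

        /* First loop: branch slice only, predicates pushed to the BQ. */
        for (size_t i = base; i < end; i++) {
            push_bq(&q, a[i] > threshold);
        }
        /* Second loop: branch on the popped predicate, then the CD region. */
        for (size_t i = base; i < end; i++) {
            if (pop_bq(&q)) {
                sum += (long)a[i] * 3 + 1;
            }
        }
    }
    return sum;
}
```

Calling `cfd_loop(a, n, t)` returns the same value as `original_loop(a, n, t)` for any input, since only the order in which predicates are computed changes. In this software form the second loop's branch is still predicted by the processor; the benefit described on the poster comes from hardware resolving that branch from the BQ at fetch time.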