Register Data-Flow Framework
A brief introduction
Krzysztof Parzyszek, Qualcomm Innovation Center, Inc.
What is RDF?
● RDF is a framework that hides the complexity of data-flow analysis between registers after register allocation.
● The goal is to enable the implementation of arbitrarily detailed data-flow optimizations: the precision of information is key.
● The central concept is a data-flow graph (DFG) that abstracts the data flow in a form closely resembling SSA.
  ● Implemented in RDFGraph.cpp/.h.
● Includes utilities to recalculate liveness: block live-ins and “kill” flags.
  ● Implemented in RDFLiveness.cpp.
Structure of DFG
● The graph represents an entire function:
  ● Nodes: basic blocks, statements, as well as register uses and defs.
  ● Edges: membership, data-flow.
● Nodes:
  ● Container nodes (aka “code nodes”) reflect the function structure:
    • function contains basic blocks,
    • basic block contains instructions (phi nodes and statements),
    • instruction contains register defs and uses.
  ● Reference nodes represent the defs and uses of registers.
● Edges:
  ● Structural, e.g. “first member”, “next member”.
    • There are helper functions to assist in member traversal.
  ● Data-flow, e.g. “reaching def”, “first reached use”.
● See RDFGraph.h and RDFGraph.cpp for details.
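The container/member structure described above can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: `Node`, `Graph`, `add`, `members` and `members_if` here are hypothetical names for illustration and are not the RDFGraph.h classes; the model keeps just the two structural links named on the slide ("first member", "next member") and a traversal helper built on them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using NodeId = uint32_t;  // 0 is reserved to mean "no node"

enum class Kind { Func, Block, Phi, Stmt, Def, Use };

// Hypothetical node: a kind, a label, and the two structural links.
struct Node {
  Kind K;
  std::string Name;
  NodeId FirstMember;  // containers only: first contained node
  NodeId Next;         // next member of the same container, 0 = end
};

class Graph {
public:
  Graph() { Nodes.push_back(Node{Kind::Func, "<null>", 0, 0}); }  // id 0

  // Create a node and, if Parent is given, append it to Parent's members.
  NodeId add(Kind K, std::string Name, NodeId Parent = 0) {
    NodeId Id = static_cast<NodeId>(Nodes.size());
    Nodes.push_back(Node{K, std::move(Name), 0, 0});
    if (Parent != 0)
      append(Parent, Id);
    return Id;
  }

  // Follow "first member" and then "next member" links.
  std::vector<NodeId> members(NodeId Container) const {
    std::vector<NodeId> Out;
    for (NodeId M = Nodes[Container].FirstMember; M != 0; M = Nodes[M].Next)
      Out.push_back(M);
    return Out;
  }

  // Same traversal, keeping only the members accepted by the predicate.
  template <typename Pred>
  std::vector<NodeId> members_if(NodeId Container, Pred P) const {
    std::vector<NodeId> Out;
    for (NodeId M : members(Container))
      if (P(Nodes[M]))
        Out.push_back(M);
    return Out;
  }

  const Node &node(NodeId Id) const { return Nodes[Id]; }

private:
  void append(NodeId Parent, NodeId Id) {
    NodeId *Link = &Nodes[Parent].FirstMember;
    while (*Link != 0)
      Link = &Nodes[*Link].Next;
    *Link = Id;
  }

  std::vector<Node> Nodes;
};
```

Building function → block → statement → def/use with `add` and then calling `members` on the statement yields its references in insertion order, which is the shape of traversal the helper functions mentioned on the slide are meant to support.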
Structure of DFG: example
Highlighted on the slide: def-to-use and sibling links for R0.

Machine code (before Hexagon RDF optimizations):

    # Machine code for function foo: [...]
    Function Live Ins: %R0, %R1, %R2

    BB#0: derived from LLVM BB %entry
        Live Ins: %R0 %R1 %R2
            %P0<def> = C2_cmpgti %R0, 0
            J2_jumpf %P0, <BB#2>, %PC<imp-def>
        Successors according to CFG: BB#1 BB#2

    BB#1: derived from LLVM BB %if.then
        Live Ins: %R0 %R1 %R2
        Predecessors according to CFG: BB#0
            S4_storeiri_io %R2, 0, 0; mem:ST4[%p]
            %R0<def> = A2_addi %R0, 1
        Successors according to CFG: BB#2

    BB#2: derived from LLVM BB %if.end
        Live Ins: %R0 %R1
        Predecessors according to CFG: BB#1 BB#0
            %R0<def> = A2_add %R0, %R1
            PS_jmpret %R31, %PC<imp-def>, %R0<imp..>

    # End machine code for function foo.

DFG dump:

    [
    f1: Function: foo
    b2: --- BB#0 --- preds(0): succs(2): BB#1, BB#2
      p25: phi [+d26<R0>(, d14, u34): ]
      p27: phi [+d28<R1>(, , u20): ]
      p29: phi [+d30<R2>(, , u12): ]
      s3: C2_cmpgti [d4<P0>(, , u8): , u5<R0>(+d26): ]
      s6: J2_jumpf BB#2 [/+d7<PC>!(, d22, ): , u8<P0>(d4): ]

    b10: --- BB#1 --- preds(1): BB#0 succs(1): BB#2
      s11: S4_storeiri_io [u12<R2>(+d30): ]
      s13: A2_addi [d14<R0>(+d26, , u33): , u15<R0>(+d26): u5]

    b16: --- BB#2 --- preds(2): BB#1, BB#0 succs(0):
      p31: phi [+d32<R0>(, d18, u19): , u33<R0>(d14, b10): , u34<R0>(+d26, b2): u15]
      s17: A2_add [d18<R0>(+d32, , u24): , u19<R0>(+d32): , u20<R1>(+d28): ]
      s21: PS_jmpret [d22<PC>!(/+d7, , ): , u23<R31>!(): , u24<R0>!(d18): ]
    ]
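To make the def-to-use and sibling links in the dump concrete, the sketch below transcribes the R0 references into a tiny table and walks them. It assumes the dump fields read as: for a def, (reaching def, first reached def, first reached use) followed by the sibling after the colon; for a use, (reaching def) followed by the sibling. The `Ref` struct and `reachedUses` helper are hypothetical illustrations, not part of the RDF API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using NodeId = uint32_t;  // 0 means "no node"

// Hypothetical flattened reference node holding only data-flow links.
struct Ref {
  NodeId ReachingDef;  // defs and uses
  NodeId ReachedDef;   // defs only: first def reached by this def
  NodeId ReachedUse;   // defs only: first use reached by this def
  NodeId Sibling;      // next reference reached by the same def
};

// R0 links transcribed from the dump (node numbers only).
std::map<NodeId, Ref> makeR0Links() {
  std::map<NodeId, Ref> G;
  G[26] = {0, 14, 34, 0};   // +d26<R0>(, d14, u34):
  G[14] = {26, 0, 33, 0};   // d14<R0>(+d26, , u33):
  G[32] = {0, 18, 19, 0};   // +d32<R0>(, d18, u19):
  G[18] = {32, 0, 24, 0};   // d18<R0>(+d32, , u24):
  G[5]  = {26, 0, 0, 0};    // u5<R0>(+d26):
  G[15] = {26, 0, 0, 5};    // u15<R0>(+d26): u5
  G[34] = {26, 0, 0, 15};   // u34<R0>(+d26, b2): u15
  G[33] = {14, 0, 0, 0};    // u33<R0>(d14, b10):
  G[19] = {32, 0, 0, 0};    // u19<R0>(+d32):
  G[24] = {18, 0, 0, 0};    // u24<R0>!(d18):
  return G;
}

// Collect every use reached by Def: start at the def's "first reached use"
// and follow sibling links until the chain ends.
std::vector<NodeId> reachedUses(const std::map<NodeId, Ref> &G, NodeId Def) {
  std::vector<NodeId> Uses;
  auto It = G.find(Def);
  if (It == G.end())
    return Uses;
  NodeId U = It->second.ReachedUse;
  while (U != 0) {
    Uses.push_back(U);
    auto UI = G.find(U);
    U = (UI == G.end()) ? 0 : UI->second.Sibling;
  }
  return Uses;
}
```

With this reading, the live-in def d26 reaches three uses (the phi operand u34, then u15 in A2_addi, then u5 in C2_cmpgti), while d14 reaches only the phi operand u33.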
Liveness calculation
● Post-RA optimizations can invalidate liveness information.
● The Liveness class can recalculate it based on the DFG:
  ● recompute block live-ins (with precise lane masks), and
  ● recompute “kill” flags.
● Other useful routines, such as getAllReachingDefs.
● See RDFLiveness.cpp and RDFLiveness.h for more information.
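As a rough picture of what recomputing block live-ins means, here is a minimal sketch using the classic iterative backward data-flow over the CFG. This is a textbook technique swapped in for illustration; it is not the DFG-based algorithm in RDFLiveness.cpp, and it ignores lane masks. The `Block` struct, `computeLiveIns`, and `makeFoo` are hypothetical names, and the register sets model the example function foo from the earlier slide with R31 and PC deliberately left out so the result can be compared with the live-in lists shown there.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using RegSet = uint32_t;  // one bit per register (no lane masks in this sketch)
constexpr RegSet R0 = 1u << 0;
constexpr RegSet R1 = 1u << 1;
constexpr RegSet R2 = 1u << 2;
constexpr RegSet P0 = 1u << 3;

struct Block {
  RegSet Use;              // registers read before any write in the block
  RegSet Def;              // registers written in the block
  std::vector<int> Succs;  // successor block indices
};

// LiveIn(B) = Use(B) | (LiveOut(B) & ~Def(B)), where LiveOut(B) is the
// union of LiveIn over the successors of B. Iterate to a fixed point.
std::vector<RegSet> computeLiveIns(const std::vector<Block> &Blocks) {
  std::vector<RegSet> LiveIn(Blocks.size(), 0);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = static_cast<int>(Blocks.size()) - 1; B >= 0; --B) {
      RegSet Out = 0;
      for (int S : Blocks[B].Succs)
        Out |= LiveIn[S];
      RegSet In = Blocks[B].Use | (Out & ~Blocks[B].Def);
      if (In != LiveIn[B]) {
        LiveIn[B] = In;
        Changed = true;
      }
    }
  }
  return LiveIn;
}

// The three blocks of foo (assumption: R31 and PC are not tracked).
std::vector<Block> makeFoo() {
  return {
      {R0, P0, {1, 2}},    // BB#0: reads R0, writes P0 (then reads it)
      {R0 | R2, R0, {2}},  // BB#1: reads R2 and R0, writes R0
      {R0 | R1, R0, {}},   // BB#2: reads R0 and R1, writes R0
  };
}
```

Running `computeLiveIns(makeFoo())` gives {R0, R1, R2} for BB#0 and BB#1 and {R0, R1} for BB#2, matching the "Live Ins" lines of the machine code in the example.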
Current uses
● Two generic optimizations: copy propagation and dead code elimination. Both have target-specific specializations (via callbacks).
  ● CP allows non-“COPY” instructions which can still transfer a register value.
  ● DCE can delete dead references, e.g. auto-update in postincrement instructions.
  ● See RDFCopy.cpp and RDFDeadCode.cpp.
● Addressing mode optimization: composing a single load/store with a complex addressing mode out of elementary instructions scattered over multiple blocks.
  ● Implemented in HexagonOptAddrMode.cpp.
Trivial copy propagation
An illustration of RDF use in a simple copy propagation:

    // Visit every register use in every statement of every block.
    for (NodeAddr<BlockNode*> BA : DFG.getFunc().Addr->members(DFG)) {
      for (NodeAddr<StmtNode*> SA : BA.Addr->members_if(DFG, IsStmt)) {
        for (NodeAddr<UseNode*> UA : SA.Addr->members_if(DFG, IsUse)) {
          // Find the instruction that owns the def reaching this use.
          auto D = DFG.addr<DefNode*>(UA.Addr->getReachingDef());
          NodeAddr<InstrNode*> I = D.Addr->getOwner(DFG);
          if (IsPhi(I))
            continue;
          // Only plain COPY instructions are handled here.
          MachineInstr *MI = NodeAddr<StmtNode*>(I).Addr->getCode();
          if (!MI->isCopy())
            continue;
          NodeAddr<UseNode*> U = /* first use in I */;
          // Rewrite only if the copy source still holds the same value at SA.
          if (U.Addr->getReachingDef() == /* reaching def of reg at SA */)
            UA.Addr->getOperand()->setReg(U.Addr->getOperand()->getReg());
        }
      }
    }
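The safety condition in the loop above (the copy's source must have the same reaching def at the copy and at the use being rewritten) can be tried out on a stand-alone toy. The following is a minimal sketch on straight-line code only, with no phi nodes and no LLVM types; `Instr`, `propagateCopies`, and the integer register numbers are hypothetical and are not the RDFCopy.cpp implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical straight-line instruction: one optional def, several uses.
struct Instr {
  std::string Opcode;
  int Def;                // defined register number, or -1 for none
  std::vector<int> Uses;  // used register numbers
};

constexpr int LiveIn = -1;  // "reaching def" of a register not yet defined

void propagateCopies(std::vector<Instr> &Code) {
  std::map<int, int> LastDef;  // register -> index of its current reaching def
  // UseRD[I][K]: reaching def of the K-th use of instruction I, at I.
  std::vector<std::vector<int>> UseRD(Code.size());

  auto reaching = [&](int Reg) {
    auto It = LastDef.find(Reg);
    return It == LastDef.end() ? LiveIn : It->second;
  };

  for (std::size_t I = 0; I < Code.size(); ++I) {
    for (int &Reg : Code[I].Uses) {
      int D = reaching(Reg);
      if (D != LiveIn && Code[D].Opcode == "COPY" && Code[D].Uses.size() == 1) {
        int Src = Code[D].Uses[0];
        // Safe only if Src has not been redefined between the copy and here.
        if (UseRD[D][0] == reaching(Src))
          Reg = Src;
      }
      UseRD[I].push_back(reaching(Reg));
    }
    if (Code[I].Def >= 0)
      LastDef[Code[I].Def] = static_cast<int>(I);
  }
}
```

For the sequence `r1 = COPY r0; r2 = ADD r1, r1; r0 = ADDI r0; r3 = ADD r1, r2`, the first ADD is rewritten to read r0 directly, while the second ADD keeps r1 because r0 has been redefined in between.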
Future work
● Ensuring that it works correctly on all targets. From the functional perspective, the code supports all required features, but multi-target testing has been very limited so far.
● Developing more optimizations using it. So far, most of the effort has gone into making the framework flexible and robust.
Thank you
For more information, visit www.qualcomm.com & www.qualcomm.com/blog
© 2017 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm and Hexagon are trademarks of Qualcomm Incorporated, registered in the United States and other countries.