PACT '02 presentation: A Framework for Parallelizing Load/Stores

  • Slides: 22
A Framework for Parallelizing Load/Stores on Embedded Processors
Xiaotong Zhuang, Santosh Pande, John S. Greenland Jr.
College of Computing, Georgia Tech

Background and Motivation
  • The speed gap between memory and CPU remains.
  • Multi-bank memory architectures: Motorola DSP 56000 series, NEC 77016, SONY pDSP, Analog Devices ADSP-210x, StarCore SC140 processor core.
  • Parallel instructions allow parallel access to memory banks: PLDXY r1, @a, r2, @b loads @a into r1 and @b into r2 at the same time.
  • Objective:
    - Generate as many parallel Load/Store instructions (such as PLDXY) as possible through compiler optimizations.
    - Controlled code and data segment growth.
    - Reasonable compilation speed.

General approaches
  • Model as an ILP problem: Rainer Leupers, Daniel Kotte, “Variable partitioning for dual memory bank DSPs”, ICASSP, May '01
    - Variables N_i with value 0/1 for each LD/ST instruction represent its memory bank assignment (X or Y).
    - Variables E_ij with value 0/1 represent whether two instructions can be merged.
    - Enforce the other constraints and maximize the selected edge weight.
  • Model as a graph problem: A. Sudarsanam, S. Malik, “Simultaneous Reference Allocation in Code Generation for Dual Data Memory Bank ASIPs”, TODAES, Apr '00
    - Each Load/Store is a node.
    - An edge between two nodes means they can be merged.
    - Pick the maximum number of disjoint edges.
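The graph formulation above amounts to choosing a largest set of edges that share no node. A minimal sketch of that selection step follows, assuming the merge graph is already given as a list of node pairs; the function name `max_disjoint_edges` and the instruction labels are hypothetical, and the exhaustive search is only meant for small illustrative graphs, not as the method used in the cited work.

```python
def max_disjoint_edges(edges):
    """Return a largest set of pairwise node-disjoint edges (exhaustive search)."""
    best = []

    def search(i, used, chosen):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        if i == len(edges):
            return
        # Prune: even taking every remaining edge cannot beat the best so far.
        if len(chosen) + (len(edges) - i) <= len(best):
            return
        u, v = edges[i]
        if u not in used and v not in used:
            # Option 1: merge the two instructions joined by this edge.
            chosen.append((u, v))
            search(i + 1, used | {u, v}, chosen)
            chosen.pop()
        # Option 2: leave this edge unselected.
        search(i + 1, used, chosen)

    search(0, frozenset(), [])
    return best


# Hypothetical merge graph: four Load/Stores on a path.
example_edges = [("ld_a", "ld_b"), ("ld_b", "st_c"), ("st_c", "ld_d")]
picked = max_disjoint_edges(example_edges)
```

Each selected edge stands for one merged (parallel) Load/Store, so the size of the result is the number of instructions saved under this simplified model.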

Major contributions
  • Keep the model simple and easy to solve mathematically.
  • Identify the movable boundary problem, which impedes problem modeling and simplification.
  • Propose the Motion Schedule Graph (MSG) and two approaches to solve it heuristically.
  • Merge with instruction duplication and variable duplication.
  • Cross basic block merges.
  • Other improvements, such as local conflict elimination through rematerialization, and some global optimization issues.
  • An iterative approach, which systematically grows …

Basic concepts (1)
  • Post-pass approach: assumes a good register allocator has been used (Appel & George's register allocation algorithm).
  • Alias analysis
    - Disambiguation of memory access instructions.
    - Most aliases can be uniquely determined in our benchmark programs.
  • Memory access instructions
    - ST [addr], r is the definition of a memory address.
    - LD [addr], r is the use of a memory address.
    - For base-offset Load/Store instructions, normally used for arrays, assume arrays are inseparable; more register conflicts will be considered.
  • Dependencies. Alias analysis
    - Address conflicts

Basic concepts (2)
  • Building webs
    - Web: a maximal union of du-chains. All variable defs/uses on a web MUST be allocated to the same memory location.
    - A variable that appears in separate webs can be placed in different memory locations.
    - This achieves value separation.
  • Motion range determination
    - Defined as the interval between program points within which a Load/Store can be legally moved, restricted by dependencies.
    - Load/Store instructions with overlapping ranges MAY be merged.
    - Note the movable boundary problem.
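Web construction as described above can be sketched with a union-find structure: du-chains that share a definition or a use end up in the same web. The sketch below assumes each du-chain is given as a (definition, use) pair of instruction labels; `build_webs` and the labels are hypothetical names, not taken from the presentation.

```python
def build_webs(du_chains):
    """Group instructions into webs: maximal unions of du-chains that share a def or a use."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every chain links one definition (ST) to one use (LD).
    for definition, use in du_chains:
        union(definition, use)

    webs = {}
    for node in list(parent):
        webs.setdefault(find(node), set()).add(node)
    # Sort for a deterministic result.
    return sorted(webs.values(), key=lambda w: sorted(w))


# Hypothetical chains: st1 and st2 both reach ld3, so they form one web;
# st4 -> ld5 is a separate web and may live in a different memory location.
chains = [("st1", "ld3"), ("st2", "ld3"), ("st4", "ld5")]
webs = build_webs(chains)
```

All instructions in one returned set must share a memory location, while distinct sets can be assigned to different banks independently.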

Movable boundary problem
  • The motion boundary of one Load/Store instruction is itself a Load/Store instruction.
  • Assuming a fixed boundary will cause incorrect merges.

Motion schedule graph
  • Pseudo fixed-boundary
    - For a Store: move as early as possible, assuming other instructions are fixed.
    - For a Load: move as late as possible, assuming other instructions are fixed.
  • Motion Schedule Graph
    - Nodes represent individual Load/Store instructions.
    - An oval encloses the Load/Stores on the same web.
    - Edges link nodes that have overlapping motion ranges (with respect to pseudo fixed-…
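As a rough illustration of the graph just described, the sketch below links Load/Stores whose motion ranges overlap. It assumes the ranges have already been computed under the pseudo fixed-boundary rule and are given as closed intervals of program points. The `MemOp` record, its field names, and the choice to skip pairs on the same web (they share one memory location, hence one bank) are assumptions made for this example only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    name: str   # instruction label (hypothetical)
    kind: str   # "LD" or "ST"
    web: int    # web identifier
    lo: int     # earliest legal program point
    hi: int     # latest legal program point


def build_msg(ops):
    """Return MSG edges: pairs of ops on different webs with overlapping motion ranges."""
    edges = []
    for a, b in combinations(ops, 2):
        overlap = a.lo <= b.hi and b.lo <= a.hi
        # Assumption: ops on the same web sit in the same bank and cannot pair up.
        if overlap and a.web != b.web:
            edges.append((a.name, b.name))
    return edges


ops = [
    MemOp("ld_a", "LD", web=0, lo=1, hi=4),
    MemOp("ld_b", "LD", web=1, lo=3, hi=6),
    MemOp("st_c", "ST", web=2, lo=7, hi=9),
    MemOp("ld_a2", "LD", web=0, lo=3, hi=5),
]
msg_edges = build_msg(ops)
```

Here `ld_a` and `ld_a2` overlap but belong to the same web, so only their edges to `ld_b` are kept; `st_c` overlaps with nothing and stays isolated.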

Conflict resolution [figure]

Example [figure]

Graph solving
  • The whole problem is provably NP-complete; see Appendix A.
  • Two separate problems: Bank Assignment and Edge Picking.
  • For predetermined bank assignments, the Edge Picking problem can be solved optimally in polynomial time.
  • Heuristic algorithms
    - Brute-force search takes O(|V|^3 · 2^n) time. Doable for small programs.
    - Simulated annealing (SA) can approach the optimal solution but greatly increases compilation time.
    - Use a heuristic to solve Bank Assignment, then get the optimal solution for Edge Picking.
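The split into Bank Assignment and Edge Picking can be illustrated with a small brute-force sketch: enumerate every X/Y assignment of webs, and for each one pick edges optimally. The sketch assumes that a merge is only possible between an X-bank and a Y-bank access, which makes the remaining graph bipartite; it then uses augmenting-path bipartite matching (Kuhn's algorithm), a unit-capacity special case of max flow. The flow network used in the presentation itself is not recoverable from the slide text, and the model here ignores movable-boundary conflicts. All function and variable names are hypothetical.

```python
from itertools import product


def pick_edges(edges, bank):
    """Optimal Edge Picking for a fixed bank assignment via augmenting paths."""
    adj = {}
    for u, v in edges:
        if bank[u] == bank[v]:
            continue  # same bank: the pair cannot be merged
        x, y = (u, v) if bank[u] == "X" else (v, u)
        adj.setdefault(x, []).append(y)
    match = {}  # Y-bank node -> X-bank node it is merged with

    def augment(x, seen):
        for y in adj.get(x, []):
            if y in seen:
                continue
            seen.add(y)
            if y not in match or augment(match[y], seen):
                match[y] = x
                return True
        return False

    for x in adj:
        augment(x, set())
    return [(x, y) for y, x in match.items()]


def brute_force(web_of, edges):
    """Try all 2^n bank assignments of the n webs; keep the one with most merges."""
    webs = sorted(set(web_of.values()))
    best_edges, best_banks = [], None
    for banks in product("XY", repeat=len(webs)):
        web_bank = dict(zip(webs, banks))
        bank = {node: web_bank[w] for node, w in web_of.items()}
        picked = pick_edges(edges, bank)
        if best_banks is None or len(picked) > len(best_edges):
            best_edges, best_banks = picked, web_bank
    return best_edges, best_banks


# Hypothetical input: a and a2 share web 0, so they must share a bank.
web_of = {"a": 0, "a2": 0, "b": 1, "c": 2}
edges = [("a", "b"), ("a2", "c"), ("b", "c")]
best_edges, best_banks = brute_force(web_of, edges)
```

The outer loop is the exponential part; replacing it with a bank assignment heuristic while keeping `pick_edges` unchanged mirrors the last bullet above.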

Edge Picking as max flow problem [figure]

Bank assignment heuristic [figure]

Post-pass phases [figure]

Cross BB merge (instruction duplication)
  • Move to a predecessor/successor to create new opportunities.
  • To guarantee profitability:
    - Move to where the reference is live.
    - Move ST on EBB.
    - Move LD on reverse EBB.
    - Make sure it can be combined if pushed to at least one of the live predecessors/successors.

Variable duplication [figure]

Local conflict elimination
  • Motivation
    - The register allocator may assign the same register to neighboring ranges, which leads to register conflicts.
    - ISA restrictions may require particular registers that are not available at the program point.
  • Use rematerialization to free a register and reconstruct its value after the merge, making the register available.

Merge type and MSG properties [figure]

Compilation time [figure]

Runtime performance [figure]

Code size comparison [figure]

Conclusion
  • A framework to analyze and merge LD/STs.
  • Our heuristic approach comes close to exhaustive search with less compilation time.
  • Variable and instruction replication enhance the range of motion of the instructions, so the generated code quality is superior to that of the exhaustive methods previously proposed.