Improving Instruction Locality with Just-In-Time Code Layout
J. Bradley Chen and Bradley D. D. Leupen
Division of Engineering and Applied Sciences
Harvard University
2/13/2022

Goals
• Improve instruction reference locality
  – a big problem for commodity applications
• Eliminate the need for profile information
  – required by current compiler-based solutions

How?
Implement layout dynamically using Activation Order:
• A new heuristic for code layout.
• Locate procedures in order of use.

Requirements
• No special hardware support.
• Minimal changes to the operating system.
• Minimal system overhead.

Optimizing Procedure Layout
[Figure: two panels, "Bad Layout" and "Better Layout"]

Current Practice: Pettis and Hansen
• Nodes are procedures.
• Edges are caller/callee pairs.
• Weights are call frequency.
[Figure: example weighted call graph over WinMain(), Initialize(), EventLoop(), GetEvent(), React(), CheckForInputError(), HandleInputError(), HandleCommonCase(), and HandleRareCase(), with edge weights such as 129394, 128404, 68754, 68753, 10, and 1]

Pettis and Hansen Layout
The nodes joined by the heaviest remaining edge are merged first, and the layout grows with each merge:
layout: []
layout: [GetEvent, CheckForInputError]
layout: [EventLoop, GetEvent, CheckForInputError]
layout: [React, EventLoop, GetEvent, CheckForInputError]
layout: [HandleCommonCase, React, EventLoop, GetEvent, CheckForInputError]
[Figure: the call graph after each merge, with the merged nodes labeled Node-1 through Node-4]
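The greedy merging on this slide can be sketched in C. This is a simplified illustration, not the full Pettis and Hansen algorithm: it assumes the callee's chain is always appended after the caller's chain (the published algorithm also chooses how to orient the two chains), leftover chains are concatenated in procedure-number order, and `Edge`, `ph_layout`, and `MAXP` are hypothetical names.

```c
#include <stdlib.h>

#define MAXP 16  /* hypothetical limit on the number of procedures */

typedef struct { int u, v; long w; } Edge;  /* caller u, callee v, call count w */

/* qsort comparator: heavier edges sort first. */
static int by_weight_desc(const void *a, const void *b) {
    long wa = ((const Edge *)a)->w, wb = ((const Edge *)b)->w;
    return (wa < wb) - (wa > wb);
}

/* Greedy procedure ordering in the spirit of Pettis and Hansen.
   Visits call-graph edges from heaviest to lightest and merges the chains
   holding the caller and the callee. Requires n <= MAXP and 0 <= u, v < n.
   Writes the order of all n procedures to out[] and returns n. */
int ph_layout(int n, Edge *edges, int m, int out[]) {
    int chain[MAXP][MAXP], len[MAXP], owner[MAXP];
    for (int p = 0; p < n; p++) {      /* every procedure starts alone */
        chain[p][0] = p;
        len[p] = 1;
        owner[p] = p;
    }
    qsort(edges, (size_t)m, sizeof(Edge), by_weight_desc);
    for (int e = 0; e < m; e++) {
        int a = owner[edges[e].u], b = owner[edges[e].v];
        if (a == b) continue;          /* already in the same chain */
        for (int k = 0; k < len[b]; k++) {  /* append chain b to chain a */
            int p = chain[b][k];
            chain[a][len[a]++] = p;
            owner[p] = a;
        }
        len[b] = 0;                    /* chain b is now empty */
    }
    int k = 0;                         /* emit surviving chains in id order */
    for (int c = 0; c < n; c++)
        for (int i = 0; i < len[c]; i++)
            out[k++] = chain[c][i];
    return k;
}
```

For a hypothetical graph with edges 1→3 (weight 100), 2→4 (50), 0→1 (2), and 0→2 (1), `ph_layout` yields the order 0, 1, 3, 2, 4: the two heavy pairs are merged first and then joined through procedure 0. Because `qsort` is not stable, equal weights should be avoided or tie-broken explicitly if a deterministic layout matters.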

A New Heuristic
Activation Order: Co-locate procedures that are activated sequentially.
[Figure: example of Activation Order layout]
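A minimal sketch of one reading of Activation Order, assuming each procedure is placed at the next free position the first time it is activated and never moved afterward (this matches how the thunk routines copy a procedure on its first call). The trace format and the names `ao_layout` and `MAXP` are hypothetical.

```c
#include <stdbool.h>

#define MAXP 64  /* hypothetical limit; procedure ids must be < MAXP */

/* Activation Order sketch: walk a trace of procedure activations and
   append each procedure to the layout at its first activation.
   Writes the layout to order[] and returns the number of procedures placed;
   procedures that never appear in the trace are never placed. */
int ao_layout(const int trace[], int trace_len, int order[]) {
    bool placed[MAXP] = { false };
    int n = 0;
    for (int i = 0; i < trace_len; i++) {
        int p = trace[i];
        if (!placed[p]) {       /* first activation: take the next slot */
            placed[p] = true;
            order[n++] = p;
        }
    }
    return n;
}
```

For the activation trace 3, 1, 3, 2, 1, 5 the layout is 3, 1, 2, 5. Unlike the Pettis and Hansen sketch, no call-graph weights are needed, which is why the heuristic can run without profile information.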

Implementing JITCL
[Figure: instruction memory containing __start (perform initializations; call thunk_main) followed by the thunk routines thunk_main, thunk_foo, ...]
Thunk routines implement code layout on the fly.

Thunk routines
// Global variables:
//   ProcPointers[] - one element per procedure
//   INDEX_proc and LENGTH_proc for each procedure
thunk_main:
    // first activation: main still lives in the original code segment
    if (InCodeSegment(ProcPointers[INDEX_main]))
        ProcPointers[INDEX_main] = CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
    // redirect the caller so later calls bypass the thunk
    PatchCallSite(ProcPointers[INDEX_main], ComputeCallSiteFromReturnAddress(RA));
    jmp ProcPointers[INDEX_main];
The thunk routines copy procedures into the text segment and update call sites at run time.
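The thunk logic can be simulated in C to show the copy-once behavior. This sketch assumes a single byte array standing in for the address space (offsets below `CODE_END` are the original code segment, offsets at or above it the text segment) and models call-site patching as storing the new address in a hypothetical `call_site[]` table. `thunk`, `init_program`, and all sizes are illustrative, not JITCL's actual interface, and the copied bytes are never executed.

```c
#include <stddef.h>
#include <string.h>

#define NPROCS   4    /* hypothetical program with four procedures */
#define CODE_END 128  /* offsets [0, CODE_END) model the code segment */
#define MEM_SIZE 256  /* offsets [CODE_END, MEM_SIZE) model the text segment */

static unsigned char memory[MEM_SIZE];
static size_t proc_ptr[NPROCS];      /* models ProcPointers[]: current address */
static size_t proc_len[NPROCS];      /* models LENGTH_proc */
static size_t text_next = CODE_END;  /* next free byte in the text segment */
static size_t call_site[8];          /* 0 = still calls the thunk */

static int in_code_segment(size_t addr) { return addr < CODE_END; }

/* Append len bytes starting at src to the text segment; return the new address. */
static size_t copy_to_text_segment(size_t src, size_t len) {
    size_t dst = text_next;
    memcpy(&memory[dst], &memory[src], len);
    text_next += len;
    return dst;
}

/* Place each procedure in the code segment with hypothetical contents. */
void init_program(void) {
    for (int p = 0; p < NPROCS; p++) {
        proc_ptr[p] = (size_t)p * 32;  /* procedure p starts at p * 32 */
        proc_len[p] = 16 + (size_t)p;  /* lengths 16..19 bytes */
        memset(&memory[proc_ptr[p]], 'A' + p, proc_len[p]);
    }
}

/* Models thunk_proc: copy the procedure on its first activation, patch
   the calling site, and return the address the real thunk would jump to. */
size_t thunk(int proc, int site) {
    if (in_code_segment(proc_ptr[proc]))
        proc_ptr[proc] = copy_to_text_segment(proc_ptr[proc], proc_len[proc]);
    call_site[site] = proc_ptr[proc];
    return proc_ptr[proc];
}
```

After `init_program()`, calling `thunk(2, 0)` copies procedure 2 to offset 128, a later `thunk(0, 1)` places procedure 0 immediately after it at offset 146, and a second `thunk(2, 2)` copies nothing and only patches the new call site. Only activated procedures ever reach the text segment: here 34 of the program's 70 bytes of code are copied.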

Simulation Methodology
Platform     Cache Size  Associativity  Simulation
UNIX/RISC    8 KB        Direct-Mapped  ATOM
Win32/x86    8 KB        2-Way          Etch
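To make the 8 KB direct-mapped configuration concrete, here is a minimal miss-counting model in C. The 32-byte line size and the name `count_misses` are assumptions (the table gives only size and associativity), and the 2-way Win32/x86 configuration is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BYTES 8192u                     /* 8 KB, per the UNIX/RISC row */
#define LINE_BYTES  32u                       /* assumed line size */
#define NLINES      (CACHE_BYTES / LINE_BYTES)

/* Count misses for a trace of instruction-fetch addresses in a
   direct-mapped cache: each memory block can live in exactly one line. */
size_t count_misses(const uint32_t *addrs, size_t n) {
    uint32_t tags[NLINES];
    unsigned char valid[NLINES] = {0};
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t block = addrs[i] / LINE_BYTES;  /* memory block number */
        uint32_t set = block % NLINES;           /* the only line it may use */
        uint32_t tag = block / NLINES;
        if (!valid[set] || tags[set] != tag) {   /* miss: fill the line */
            valid[set] = 1;
            tags[set] = tag;
            misses++;
        }
    }
    return misses;
}
```

Two hot procedures exactly 8 KB apart map to the same line and evict each other: alternating fetches at 0, 8192, 0, 8192 give 4 misses, while placing the same code adjacently (0, 32, 0, 32) gives 2. Avoiding such conflicts is the goal of procedure layout.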

Workloads

Results
• The AO heuristic is effective.
• The overhead of JITCL is negligible.
• JITCL improves procedure layout without requiring profile information.
• JITCL reduces program memory requirements.

Results: The AO Heuristic
[Figure: improvement in I-cache miss rate]
Conclusion: The effectiveness of the heuristic is comparable to P&H.

Overhead of JITCL
• Copy overhead
  – instruction overhead
  – cache overhead
• Cache consistency
• Disk overhead: comparable to demand-loaded text; not evaluated.

Results: Overhead
[Figure: JITCL overhead as a percentage of instructions]
Conclusion: JITCL overhead is less than 0.1% in all cases.

Results: Performance
[Figure: saved cycles per instruction]
Conclusion: Overall performance is comparable to P&H.

JITCL for Win32 Applications
• Windows applications are composed of multiple executable modules.
• When transitions between modules are frequent, intra-module code layout is less effective.
• With JITCL, inter-module code layout is possible and beneficial.

Win32 Cache Miss Rates
[Figure: cache miss rates for Win32 applications]
Conclusion: Careful layout did not help Win32 applications.

Text Segment Size
[Figure: text size in megabytes]
Conclusion: JITCL typically reduces text size by 50%.

JITCL vs. PBO (Profile-Based Optimization)
• JITCL provides an alternative to feedback-based procedure layout.
• Many important optimizations still require profile information:
  – instruction scheduling
  – register allocation
  – other intra-procedural optimizations
• Don't expect profile-based optimization to go away!

Conclusions
Just-In-Time code layout achieves benefit comparable to profile-based code layout without the need for profiles.
• The AO heuristic is effective.
• The overhead of procedure copying is low.
• The I-cache benefit is comparable to Pettis and Hansen layout.
• JITCL can reduce working set size.

The Morph Project
For more information: http://www.eecs.harvard.edu/morph/