Improving Instruction Locality with Just-In-Time Code Layout

Improving Instruction Locality with Just-In-Time Code Layout
J. Bradley Chen and Bradley D. D. Leupen
Division of Engineering and Applied Sciences, Harvard University
2/13/2022 1

Goals
• Improve instruction reference locality
  – big problem for commodity applications
• Eliminate need for profile information
  – required by current compiler-based solutions

How?
Implement layout dynamically using Activation Order (AO):
• A new heuristic for code layout.
• Locate procedures in order of use.

Requirements
• No special hardware support.
• Minimal changes to the operating system.
• Minimal system overhead.

Optimizing Procedure Layout
[Figure: two panels, "Bad Layout" and "Better Layout".]

Current Practice: Pettis and Hansen
• Nodes are procedures.
• Edges are caller/callee pairs.
• Weights are call frequency.
[Figure: weighted call graph of an example program with procedures WinMain(), Initialize(), EventLoop(), GetEvent(), CheckForInputError(), HandleInputError(), React(), HandleCommonCase(), and HandleRareCase(); edges are labeled with call counts.]

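The edge weights in such a graph come from a profile. As a rough illustration of what that profile has to record, the Python sketch below counts (caller, callee) pairs from a trace of call and return events. The trace format, the function name `call_graph_from_trace`, and the sample trace are assumptions made for illustration; they are not taken from the original work.

```python
from collections import Counter


def call_graph_from_trace(events):
    """Return a Counter mapping (caller, callee) -> number of calls.

    `events` is a sequence of ("call", proc) and ("return", proc) tuples
    (a hypothetical trace format). The first procedure called has no caller.
    """
    weights = Counter()
    stack = []  # currently active procedures, innermost last
    for kind, proc in events:
        if kind == "call":
            if stack:
                weights[(stack[-1], proc)] += 1  # caller is the innermost active procedure
            stack.append(proc)
        elif kind == "return":
            if not stack or stack[-1] != proc:
                raise ValueError(f"unmatched return from {proc}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return weights


# Hypothetical trace: WinMain initializes, then runs two event-loop iterations.
trace = [("call", "WinMain"),
         ("call", "Initialize"), ("return", "Initialize"),
         ("call", "EventLoop")]
for _ in range(2):
    trace += [("call", "GetEvent"),
              ("call", "CheckForInputError"), ("return", "CheckForInputError"),
              ("return", "GetEvent"),
              ("call", "React"),
              ("call", "HandleCommonCase"), ("return", "HandleCommonCase"),
              ("return", "React")]
trace += [("return", "EventLoop"), ("return", "WinMain")]

print(call_graph_from_trace(trace))
```

Gathering counts like these requires a separate instrumented run of the program, which is exactly the profile information JITCL sets out to eliminate.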
Pettis and Hansen Layout
Merging along the heaviest edges first, the layout grows as follows:
layout: []
layout: [GetEvent, CheckForInputError]
layout: [EventLoop, GetEvent, CheckForInputError]
layout: [React, EventLoop, GetEvent, CheckForInputError]
layout: [HandleCommonCase, React, EventLoop, GetEvent, CheckForInputError]
[Figure: the call graph after each merge step, with merged procedures shown as Node-1 through Node-4.]

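A simplified greedy sketch of this merge process, in Python: visit call edges from heaviest to lightest, and whenever an edge connects two different chains, join the chains in whichever orientation places the two procedures closest together. The edge weights below are hypothetical, chosen only so that the merge order matches the slide; each (caller, callee) pair is treated as its own edge instead of summing calls in both directions; and ties go to the first candidate listed. The published algorithm may differ in these details.

```python
def pettis_hansen_layout(call_counts):
    """Greedily merge procedures into chains, heaviest call edge first.

    call_counts: dict mapping (caller, callee) -> number of calls.
    Returns (layout, steps): the final procedure order, and the chain
    produced by each successful merge.
    """
    procs = sorted({p for edge in call_counts for p in edge})
    chain_of = {p: [p] for p in procs}  # procedure -> chain containing it
    steps = []

    # Heaviest edges first; ties broken by name so the result is deterministic.
    for (u, v), _count in sorted(call_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[u], chain_of[v]
        if a is b:
            continue  # recursive call, or both procedures already share a chain
        # One candidate per orientation class (up to reversing the whole
        # chain); keep the first that puts u and v closest together.
        candidates = [a + b, b + a, a + b[::-1], a[::-1] + b]
        merged = min(candidates, key=lambda c: abs(c.index(u) - c.index(v)))
        for p in merged:
            chain_of[p] = merged
        steps.append(merged)

    # Emit each remaining chain once (disconnected procedures form their own chains).
    layout, emitted = [], []
    for p in procs:
        chain = chain_of[p]
        if not any(chain is c for c in emitted):
            emitted.append(chain)
            layout.extend(chain)
    return layout, steps


# Hypothetical call counts for the example program.
calls = {
    ("WinMain", "Initialize"): 1,
    ("WinMain", "EventLoop"): 1,
    ("EventLoop", "GetEvent"): 90,
    ("EventLoop", "React"): 50,
    ("GetEvent", "CheckForInputError"): 100,
    ("CheckForInputError", "HandleInputError"): 1,
    ("React", "HandleCommonCase"): 40,
    ("React", "HandleRareCase"): 5,
}
layout, steps = pettis_hansen_layout(calls)
for step in steps[:4]:
    print(step)  # the four non-empty layouts listed on the slide, in order
```

Choosing the orientation by distance is what makes React land in front of EventLoop rather than after CheckForInputError: EventLoop sits at the front of its chain, so prepending React keeps the caller and callee adjacent.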
A New Heuristic
Activation Order: Co-locate procedures that are activated sequentially.
Example:
[Figure: activation-order layout example.]

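Activation order needs much less information than Pettis and Hansen. A minimal Python sketch, assuming the run time can observe the stream of procedure activations (the function name and the trace below are hypothetical): each procedure is placed the first time it is activated, and later activations change nothing.

```python
def activation_order_layout(activations):
    """Return procedures in order of first activation, duplicates dropped.

    `activations` is any iterable of procedure names in the order they
    were called. Procedures that never run never appear in the layout.
    """
    # dict preserves insertion order (guaranteed since Python 3.7),
    # so only the first activation of each procedure determines its slot.
    return list(dict.fromkeys(activations))


# Hypothetical activation sequence for the example program.
trace = ["WinMain", "Initialize", "EventLoop",
         "GetEvent", "CheckForInputError", "React", "HandleCommonCase",
         "GetEvent", "CheckForInputError", "React", "HandleCommonCase",
         "GetEvent", "CheckForInputError", "React", "HandleRareCase"]
print(activation_order_layout(trace))
```

Unlike Pettis and Hansen layout, no call counts are needed: the layout is determined entirely by the order of first use, which becomes known one procedure at a time while the program runs. Procedures that never run, such as HandleInputError in this trace, are never placed.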
Implementing JITCL
[Figure: instruction memory holding __start ("perform initializations; call thunk_main") followed by the thunk routines thunk_main:, thunk_foo:, and so on.]
Thunk routines implement code layout on the fly.

Thunk routines

    // Global variables:
    //   ProcPointers[] - one element per procedure
    //   INDEX_proc and LENGTH_proc for each procedure
    thunk_main:
        if (InCodeSegment(ProcPointers[INDEX_main]))
            ProcPointers[INDEX_main] = CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
        PatchCallSite(ProcPointers[INDEX_main], ComputeCallSiteFromReturnAddress(RA));
        jmp ProcPointers[INDEX_main];

The thunk routines copy procedures into the text segment and update call sites at run time.

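To make the control flow of the thunk concrete, here is a toy Python model of it. It is a simulation, not the real mechanism: "addresses" are byte offsets into an imaginary text segment, the procedure sizes are hypothetical, and the `site` argument stands in for the call site that the real thunk computes from the return address. The class name `ThunkModel` and its attributes are invented for illustration.

```python
class ThunkModel:
    """Toy model of JITCL thunks (hypothetical names and sizes)."""

    def __init__(self, lengths):
        self.lengths = lengths                       # LENGTH_proc for each procedure
        # ProcPointers[]: None means "still in the original code segment".
        self.proc_pointers = {name: None for name in lengths}
        self.text_segment = []                       # procedures in the order copied
        self.next_free = 0                           # next free text-segment offset
        self.patched_sites = {}                      # call site -> direct target
        self.thunk_runs = 0                          # times any thunk executed

    def call(self, site, name):
        """Call `name` from call site `site`; return the target offset."""
        if site in self.patched_sites:               # patched: jump straight to the copy
            return self.patched_sites[site]
        self.thunk_runs += 1                         # otherwise run thunk_<name>
        if self.proc_pointers[name] is None:         # InCodeSegment(...)
            # CopyToTextSegment(...): place the procedure at the next free offset.
            self.proc_pointers[name] = self.next_free
            self.next_free += self.lengths[name]
            self.text_segment.append(name)
        self.patched_sites[site] = self.proc_pointers[name]  # PatchCallSite(...)
        return self.proc_pointers[name]                       # jmp ProcPointers[...]


jit = ThunkModel({"main": 64, "foo": 128, "bar": 32, "unused": 256})
jit.call(site="__start", name="main")
for _ in range(1000):
    jit.call(site="main+0x10", name="foo")
    jit.call(site="main+0x24", name="bar")
    jit.call(site="foo+0x08", name="bar")   # a second call site for bar
print(jit.text_segment, jit.next_free, jit.thunk_runs)  # ['main', 'foo', 'bar'] 224 4
```

In this model each procedure is copied at most once, in order of first use, and each call site pays the thunk cost only on its first call; after that it calls the copy directly. The procedure that is never called (`unused`) never reaches the text segment.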
Simulation Methodology

    Platform     Cache Size   Associativity   Simulation Tool
    UNIX/RISC    8 KB         Direct-mapped   ATOM
    Win32/x86    8 KB         2-way           Etch

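For readers who want to experiment with the kind of measurement in this table, the Python sketch below computes the miss rate of a set-associative instruction cache with LRU replacement over a trace of instruction addresses (assoc=1 gives a direct-mapped cache). It does not model ATOM or Etch, the tools listed in the table; the 32-byte line size, the LRU policy, and the one-access-per-line toy traces are assumptions.

```python
from collections import OrderedDict


def icache_miss_rate(addresses, cache_bytes=8 * 1024, assoc=1, line_bytes=32):
    """Miss rate of a set-associative cache with LRU replacement.

    assoc=1 models a direct-mapped cache. The 32-byte line size is an
    assumption; the slide does not give one.
    """
    num_sets = cache_bytes // (line_bytes * assoc)
    # One OrderedDict per set; keys are tags, ordered least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        ways = sets[line % num_sets]
        tag = line // num_sets
        if tag in ways:
            ways.move_to_end(tag)          # hit: now the most recently used line
        else:
            misses += 1
            if len(ways) == assoc:
                ways.popitem(last=False)   # evict the least recently used line
            ways[tag] = None
    return misses / total if total else 0.0


conflicting = [0, 8192] * 100  # two hot procedures placed exactly 8 KB apart
adjacent = [0, 64] * 100       # the same procedures placed close together
print(icache_miss_rate(conflicting))           # 1.0  (direct-mapped)
print(icache_miss_rate(adjacent))              # 0.01 (direct-mapped)
print(icache_miss_rate(conflicting, assoc=2))  # 0.01 (2-way)
```

In this toy trace, two hot procedures placed exactly one cache size apart evict each other on every access in the direct-mapped cache, while placing them next to each other, or adding a second way, leaves only the two compulsory misses. Conflicts of this kind are what procedure layout tries to avoid.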
Workloads

Results
• The AO heuristic is effective.
• The overhead of JITCL is negligible.
• JITCL improves procedure layout without requiring profile information.
• JITCL reduces program memory requirements.

Results: The AO Heuristic
[Figure: improvement in I-cache miss rate.]
Conclusion: The effectiveness of the AO heuristic is comparable to that of P&H.

Overhead of JITCL
• Copy overhead
  – instruction overhead
  – cache overhead
• Cache consistency
• Disk overhead: comparable to demand-loaded text; not evaluated.

Results: Overhead
[Figure: overhead instructions (%).]
Conclusion: JITCL overhead is less than 0.1% in all cases.

Results: Performance
[Figure: saved cycles per instruction.]
Conclusion: Overall performance is comparable to P&H.

JITCL for Win32 Applications
• Windows applications are composed of multiple executable modules.
• When transitions between modules are frequent, intra-module code layout is less effective.
• With JITCL, inter-module code layout is possible and beneficial.

Win32 Cache Miss Rates
[Figure: Win32 cache miss rates.]
Conclusion: Careful layout did not help Win32 applications.

Text Segment Size
[Figure: text size in megabytes.]
Conclusion: JITCL typically reduces text size by 50%.

JITCL vs. Profile-Based Optimization (PBO)
• JITCL provides an alternative to feedback-based procedure layout.
• Many important optimizations still require profile information:
  – instruction scheduling
  – register allocation
  – other intra-procedural optimizations
• Don’t expect profile-based optimization to go away!

Conclusions
Just-In-Time code layout achieves benefits comparable to profile-based code layout without the need for profiles.
• The AO heuristic is effective.
• The overhead of procedure copying is low.
• The I-cache benefit is comparable to that of Pettis and Hansen layout.
• JITCL can reduce working set size.

The Morph Project
For more information: http://www.eecs.harvard.edu/morph/