Structure Layout Optimizations in the Open64 Compiler
- Slides: 19
Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
Gautam Chakrabarti and Fred Chow
PathScale, LLC.

Outline
Ø Motivation
Ø Types of structure layout optimizations
Ø Criteria for structure layout optimizations
Ø Implementation details
Ø Performance results
Ø Future work
Ø Conclusion
Open64 Workshop 2008

Motivation
Ø Poor data locality in many applications
Ø High data cache miss rates
Ø Growing gap between processor and memory speeds
Our Aim
Ø Make applications more cache-friendly
Our Approach
Ø Change layout of data structures
Ø Requires whole-program optimization
Ø Use Inter-Procedural Analysis and Optimizations (IPA)

IPA
Ø Summarization
Ø Analysis
Ø Optimization

Types of Structure Layout Optimizations
Structure splitting
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long l;
        char c;
        struct struct_A *next;
    };
Structure peeling
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long l;
        char c;
    };

Structure Splitting Example
Before:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long l;
        char c;
        struct struct_A *next;
    };
After:
    struct new_struct_A {
        double d1;
        int i;
        long l;
        struct new_struct_A *next;
        struct cold_sub_struct_A *p;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

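The effect of the split layout above can be sketched in C. This is only an illustration under assumptions: the struct definitions follow the slide, while `node_new`, `sum_hot` and `list_free` are hypothetical helpers that are not part of Open64. The hot traversal reads only `d1`, `i`, `l` and `next`; the cold fields are reached through `p` when needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Split layout from the slide; all function names here are hypothetical. */
struct cold_sub_struct_A { double d2; float f; char c; };

struct new_struct_A {
    double d1;
    int i;
    long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;   /* link from the hot part to the cold part */
};

/* Allocate a hot node plus its zeroed cold part; returns NULL on failure. */
struct new_struct_A *node_new(double d1, int i, long l, struct new_struct_A *next)
{
    struct new_struct_A *n = malloc(sizeof *n);
    struct cold_sub_struct_A *cold = calloc(1, sizeof *cold);
    if (n == NULL || cold == NULL) {
        free(n);
        free(cold);
        return NULL;
    }
    n->d1 = d1;
    n->i = i;
    n->l = l;
    n->next = next;
    n->p = cold;
    return n;
}

/* Hot traversal: reads only d1, i, l and next, never the cold part. */
double sum_hot(const struct new_struct_A *head)
{
    double total = 0.0;
    for (const struct new_struct_A *n = head; n != NULL; n = n->next)
        total += n->d1 * n->i + (double)n->l;
    return total;
}

/* Release every node together with its cold part. */
void list_free(struct new_struct_A *head)
{
    while (head != NULL) {
        struct new_struct_A *next = head->next;
        assert(head->p != NULL);   /* node_new always attaches a cold part */
        free(head->p);
        free(head);
        head = next;
    }
}
```

A caller would build the list with `node_new`, run `sum_hot` in the hot path, and touch `node->p->c` only in rarely executed code.
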
Structure Peeling Example
Before:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long l;
        char c;
    };
After:
    struct new_struct_A {
        double d1;
        int i;
        long l;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };

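A minimal sketch of how code might use the peeled layout above, assuming the objects live in arrays so that element k of the hot array and element k of the cold array together form one original object. The function names `sum_d1_times_i` and `set_flag` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Peeled layout from the slide: hot and cold fields live in separate arrays. */
struct new_struct_A      { double d1; int i; long l; };
struct cold_sub_struct_A { double d2; float f; char c; };

/* Hot loop over n elements: reads only the hot array. */
double sum_d1_times_i(const struct new_struct_A *hot, size_t n)
{
    double total = 0.0;
    assert(hot != NULL || n == 0);
    for (size_t k = 0; k < n; k++)
        total += hot[k].d1 * hot[k].i;
    return total;
}

/* Rarely executed code uses the same index k into the cold array. */
void set_flag(struct cold_sub_struct_A *cold, size_t k, char c)
{
    assert(cold != NULL);
    cold[k].c = c;
}
```

Unlike splitting, no link pointer is stored in the hot part; the shared index ties the two pieces together.
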
Criteria for structure layout optimizations
Ø Legality Analysis
   Ø Type cast
   Ø Address of a field is taken
   Ø Escaped types
   Ø Parameter types
   Ø Full visibility to IPA
   Ø Alignment restrictions
Ø Profitability Analysis
   Ø Hotness
   Ø Affinity
   Ø Field accesses at loop level
   Ø Size

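To make some of the legality criteria concrete, the following sketch shows code patterns that depend on the original layout of a type. The type `struct rec` and all function names are hypothetical, and whether Open64 rejects each exact form is an assumption; the point is only why such uses make a layout change unsafe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct rec { double hot; int id; char tag; };   /* hypothetical candidate type */

/* Type cast: the struct is read as raw bytes at a fixed offset, so the
   code assumes all fields sit in one contiguous object. */
int id_via_cast(const struct rec *r)
{
    int id;
    assert(r != NULL);
    memcpy(&id, (const unsigned char *)r + offsetof(struct rec, id), sizeof id);
    return id;
}

/* Address of a field is taken: the returned pointer can be used in ways
   that no rewritten field-access statement covers. */
int *id_address(struct rec *r)
{
    assert(r != NULL);
    return &r->id;
}

/* Escaped type: the object's bytes leave the program through a library
   routine whose body is not visible to the compiler. */
size_t dump(const struct rec *r, FILE *out)
{
    assert(r != NULL && out != NULL);
    return fwrite(r, sizeof *r, 1, out);
}
```
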
Implementation Details
Step 1: Type information summarization (IPL)
Step 2: Symbol table merging (IPA)
Step 3: Legality and profitability analysis (IPA analysis)
Step 4: Transforming the program (IPA optimization)

Implementation Details: Type information summarization
Ø Information summarization in IPL
Ø Framework for computing static profiles using heuristics
Ø New TY flag TY_NO_SPLIT
Ø SUMMARY_TY_INFO
Ø SUMMARY_LOOP
   Ø For each DO_LOOP, WHILE_DO, DO_WHILE
   Ø Bit-vector to track field accesses of up to N structures for each loop
   Ø Considers field accesses immediately inside the loop
      Ø These fields are considered affine to each other
   Ø Execution count of statements immediately inside the loop
      Ø From statically estimated profiles or from runtime feedback

Implementation Details: IPA Analysis
Ø Inter-procedurally update statically estimated execution count of PUs
Ø Update statically estimated loop frequencies in SUMMARY_LOOP
Ø Consider SUMMARY_LOOP from the hottest P PUs
Ø Determine candidates for structure-layout transformation
Ø Determine new layout of structures

Implementation Details: IPA Analysis Example
[Figure: loops L1 to L5 with their field-access bit vectors (BV) over fields F1 to F4 and associated weights, combined into affinity groups AG1 to AG3]
Li: Loops; AGk: Affinity groups; Fj: Fields in a struct

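One plausible way to turn per-loop bit vectors into weighted affinity groups is sketched below. This is an assumption for illustration, not the algorithm confirmed by the slides: loops with identical bit vectors are merged into one group and their counts are summed. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct loop_summary   { uint32_t field_bits; uint64_t exec_count; };
struct affinity_group { uint32_t field_bits; uint64_t weight; };

/* Merge loops with identical bit vectors, summing their counts.
   Returns the number of groups written (at most max_groups). */
size_t build_affinity_groups(const struct loop_summary *loops, size_t n_loops,
                             struct affinity_group *groups, size_t max_groups)
{
    size_t n_groups = 0;
    assert(loops != NULL && groups != NULL);
    for (size_t i = 0; i < n_loops; i++) {
        size_t g = 0;
        while (g < n_groups && groups[g].field_bits != loops[i].field_bits)
            g++;
        if (g == n_groups) {
            if (n_groups == max_groups)
                continue;   /* no room left: this loop's summary is dropped */
            groups[n_groups].field_bits = loops[i].field_bits;
            groups[n_groups].weight = 0;
            n_groups++;
        }
        groups[g].weight += loops[i].exec_count;
    }
    return n_groups;
}

/* Index of the group with the largest weight (0 if there are no groups). */
size_t hottest_group(const struct affinity_group *groups, size_t n_groups)
{
    size_t best = 0;
    assert(groups != NULL || n_groups == 0);
    for (size_t g = 1; g < n_groups; g++)
        if (groups[g].weight > groups[best].weight)
            best = g;
    return best;
}
```

A new layout could then place the fields of the hottest group together; how overlapping groups are resolved is left open in this sketch.
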
Implementation Details: Transforming the program
Example (peel T):
    struct S {
        // N fields
        struct T *p;
        // M fields
    };
    struct T {
        // AG1 fields
        // AG2 fields
    };
is transformed into:
    struct S {
        // N fields
        struct T1 *p1;
        struct T2 *p2;
        // M fields
    };
    struct T1 {
        // AG1 fields
    };
    struct T2 {
        // AG2 fields
    };
Ø New type definitions
Ø Field table update
Ø Field access statements
Ø New symbols
Ø Assignment statements

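The rewrite of field access statements might look as follows at source level. This is a sketch under assumptions: `key` and `payload` stand in for the AG1 and AG2 fields, `count` stands in for the other fields of S, and the function names are hypothetical. The compiler performs the equivalent rewrite on its intermediate representation, not on source text.

```c
#include <assert.h>
#include <stddef.h>

struct T1 { int key; };          /* stands in for the AG1 fields */
struct T2 { double payload; };   /* stands in for the AG2 fields */

struct S {
    size_t count;     /* stands in for the other fields of S */
    struct T1 *p1;    /* replaces struct T *p for AG1 accesses */
    struct T2 *p2;    /* replaces struct T *p for AG2 accesses */
};

/* Was: s->p[k].key. Returns the index of key, or s->count if absent. */
size_t find_key(const struct S *s, int key)
{
    assert(s != NULL);
    for (size_t k = 0; k < s->count; k++)
        if (s->p1[k].key == key)
            return k;
    return s->count;
}

/* Was: s->p[k].payload. */
double payload_at(const struct S *s, size_t k)
{
    assert(s != NULL && k < s->count);
    return s->p2[k].payload;
}
```
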
Implementation Details: Transforming the program (continued)
Function calls to memory management routines
Example:
    p = (T *) malloc (N * sizeof (T));
    if (p == NULL) exit (1);
Ø Detect memory management routine calls involving transformed type T
Ø Replicate call, assignment statements
Ø Update size of memory being allocated
Ø Handle comparisons involving pointer p

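For a type T peeled into T1 and T2, the allocation example above could end up looking roughly like this. It is a sketch with assumptions: `alloc_peeled` and `free_peeled` are hypothetical names, the field names are placeholders, and the function returns an error code instead of calling `exit (1)` so that it can be exercised in isolation.

```c
#include <assert.h>
#include <stdlib.h>

struct T1 { int key; };          /* stands in for the AG1 fields */
struct T2 { double payload; };   /* stands in for the AG2 fields */

/* Original: p = (T *) malloc (N * sizeof (T)); if (p == NULL) exit (1);
   After peeling, the call and the assignment are replicated once per new
   type, each sized with its own sizeof, and the NULL comparison covers
   both pointers. Returns 0 on success, -1 on failure. */
int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2)
{
    assert(p1 != NULL && p2 != NULL);
    *p1 = malloc(n * sizeof **p1);
    *p2 = malloc(n * sizeof **p2);
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}

/* A single free(p) in the original is replicated in the same way. */
void free_peeled(struct T1 *p1, struct T2 *p2)
{
    free(p1);
    free(p2);
}
```
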
Performance Results
Compilation options: -Ofast at 32-bit ABI
Speedup due to structure layout optimizations
Platforms:
  Opteron:   AMD Opteron™ (2.8 GHz, 4 GB, 1 MB)
  Barcelona: AMD Barcelona (2.0 GHz, 8 GB, 512 KB)
  EM64T:     Intel® EM64T (3.4 GHz, 4 GB, 1 MB)
  Core:      Intel® Core™ (3.0 GHz, 4 GB, 4 MB)
  SiCortex:  SiCortex MIPS® (500 MHz, 256 KB)

Benchmarks       Opteron  Barcelona  EM64T   Core    SiCortex  Geometric Mean
179.art          134%     66%        56%     47%     41%       62.5%
181.mcf          24%      23%        23%     31%     13%       22.0%
462.libquantum   32%      17%        40%     72%     62%       39.6%
Geometric Mean   46.9%    29.6%      37.2%   47.2%   32.1%     37.9%

Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
Speedup due to structure layout optimizations
Platforms:
  Opteron:   AMD Opteron™ (2.8 GHz, 4 GB, 1 MB)
  Barcelona: AMD Barcelona (2.0 GHz, 8 GB, 512 KB)
  EM64T:     Intel® EM64T (3.4 GHz, 4 GB, 1 MB)
  Core:      Intel® Core™ (3.0 GHz, 4 GB, 4 MB)
  SiCortex:  SiCortex MIPS® (500 MHz, 256 KB)

Benchmarks       Opteron  Barcelona  EM64T   Core    SiCortex  Geometric Mean
179.art          169%     66%        53%     60%     45%       69.3%
181.mcf          25%      35%        12%     30%     7%        18.6%
462.libquantum   82%      51%        75%     70%     69%       68.6%
Geometric Mean   70.2%    49.0%      36.3%   50.1%   27.9%     44.6%

Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
Multiple copies of 462.libquantum running on a multi-core chip
Platform: Quad-core AMD Barcelona (2.0 GHz, 8 GB, 512 KB, 2 MB)
3rd level cache shared among 4 cores
Speedup from structure layout optimizations

Benchmark        1 copy   2 copies
462.libquantum   51%      69%        123%

Future Work
Ø Tune static profile estimation
Ø Fewer restrictions
Ø Integrate with field reordering

Conclusion
Ø A framework for performing structure layout transformations is now available in the Open64 compiler.
Ø The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
Ø Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
Ø Structure layout optimization is a required feature for a compiler to remain competitive.