
Using Information About Cache Evictions to Measure the Interactions of Application Data Structures
Bryan R. Buck
Jeffrey K. Hollingsworth
University of Maryland
Department of Computer Science

Introduction
• Cache behavior information is important
  – Processor speed increasing faster than memory
• Should relate cache info to data structures
  – More useful to programmer in tuning applications
• Collect using hardware
  – Software techniques, such as simulation, are slow
  – In the past, limited hardware support
  – Situation is changing, hardware support more common

Outline
• Measuring cache misses
  – Sampling
• Information about evictions
  – What is required
  – Sampling
• Simulation-based study
  – The simulator and applications used
  – Results
• Conclusions and future work

Finding Objects With Most Cache Misses
• Handling every cache miss is slow
  – Use sampling, requirements:
    • Periodic interrupt on cache miss
    • Ability to determine miss address
• Associate count with each object
  – Variable or dynamically allocated memory
• Interrupt after every n cache misses
  – Obtain address of miss
  – Find object containing it and increment count
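The sampling loop on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `ObjectMap` class, the `sample_misses` function, the object addresses, and the miss trace are all hypothetical, and the `i % n == 0` test merely stands in for the periodic hardware interrupt.

```python
import bisect


class ObjectMap:
    """Maps addresses to named objects using sorted, non-overlapping ranges."""

    def __init__(self, objects):
        # objects: iterable of (name, start_address, size_in_bytes)
        self._ranges = sorted((start, start + size, name)
                              for name, start, size in objects)
        self._starts = [r[0] for r in self._ranges]

    def find(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, name = self._ranges[i]
            if addr < end:
                return name
        return None


def sample_misses(miss_addresses, object_map, n):
    """Attribute every n-th miss to the object that contains its address."""
    counts = {}
    for i, addr in enumerate(miss_addresses, start=1):
        if i % n == 0:  # stands in for the interrupt after every n misses
            name = object_map.find(addr)
            key = name if name is not None else "<unknown>"
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical objects and miss addresses, for illustration only.
objects = ObjectMap([("U", 0x1000, 0x400), ("V", 0x2000, 0x400)])
misses = [0x1000, 0x1040, 0x2000, 0x2040, 0x3000, 0x10C0]
counts = sample_misses(misses, objects, n=2)
```

A sorted range table with binary search keeps the per-sample lookup logarithmic in the number of objects, which matters because the lookup runs inside the interrupt handler.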

Interactions Between Objects
• Why does data leave the cache?
  – What object caused it to be replaced?
• Hardware could provide eviction information
  – When miss occurs, save address of evicted data
• Not difficult to provide physical address
  – Can calculate from tag of evicted cache line
  – Information in OS can map physical to virtual
    • May be imprecise due to paging
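As a sketch of how an address can be calculated from the tag of an evicted line: the code below assumes a physically indexed, physically tagged cache whose line size and set count are powers of two. The function names and the cache geometry (64-byte lines, 512 sets) are assumptions chosen for illustration, not taken from the slides.

```python
def split_address(addr, line_size, num_sets):
    """Split a physical address into (tag, set_index) for a cache
    with power-of-two line size and number of sets."""
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index


def evicted_line_address(tag, set_index, line_size, num_sets):
    """Rebuild the physical address of the first byte of a cache line
    from the tag stored with the line and the set it occupied."""
    offset_bits = line_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    return (tag << (index_bits + offset_bits)) | (set_index << offset_bits)


# Hypothetical geometry: 64-byte lines, 512 sets.
tag, set_index = split_address(0x12345678, line_size=64, num_sets=512)
rebuilt = evicted_line_address(tag, set_index, line_size=64, num_sets=512)
```

The rebuilt value is the line-aligned address; the byte offset within the line is not recoverable from the tag and set index, and is not needed to identify the containing object unless two objects share a line.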

Measuring Eviction Information
• Use sampling, store more at each miss
  – Object that caused the miss
  – Object containing the data that was evicted
  – Part of code it happened in
• Questions
  – “Buckets” much smaller, will sampling be accurate?
  – Data structure more complicated, how efficient?
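One possible shape for the per-sample record is a counter keyed by the three items listed above. The sketch below is an assumption about how such data could be organized, not the data structure used in the study; the class name, method names, and sample counts are hypothetical (only the variable and function names are borrowed from the mgrid slides).

```python
from collections import Counter


class EvictionProfile:
    """Counts sampled evictions by (missed object, evicted object, code location)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, missed_obj, evicted_obj, code_location):
        """Record one sampled miss and the eviction it caused."""
        self.counts[(missed_obj, evicted_obj, code_location)] += 1

    def evictions_of(self, evicted_obj):
        """Return samples per causing object for evictions of evicted_obj."""
        by_cause = Counter()
        for (missed, evicted, _loc), n in self.counts.items():
            if evicted == evicted_obj:
                by_cause[missed] += n
        return by_cause

    def percent_by_location(self, evicted_obj):
        """Return each code location's share (in %) of evictions of evicted_obj."""
        by_loc = Counter()
        for (_missed, evicted, loc), n in self.counts.items():
            if evicted == evicted_obj:
                by_loc[loc] += n
        total = sum(by_loc.values())
        return {loc: 100.0 * n / total for loc, n in by_loc.items()}


# Hypothetical samples, for illustration only.
profile = EvictionProfile()
for _ in range(3):
    profile.record("R", "U", ("resid", 214))
profile.record("U", "U", ("interp", 302))
profile.record("V", "R", ("psinv", 174))
```

Because the key space is the product of object pairs and code locations, each bucket receives far fewer samples than a per-object miss count, which is the accuracy question the slide raises.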

Experiments
• Implemented in simulation
  – Simulator uses ATOM binary rewriting tool
    • Instrument load/stores for cache simulation
    • Instrument basic blocks for virtual cycle count
    • Simulates necessary hardware support
  – Miss and eviction sampling runs under simulation
• Tested using SPEC 95/2000 applications
  – su2cor, applu, equake, gzip, mgrid, swim, wupwise, …
  – Sampled 1 in 25,000 misses
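To make the idea of simulated hardware support concrete, here is a toy cache model that reports the evicted address on every n-th miss. It is only a sketch under stated assumptions: it models a direct-mapped cache, it does not use ATOM, and the class name, geometry, and access trace are hypothetical rather than the configuration used in the experiments.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that samples (miss address, evicted address)."""

    def __init__(self, num_lines, line_size, sample_interval):
        self.num_lines = num_lines
        self.line_size = line_size
        self.sample_interval = sample_interval
        self.lines = [None] * num_lines  # memory line number held by each slot
        self.misses = 0
        self.samples = []  # (miss_address, evicted_line_address or None)

    def access(self, addr):
        """Simulate one load or store; return True on a hit."""
        line = addr // self.line_size
        slot = line % self.num_lines
        if self.lines[slot] == line:
            return True
        self.misses += 1
        evicted = self.lines[slot]
        self.lines[slot] = line
        # Every sample_interval-th miss plays the role of the sampling interrupt.
        if self.misses % self.sample_interval == 0:
            evicted_addr = None if evicted is None else evicted * self.line_size
            self.samples.append((addr, evicted_addr))
        return False


# Hypothetical geometry and trace: 4 lines of 64 bytes, sample every 2nd miss.
cache = DirectMappedCache(num_lines=4, line_size=64, sample_interval=2)
hits = [cache.access(a) for a in [0, 256, 0, 8, 320]]
```

In this trace, addresses 0 and 256 map to the same slot and evict each other, while the access to 320 fills an empty slot, so its sample carries no evicted address.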

Accuracy of Sampling Cache Misses

Application  Variable    Actual        Sample
                         Rank    %     Rank    %
su2cor       U           1       60.5  1       61.1
             R-loops     2       5.3   4       4.6
             S           3       5.0   2       5.3
             W2-intact   4       4.2   3       5.3
             W2-sweep    5       4.1   6       4.0
swim         UNEW        1       10.3  1       10.6
             PNEW        2       10.3  3       9.8
             VNEW        3       10.3  2       10.0
             CU          4       7.0   6       7.1
             H           5       6.9   9       6.9

Eviction Results: mgrid (figure)

Evictions By Code Region: mgrid
% of total evictions of U by U, V, and R in each line of code.

Variable  Function  Line  Actual        Sample
                          Rank    %     Rank    %
U         resid     214   1       40.7  1       42.1
          interp    302   2       5.2   4       4.7
          interp    312   3       5.0
          interp    290   4       4.7   5       4.7
          interp    281   5       4.7   2       5.0
V         resid     200   1       1.6   1       2.1
R         psinv     174   1       18.8  1       18.6
          resid     200   2       2.3   2       2.1

Cache Misses Due to Instrumentation (figure)

Instrumentation Overhead (figure)

Simulation Overhead (figure)

Using Dyninst
• Better knowledge about objects
  – Local variables
  – FORTRAN common blocks
• Can instrument memory allocation routines
  – Track objects created/destroyed
• Measure by code using hardware counters
  – Save counts at significant points, like Paradyn
    • Function entries/exits/calls
  – Turn counting on & off around areas of interest

Instrumenting Loads and Stores
• New BPatch_point type
  – BPatch_loadStore
  – New method, isStore(), returns true or false
• New expression type
  – BPatch_effectiveAddr
    • Only valid at BPatch_loadStore points
    • Returns the effective address being accessed

Future Work
• Run miss sampling on real hardware
  – IBM POWER3, POWER4
  – Use Dyninst
• Visualization tool
  – Save all data in compact format tool understands
    • For tested applications, largest file is 15 MB
  – Filter by objects, parts of code
  – Compare data from different runs
• Use results to optimize applications

Future Work Continued
• More uses of eviction information
  – For estimating portion of object in cache
    • Use difference of misses and evictions
  – For finding lost opportunities for reuse
    • Track evicted data until next load
    • Measure interval in time, cache misses, etc.
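The first idea can be illustrated with simple arithmetic: each miss on an object brings one of its lines into the cache and each eviction removes one, so the difference approximates the number of resident lines. The function below is a hypothetical sketch; it assumes the cache starts empty of the object's data, ignores prefetching, and scales sampled counts by the sampling interval, which makes the estimate noisy when counts are small. All numbers in the example are invented.

```python
def estimate_resident_fraction(sampled_misses, sampled_evictions,
                               sample_interval, line_size, object_size):
    """Estimate the fraction of an object currently held in the cache.

    sampled_misses:    samples in which this object caused the miss
    sampled_evictions: samples in which this object's data was evicted
    sample_interval:   number of misses represented by each sample
    """
    resident_lines = (sampled_misses - sampled_evictions) * sample_interval
    resident_bytes = max(resident_lines, 0) * line_size  # clamp sampling noise
    return min(resident_bytes / object_size, 1.0)


# Hypothetical counts: every miss recorded (interval 1), 32-byte lines,
# and an object of 12,800 bytes.
fraction = estimate_resident_fraction(
    sampled_misses=1000, sampled_evictions=900,
    sample_interval=1, line_size=32, object_size=12800)
```

Here 100 net lines of 32 bytes give 3,200 resident bytes, a quarter of the object.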

Conclusions
• Features are appearing in new processors
  – Possible to implement cache miss sampling now
  – Much more efficient than software simulation
• Eviction information in hardware practical
  – Sampling is efficient and accurate
• Could use Dyninst
  – For simulation or for hardware