
Sampling-based Program Locality Approximation
Yutao Zhong, Wentao Chang
Department of Computer Science, George Mason University
June 8th, 2008

Outline
• Background information
• Motivation
• Our sampling approach
• Experimental results

Reuse distance and reuse signature
• Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element
  • Example: in the trace a b c a a c b, the reuse distance from the first a (starting point) to the second a (ending point) is 2, because b and c are accessed in between
• Reuse signature: a histogram of reuse distances, showing how the reuse distances are distributed over different lengths
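The two definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are made up for this example, and the linear-time list search stands in for the tree-based structures a real profiler would use.

```python
from collections import Counter


def reuse_distances(trace):
    """Return one entry per access: the reuse distance, or None for a first access."""
    stack = []       # LRU stack: the most recently used element is last
    distances = []
    for elem in trace:
        if elem in stack:
            pos = stack.index(elem)
            # Everything above `elem` in the stack is a distinct element
            # touched since the previous access to `elem`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(elem)
    return distances


def reuse_signature(trace):
    """Histogram mapping reuse distance -> number of reuses with that distance."""
    return Counter(d for d in reuse_distances(trace) if d is not None)
```

For the trace abcaacb, `reuse_distances` gives `[None, None, None, 2, 0, 1, 2]`, so the signature has one reuse at distance 0, one at distance 1, and two at distance 2.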

Reuse signature application
• Relationship to cache behavior:
  • A capacity miss occurs when reuse distance ≥ cache size
  • Reduce reuse distance => improve cache effectiveness
• Current applications:
  • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
  • Reorganize data [Zhong+04]
  • Provide caching hints [Beyls & D’Hollander 02]
  • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
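As a rough illustration of the link between reuse distance and capacity misses, the sketch below estimates a miss rate from a reuse signature. It assumes a fully associative LRU cache whose size is counted in data elements; the function name and the `cold_accesses` parameter are hypothetical and not taken from the cited work.

```python
def predicted_miss_rate(signature, cache_size, cold_accesses=0):
    """Estimate the miss rate of a fully associative LRU cache.

    signature     -- dict mapping reuse distance -> number of reuses
    cache_size    -- capacity in data elements (assumed granularity)
    cold_accesses -- number of first-time accesses, which always miss
    """
    total = sum(signature.values()) + cold_accesses
    if total == 0:
        return 0.0
    # A reuse misses when at least `cache_size` distinct elements
    # were touched since the previous access to the same element.
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    return (capacity_misses + cold_accesses) / total
```

With the signature `{0: 1, 1: 1, 2: 2}` of the trace abcaacb and its 3 first-time accesses, a cache of 2 elements is predicted to miss on 5 of 7 accesses, and a cache of 3 elements on 3 of 7.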

Reuse distance measurement
• ① Large space required to store traces, and a long counting time
• ② Enormous effort for memory-intensive programs
[Figure: measurement workflow. For each accessed memory address: search the access time table (address → last access time), search the access trace to count the distance, update both data structures, and record the distance in the distance histogram.]

Motivation
• Sampling is generally effective at reducing the overhead of program behavior profiling
• We aim to balance efficiency and accuracy
  • Sample only 1% of memory accesses
  • Improve measurement speed by 7.5 times on average
  • Achieve over 99% accuracy

Sampling algorithms
• Use the common structure of bursty tracing [Hirzel & Chilimbi 01]
  • Sampling rate r = |I_S| / (|I_S| + |I_H|), where I_S is a sampling interval and I_H is a hibernating interval
• Naïve sampling
  • Turn off profiling during hibernating intervals
  • No guarantee of accuracy
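A small sketch of the bursty schedule and of naïve sampling may help make the accuracy problem concrete. It assumes each cycle consists of a hibernating interval followed by a sampling interval, with lengths counted in memory accesses; the function names and the interval lengths used in the example are illustrative assumptions.

```python
def in_sampling(t, hib_len, samp_len):
    """True if access number t (counted from 0) falls in a sampling interval.

    Assumes every cycle is hib_len hibernating accesses followed by
    samp_len sampled accesses.
    """
    return t % (hib_len + samp_len) >= hib_len


def sampling_rate(hib_len, samp_len):
    """r = |I_S| / (|I_S| + |I_H|)."""
    return samp_len / (samp_len + hib_len)


def naive_sampled_distances(trace, hib_len, samp_len):
    """Naive sampling: accesses made during hibernation are never seen."""
    stack = []       # LRU stack built from sampled accesses only
    measured = []    # (access number, measured reuse distance)
    for t, elem in enumerate(trace):
        if not in_sampling(t, hib_len, samp_len):
            continue  # profiling is switched off
        if elem in stack:
            pos = stack.index(elem)
            measured.append((t, len(stack) - 1 - pos))
            stack.pop(pos)
        stack.append(elem)
    return measured
```

With the trace cabcacabcda and intervals of 3 accesses each, the final access to a is measured at distance 2, although its actual reuse distance is 3: the access to b in the preceding hibernating interval is invisible to the profiler.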

Naïve sampling: inaccurate measurement
[Figure: memory access trace ". . cabcacabcda . ." divided into alternating hibernating (I_H) and sampling (I_S) intervals, with numbered annotations ① to ⑤.]

Biased sampling
• Ignore a datum that has been referenced within the current hibernating period
• Measured distance is always larger than or equal to the actual distance
• Probability of being sampled is not uniform

Biased sampling
[Figure: memory access trace ". . cabcafabcacabfda . ." divided into alternating hibernating (I_H) and sampling (I_S) intervals, with numbered annotations ① to ⑤.]

History-preserved representative sampling
• Add a tag to each address in the access trace
• Mark references made within a sampling period as sampled in the tag
• A reuse is sampled only when its starting point is marked as sampled
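The three steps above can be sketched as follows. This is a simplified reading, not the authors' implementation: it keeps the whole access history in a plain Python list instead of a tree, assumes each cycle is a hibernating interval followed by a sampling interval, and uses made-up names and interval lengths.

```python
def representative_sampled_distances(trace, hib_len, samp_len):
    """Record a reuse only if its starting point fell in a sampling interval."""
    stack = []         # full LRU stack, maintained in both kinds of interval
    sampled_tag = {}   # element -> was its latest access in a sampling interval?
    measured = []      # (access number, reuse distance)
    cycle = hib_len + samp_len
    for t, elem in enumerate(trace):
        if elem in stack:
            pos = stack.index(elem)
            if sampled_tag[elem]:
                # History is preserved, so the distance is the exact
                # number of distinct elements since the last access.
                measured.append((t, len(stack) - 1 - pos))
            stack.pop(pos)
        stack.append(elem)
        # Tag this access: it is the starting point of the next reuse.
        sampled_tag[elem] = (t % cycle) >= hib_len
    return measured
```

For the trace cabcafabcacabfda with intervals of 3 accesses each, this records `[(6, 1), (8, 3), (11, 1), (13, 3), (15, 3)]`. Every recorded distance matches the exact reuse distance, and only the choice of which reuses get recorded depends on the sampling schedule.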

History-preserved representative sampling
[Figure: memory access trace ". . cabcafabcacabfda . ." divided into alternating hibernating (I_H) and sampling (I_S) intervals, with numbered annotations ① to ⑤.]

Further improvements
• Simplifying maintenance in hibernating intervals
  • Reference trace implementation: splay tree [Ding & Zhong 03]
  • In a sampling period: full tree maintenance
  • In a hibernating period: instead of adding a new leaf node for each access, construct a single node per hibernating period, with a counter of the number of distinct accesses
• Fast sample tag marking and checking
  • To save space, fix the lengths of the sampling and hibernating periods, which avoids the additional tag
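One way fixed period lengths could replace the per-address tag is to derive the tag from the last access time, which the measurement already keeps for each address. The slides do not spell out the mechanism, so the sketch below is an assumption; the function name is hypothetical, and it assumes each cycle is a hibernating period followed by a sampling period.

```python
def started_in_sampling(last_access_time, hib_len, samp_len):
    """Recover the 'sampled' tag from a timestamp alone.

    Assumes access times count from 0 and every cycle is hib_len
    hibernating accesses followed by samp_len sampled accesses.
    """
    return last_access_time % (hib_len + samp_len) >= hib_len
```

With `hib_len=99` and `samp_len=1` (a 1% sampling rate), accesses 99, 199, 299 and so on count as sampled starting points, and no extra tag needs to be stored.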

Experiments
• Benchmarks from SPEC 2006, Olden, Chaos:
  • Floating-point programs: CactusADM, Milc, Soplex, Apsi, MolDyn
  • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP
• Instrumentation tool: Valgrind 3.2.3
• Sampling rate: 1%
• Each benchmark is run with 3 to 6 different inputs
• Each input is run three times

Experiments (cont’d)
• Comparison of accuracy and efficiency against:
  • Ding and Zhong’s approximation method [Ding & Zhong 03]
  • Time distance measurement [Shen+07]
• Implementation of four algorithms:
  • Naïve sampling, biased sampling, basic and optimized representative sampling

Accuracy

Efficiency
• Sampling even outperforms the lower bound: time distance measurement
• Generally, the speedup is smaller when the input size is small

Efficiency
• Speedup of basic representative sampling: around 4-5 times in most cases
• Speedup of optimized representative sampling:
  • Around 7-10 times in most cases, up to 33 times
  • Geometric mean is 7.5
• Sampling rate effect (TSP): [Figure]

Related work
• Reuse signature collection
  • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]
• Selective monitoring
  • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]
  • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07]
• Uses of efficient locality analysis
  • [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

Future work
• Dynamically adjust sampling/hibernating lengths
• Store references in a temporary buffer and then process them in batches
• Combine time sampling with data sampling

Thank you! Questions?