Cost Effective Dynamic Program Slicing
Xiangyu Zhang, Rajiv Gupta
The University of Arizona

Program Slicing

Definition
  • Slice(v@S): the slice of v at S is the set of statements involved in computing v's value at S. [Mark Weiser, 1982]

Static slice is the set of statements that COULD influence the value of a variable for ANY input.
  • Construct static dependence graph
      § Control dependences
      § Data dependences
  • Traverse dependence graph to compute slice
      § Transitive closure over control and data dependences
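The traversal step can be sketched in Python as a backward reachability computation. This is a minimal sketch, not the authors' implementation: the `deps` mapping, the function name `static_slice`, and the four-statement program are all hypothetical and exist only for illustration.

```python
def static_slice(deps, criterion):
    """Return every statement reachable backward from `criterion`.

    `deps` maps a statement to the statements it depends on, through
    either control or data dependence (a hypothetical encoding).
    """
    in_slice = set()
    worklist = [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt in in_slice:
            continue
        in_slice.add(stmt)
        worklist.extend(deps.get(stmt, ()))
    return in_slice


# Hypothetical program:
#   s1: x = read()   s2: y = 5   s3: if x > 0:   s4: z = x + 1
deps = {
    "s3": ["s1"],        # data dependence on x
    "s4": ["s1", "s3"],  # data dependence on x, control dependence on s3
}
slice_z = static_slice(deps, "s4")
```

Here `s2` stays out of the slice because no chain of dependences leads from `s4` back to it.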

Dynamic Slicing

Dynamic slice is the set of statements that DID affect the value of a variable at a program point for ONE specific execution. [Korel and Laski, 1988]
  • Execution trace
      § control flow trace -- dynamic control dependences
      § memory reference trace -- dynamic data dependences
  • Construct a dynamic dependence graph
  • Traverse dynamic dependence graph to compute slices
  • Smaller, more precise slices are more helpful
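One common way to turn a memory reference trace into dynamic data dependences is to remember the last writer of each address. The sketch below assumes that approach and covers data dependences only (control dependences are left out); the trace format, the function names, and the four-statement run are hypothetical.

```python
def dynamic_data_deps(trace):
    """Derive dynamic data dependences from a memory reference trace.

    `trace` is a list of (statement, reads, writes) tuples in execution
    order; the list index serves as the timestamp. Returns a list of
    (use_time, def_time) pairs.
    """
    last_writer = {}  # address -> timestamp of the most recent write
    deps = []
    for t, (_stmt, reads, writes) in enumerate(trace):
        for addr in reads:
            if addr in last_writer:
                deps.append((t, last_writer[addr]))
        for addr in writes:
            last_writer[addr] = t
    return deps


def dynamic_slice(trace, deps, t):
    """Statements whose executed instances reach instance `t` backward."""
    preds = {}
    for use_t, def_t in deps:
        preds.setdefault(use_t, []).append(def_t)
    seen, worklist = set(), [t]
    while worklist:
        cur = worklist.pop()
        if cur not in seen:
            seen.add(cur)
            worklist.extend(preds.get(cur, ()))
    return {trace[i][0] for i in seen}


# Hypothetical run: s2 overwrites x before s3 reads it.
trace = [
    ("s1", [], ["x"]),     # x = 1
    ("s2", [], ["x"]),     # x = 2
    ("s3", ["x"], ["y"]),  # y = x
    ("s4", ["y"], []),     # print(y)
]
deps = dynamic_data_deps(trace)
result = dynamic_slice(trace, deps, 3)
```

In this run `s1` is not in the dynamic slice of `s4`, because its write to `x` was overwritten before any read.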

Slice Sizes: Static vs. Dynamic

Slice sizes are averages over 25 slices.

   Program          Statements    Static    Dynamic    Static / Dynamic
   126.gcc             585,491    51,098      6,614          7.72
   099.go               95,459    16,941      5,382          3.14
   134.perl            116,182     5,242        765          6.85
   130.li               31,829     2,450        206         11.89
   008.espresso         74,039     2,353        350          6.72

Static slice can be much larger than the dynamic slice

Applications of Dynamic Slicing
  • Debugging [Korel & Laski - 1988]
  • Detecting Spyware [Jha - 2003]
      § Installed without users' knowledge
  • Software Testing [Duesterwald, Gupta, & Soffa - 1992]
      § Dependence based structural testing - output slices.
  • Module Cohesion [N. Gupta & Rao - 2001]
      § Guide program structuring
  • Performance Enhancing Transformations
      § Instruction criticality [Zilles & Sohi - 2000]
      § Instruction isomorphism [Sazeides - 2003]
  • Others…

The Graph Size Problem

   Program       Statements Executed (Millions)    Dynamic Dependence Graph Size (MB)
   300.twolf                 140                                1,568
   256.bzip2                  67                                1,296
   255.vortex                108                                1,442
   197.parser                123                                1,816
   181.mcf                   118                                1,535
   164.gzip                   71                                  835
   134.perl                  220                                1,954
   130.li                    124                                1,745
   126.gcc                   131                                1,534
   099.go                    138                                1,707

Graphs of realistic program runs do not fit in memory.

Space and Time Cost of LP [ICSE 2003]

   Program       Slicing Time Average (Minutes)    Max. Dynamic Dependence Graph Size (MB)
   300.twolf                13.9                                   296
   256.bzip2                 9.2                                    81
   255.vortex               10.2                                    34
   197.parser                9.9                                    40
   181.mcf                  12.3                                   114
   164.gzip                  4.69                                   35
   134.perl                 25.2                                    54
   130.li                   11.3                                   105
   126.gcc                  12.1                                    58
   099.go                   10.7                                   162

Still not fast enough. Need to keep graph in memory.

Dependence Graph Representation

Input: N=2

Program:
    1: z=0
    2: a=0
    3: b=2
    4: p=&b
    5: for i = 1 to N do
    6:    if (i%2 == 0) then
    7:       p=&a
          endif
    8:    a=a+1
    9:    z=2*(*p)
       endfor
   10: print(z)

Execution trace (s_k is the k-th executed instance of statement s):
    1_1: z=0
    2_1: a=0
    3_1: b=2
    4_1: p=&b
    5_1: for i=1 to N do
    6_1: if (i%2==0) then
    8_1: a=a+1
    9_1: z=2*(*p)
    5_2: for i=1 to N do
    6_2: if (i%2==0) then
    7_1: p=&a
    8_2: a=a+1
    9_2: z=2*(*p)
   10_1: print(z)

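Dependence Graph Representation

Input: N=2

Execution trace with timestamps T (s_k is the k-th executed instance of statement s):
    T    Instance
    1    1_1: z=0
    2    2_1: a=0
    3    3_1: b=2
    4    4_1: p=&b
    5    5_1: for i = 1 to N do
    6    6_1: if (i%2 == 0) then
    7    8_1: a=a+1
    8    9_1: z=2*(*p)
    9    5_2: for i = 1 to N do
   10    6_2: if (i%2 == 0) then
   11    7_1: p=&a
   12    8_2: a=a+1
   13    9_2: z=2*(*p)
   14    10_1: print(z)

Dependence graph: one node per statement. Each edge carries one timestamp pair per executed instance of the dependence. The edge endpoints below are read off from the timestamps.
   Data dependences
      2: a=0      → 8: a=a+1       <2,7>
      8: a=a+1    → 8: a=a+1       <7,12>
      3: b=2      → 9: z=2*(*p)    <3,8>
      4: p=&b     → 9: z=2*(*p)    <4,8>
      7: p=&a     → 9: z=2*(*p)    <11,13>
      8: a=a+1    → 9: z=2*(*p)    <12,13>
      9: z=2*(*p) → 10: print(z)   <13,14>
   Control dependences
      5 → 6    <5,6> <9,10>
      6 → 7    <10,11>
      5 → 8    <5,7> <9,12>
      5 → 9    <5,8> <9,13>
   Branch outcomes are marked T and F in the figure.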
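To show how the timestamp pairs drive slicing, the following Python sketch encodes the labeled edges of this example and walks them backward. It is an illustration under assumptions: the edge endpoints are inferred from the timestamps on the slide, and the `EDGES` layout and function name are hypothetical.

```python
# Statement -> list of (predecessor statement, [(t_pred, t_stmt), ...]).
# Labels are the timestamp pairs from the slide; endpoints are inferred.
EDGES = {
    6: [(5, [(5, 6), (9, 10)])],
    7: [(6, [(10, 11)])],
    8: [(2, [(2, 7)]), (8, [(7, 12)]), (5, [(5, 7), (9, 12)])],
    9: [(3, [(3, 8)]), (4, [(4, 8)]), (7, [(11, 13)]),
        (8, [(12, 13)]), (5, [(5, 8), (9, 13)])],
    10: [(9, [(13, 14)])],
}


def dynamic_slice(stmt, t):
    """Statements that affected the instance of `stmt` executed at time `t`."""
    in_slice, visited = set(), set()
    worklist = [(stmt, t)]
    while worklist:
        s, ts = worklist.pop()
        if (s, ts) in visited:
            continue
        visited.add((s, ts))
        in_slice.add(s)
        for pred, labels in EDGES.get(s, ()):
            for t_pred, t_stmt in labels:
                if t_stmt == ts:  # only follow the instance executed at ts
                    worklist.append((pred, t_pred))
    return in_slice


slice_at_print = dynamic_slice(10, 14)
slice_first_z = dynamic_slice(9, 8)
```

Under these assumptions the slice for `print(z)` at T=14 leaves out statements 1, 3 and 4, because in the second iteration `*p` refers to `a` rather than `b`.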

OPT: Compacted Graph Algorithm
  • Compaction
      § Elimination of timestamp labels.
          - Remove labels that can be inferred
          - Transform dependence graph to enable elimination
          - Remove labels that are redundant
  • Fast Traversal
      § Long search for relevant dependence is often replaced by quick computation of dependence
          - Consequence of compaction

OPT-1a. Infer Local Def-Use Labels: Full Elimination
Assign timestamps on node level
[Figure: local def-use edge from X= to =X; the instance labels (10, 10), (20, 20), (30, 30) are replaced by the label 0.]
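The idea on this slide can be illustrated with a small Python sketch: when a definition and its use share a node-level timestamp, every label pair has the form (t, t) and carries no information. The sketch checks recorded labels to decide this, which is an assumption made for illustration; a real implementation would presumably decide it from the program structure. The names `INFERRED`, `compact_local_edge` and `def_time` are hypothetical.

```python
INFERRED = 0  # marker for an edge whose labels are implied, as on the slide


def compact_local_edge(labels):
    """Drop the label list when every pair has equal timestamps."""
    if all(t_def == t_use for t_def, t_use in labels):
        return INFERRED
    return labels


def def_time(edge_label, t_use):
    """Timestamp of the defining instance for the use executed at t_use."""
    if edge_label == INFERRED:
        return t_use  # same node instance, so the same timestamp
    for t_def, t in edge_label:
        if t == t_use:
            return t_def
    return None  # the dependence was not exercised at t_use


local = compact_local_edge([(10, 10), (20, 20), (30, 30)])
non_local = compact_local_edge([(10, 11), (20, 21)])
```

With the label gone, looking up the definition for a use at time 20 needs no search: `def_time(local, 20)` simply returns 20.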

OPT-1b. Infer Local Def-Use Labels: Partial

OPT-2a. Transform Local Def-Use Labels: Full Elimination In Presence of Aliasing
[Figure: statements Z=, Y=, X = f(Y), *P = g(Z), =X; edge labels (10, 11), (11, 11), (20, 21), (21, 21) and 0.]

OPT-2b. Transform Non-Local Def-Use to Local Use-Use Edges
[Figure: definition X= with uses =X; def-use labels (10, 11), (20, 21); a use-use edge labeled 0.]

OPT-2c. Transform Non-Local Def-Use to Local Def-Use Edges
[Figure: definitions X= and Y=, uses =X and =Y, paths 1 and 2; edge labels (1, 3), (2, 3), (10, 12), (11, 12) and 0; callout "Node for path".]

OPT-3. Redundant Labels Across Non-Local Def-Use Edges
[Figure: definitions X= and Y=, uses =X and =Y; edge labels (1, 2) and (10, 11).]

OPT-4. (Control Dep.) Infer Fixed Distance Unique Control Ancestor

   Path           Timestamps
   1.2.3.5        10.11.12.13
   1.2.4.5        20.21.22.23
   1.2.3.4.5      30.31.32.33.34

[Figure: control dependence graph over nodes 1 to 5; edge labels (10, 11), (20, 21), (30, 31), (11, 12), (31, 32), (21, 22), (32, 33), (10, 13), (20, 23), (30, 34) and 1.]
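A fixed distance makes the ancestor's timestamp computable instead of stored. The Python sketch below checks two label sets taken from this slide; the function names are hypothetical, and reading each pair as (ancestor time, node time) is an assumption based on the path table.

```python
def fixed_distance(labels):
    """Return d if every (t_ancestor, t_node) pair is exactly d apart, else None."""
    distances = {t_node - t_anc for t_anc, t_node in labels}
    return distances.pop() if len(distances) == 1 else None


def ancestor_time(distance, t_node):
    """Infer the control ancestor's timestamp without a stored label."""
    return t_node - distance


# Labels from the slide: always one step apart, so they can be inferred.
near = fixed_distance([(10, 11), (20, 21), (30, 31)])
# Labels from the slide: 3 steps on two paths, 4 on the third, so not inferable.
far = fixed_distance([(10, 13), (20, 23), (30, 34)])
```

The second label set is the varying-distance case, which plain inference cannot handle.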

OPT-5a. Transform Multiple Control Ancestors
[Figure: control dependence graph over nodes 1 to 5; edge labels (10, 13), (20, 23), (21, 22), (30, 34), (32, 33), 0 and 1.]

OPT-5b. Transform Varying Distance to Unique Control Ancestors
[Figure: control dependence graph over nodes 1 to 5; edge labels 0 and 1.]

OPT-6. Redundant Across Non-Local Def-Use and Control Dependence Edges
[Figure: statements X=, If P, =X; edge label (1, 2).]

Completeness of Label Elimination Optimizations
  • Data Dependence Labels
      § Local to a basic block
          - Infer (OPT-1a, OPT-1b)
          - Transform (OPT-2a)
      § Non-Local across basic blocks
          - Transform (OPT-2b, OPT-2c)
          - Redundant (OPT-3)
  • Control Dependence Labels
      - Infer (OPT-4)
      - Transform (OPT-5a, OPT-5b)
      - Redundant (OPT-6)

Slicing algorithm (1)
Slice(v, s1) @ t = {s2} ∪ Slice(x, s2) @ t
[Figure: def-use edge labeled 0 from s2: x=… to s1: v=f(x, …).]

Slicing algorithm (2)
Slice(v, s1) @ t = Slice(x, s2) @ t
[Figure: use-use edge labeled 0 from s2: …=x to s1: v=f(x, …).]

Slicing algorithm (3)
Slice(v, s1) @ t = {s3} ∪ Slice(x, s3) @ t'
[Figure: definitions s3: x=… and s4: x=…; the edge from s3 to s1: v=f(x, …) carries labels …<t', t>….]
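The three slicing rules can be combined into one recursive procedure. The Python sketch below is one possible reading, not the authors' code: the `Edge` class, the `edges` and `uses` tables, and all statement names are hypothetical. A missing label list models label 0 (the timestamp carries over unchanged), and a use-use edge continues the search without adding the intermediate use to the slice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    target: str                    # statement the edge points back to
    kind: str                      # "def-use" or "use-use"
    labels: Optional[list] = None  # None models label 0; else (t_target, t_source) pairs


def slice_var(edges, uses, var, stmt, t, seen=None):
    """Slice(var, stmt) @ t over a compacted graph (hypothetical encoding)."""
    seen = set() if seen is None else seen
    if (var, stmt, t) in seen:
        return set()
    seen.add((var, stmt, t))
    result = set()
    for edge in edges.get((stmt, var), []):
        if edge.labels is None:
            t_prev = t                   # rules (1) and (2): label 0
        else:
            matches = [tp for tp, ts in edge.labels if ts == t]
            if not matches:
                continue                 # edge not exercised at time t
            t_prev = matches[0]          # rule (3): look up <t', t>
        if edge.kind == "use-use":
            # rule (2): skip the earlier use, keep following the same variable
            result |= slice_var(edges, uses, var, edge.target, t_prev, seen)
        else:
            result.add(edge.target)      # rules (1) and (3): add the definition
            for used in uses.get(edge.target, []):
                result |= slice_var(edges, uses, used, edge.target, t_prev, seen)
    return result


# Rule (3): two candidate definitions, chosen by timestamp.
edges3 = {("s1", "x"): [Edge("s3", "def-use", [(5, 9)]),
                        Edge("s4", "def-use", [(12, 15)])]}
at_9 = slice_var(edges3, {}, "x", "s1", 9)
at_15 = slice_var(edges3, {}, "x", "s1", 15)

# Rule (2) then rule (3): a use-use edge to s2, whose definition is s0.
edges2 = {("s1", "x"): [Edge("s2", "use-use")],
          ("s2", "x"): [Edge("s0", "def-use", [(3, 7)])]}
via_use = slice_var(edges2, {}, "x", "s1", 7)
```

In the second example `s2` does not appear in the result: it only reads `x`, so it serves as a stepping stone to the definition `s0`.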

Shortcuts to Speed Up Traversal

   0: X =
   1: Y = f(X)
   2: Z = g(Y)
   3: … = Z

[Figure: the edge from 0 to 1 carries labels (10, 11), (20, 21); the edges from 1 to 2 and from 2 to 3 are labeled 0. With the shortcut, an edge labeled 0 leads from 3 directly to 1 and carries the set {2}.]
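A shortcut can be precomputed by collapsing each chain of label-0 edges and remembering the statements it skips. The Python sketch below assumes, for simplicity, that every statement has at most one label-0 predecessor; that restriction and the names used are hypothetical.

```python
def build_shortcuts(zero_edges):
    """Collapse chains of label-0 edges.

    `zero_edges` maps a statement to its single label-0 predecessor.
    Returns statement -> (end of chain, set of statements skipped).
    """
    shortcuts = {}
    for start in zero_edges:
        skipped = set()
        node = zero_edges[start]
        while node in zero_edges:  # keep following label-0 edges
            skipped.add(node)
            node = zero_edges[node]
        shortcuts[start] = (node, skipped)
    return shortcuts


# Chain from the slide: 3 depends on 2, and 2 depends on 1, both with label 0.
shortcuts = build_shortcuts({3: 2, 2: 1})
```

During slicing, a traversal that reaches statement 3 can jump straight to statement 1 and add the stored set {2} to the slice in one step.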

Experimental Setup
  • Implementation
      § Trimaran: C programs, IR (intermediate representation)
      § An instrumented interpreter executes the IR and collects a compact control flow trace and a memory trace.
      § The CFG and PDG are constructed at the IR level, so slicing is also done at the IR level.
  • Experiment
      § To get fair comparisons among algorithms, we shared as much code as possible across the implementations.
      § 2.2 GHz Pentium, 2 GB RAM, 1 GB swap space.
      § For each benchmark, we collected 3 different traces; for each trace, we computed 25 slices chosen at random.

OPT: Compacted Graph Sizes

   Program       Graph Size (MB)         Before / After    Explicit Dependences (%)
                 Before       After
   300.twolf      1,568         210            7.72               13.40
   256.bzip2      1,296          51           25.68                3.89
   255.vortex     1,442          65           22.26                4.49
   197.parser     1,816          70           26.03                3.84
   181.mcf        1,535         170            9.02               11.09
   164.gzip         835          52           16.19                6.18
   134.perl       1,954          21           93.40                1.07
   130.li         1,745          97           18.09                5.53
   126.gcc        1,534          75           20.54                4.87
   099.go         1,707         131           13.01                7.69

OPT: Effects

OPT: Slicing Times at Different Execution Points

OPT: Benefit of Shortcuts

OPT Slicing Times (Avg. of 25 slices), in seconds

   Program       W/O Shortcuts    With Shortcuts
   300.twolf         68.0             36.3
   256.bzip2          6.1              2.1
   255.vortex         5.6              1.9
   197.parser         4.9              2.2
   181.mcf           22.0             17.1
   164.gzip           4.5              1.7
   134.perl          12.6              4.1
   130.li            15.7              6.1
   126.gcc            9.8              3.8
   099.go            26.9             11.4

OPT vs. LP: Graph Sizes

Graph Size (MB)

   Program       OPT     LP (Max. of 25)
   300.twolf     210          296
   256.bzip2      51           81
   255.vortex     65           35
   197.parser     70           40
   181.mcf       170          113
   164.gzip       52           35
   134.perl       21           54
   130.li         97          105
   126.gcc        75           57
   099.go        131          162

OPT vs. LP: Slicing Times

Slicing Times (Avg. of 25 slices)

   Program       OPT (Seconds)    LP (Minutes)
   300.twolf         36.3             13.9
   256.bzip2          2.1              9.2
   255.vortex         1.9             10.2
   197.parser         2.2              9.9
   181.mcf           17.1             12.3
   164.gzip           1.7              4.7
   134.perl           4.1             25.2
   130.li             6.1             11.3
   126.gcc            3.8             12.1
   099.go            11.4             10.7
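Because the two columns use different units, the size of the gap is easy to misread. The short Python sketch below converts the LP times to seconds and computes the ratio per benchmark; the numbers are copied from this slide, while the `TIMES` table and the `speedup` helper are hypothetical names introduced for illustration.

```python
# (OPT seconds, LP minutes) per benchmark, copied from the slide.
TIMES = {
    "300.twolf": (36.3, 13.9),
    "256.bzip2": (2.1, 9.2),
    "255.vortex": (1.9, 10.2),
    "197.parser": (2.2, 9.9),
    "181.mcf": (17.1, 12.3),
    "164.gzip": (1.7, 4.7),
    "134.perl": (4.1, 25.2),
    "130.li": (6.1, 11.3),
    "126.gcc": (3.8, 12.1),
    "099.go": (11.4, 10.7),
}


def speedup(opt_seconds, lp_minutes):
    """How many times faster OPT is than LP, with both times in seconds."""
    return lp_minutes * 60 / opt_seconds


speedups = {name: speedup(*pair) for name, pair in TIMES.items()}
smallest = min(speedups, key=speedups.get)
largest = max(speedups, key=speedups.get)
```

On these figures the ratio runs from roughly 23 for 300.twolf to roughly 369 for 134.perl.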

Traditional vs. OPT: Short Program Runs

Slicing Times (Avg. of 25 slices), in seconds. The OPT column lists the time with shortcuts, then the time without shortcuts.

   Program       OPT              Traditional
   300.twolf     36.3 : 68.0         66.0
   256.bzip2      2.1 : 6.1           5.9
   255.vortex     1.9 : 5.6           6.2
   197.parser     2.2 : 4.86          5.3
   181.mcf       17.1 : 22.0         21.7
   164.gzip       1.7 : 4.5           4.8
   134.perl       4.1 : 12.6           -
   130.li         6.1 : 15.7         17.9
   126.gcc        3.8 : 9.8          11.0
   099.go        11.4 : 26.9         29.8

Graph Construction Cost
  • Trace Generation - Instrumented program takes twice as long to run as the uninstrumented program.
  • Trace Preprocessing for Graph Construction: Time(LP) < Time(OPT) < Time(Traditional)

   Program       LP (min)    OPT (min)    Trad. (min)
   300.twolf      14.54        65.29         99.62
   256.bzip2       9.38        38.36         80.78
   255.vortex     16.35        44.46         55.47
   197.parser     16.23        44.06         67.57
   181.mcf        16.64        53.64         71.17
   164.gzip       14.56        23.52         31.66
   134.perl       17.18        51.12           -
   130.li         19.23        49.88         74.86
   126.gcc        26.65        48.83         52.70
   099.go         17.06        35.24         42.17

Conclusion
  • A straightforward implementation of a precise dynamic slicing algorithm is not practical.
  • Carefully designed algorithms provide precise dynamic slices at reasonable space and time costs.
  • Our work is one step toward making dynamic slicing practical.
      § Ongoing work: efficient online compression gives another 5-10 times reduction; 15 MB for 150 million statements executed (over 100 times reduction in total); 4-10 times slowdown.