New Results in Program Slicing Aharon Abadi, Ran Ettinger, and Yishai Feldman IBM Haifa Research Lab 1
Context • The Programmer’s Apprentice – The Plan Calculus • • Bogart Midas Sliding Painless – Paz – Aderet 2
Improving Slice Accuracy by Compression of Data and Control Flow Paths Presented at ESEC/FSE 2009 3
Program Slicing Program Slice Start Slice x : = exp The same sequence of values 4
Control-Flow Path Compression
    test X
    if-zero-go-to A
    ...
L:  test Y
    if-zero-go-to B
    ...
    go-to L
A:  Z ← 0
    ...
B:
Work in two stages:
- Compute the ‘traditional’ slice
  - Control dependences
  - Data dependences
- Compute the necessary branches to prevent infeasible control paths
5
Control-Flow Path Compression
    test X
    if-zero-go-to A
    ...
L:  test Y
    if-zero-go-to B
    ...
    go-to L
A:  Z ← 0
    ...
B:
[Slide overlay: test X; if-zero-go-to A; ...; go-to B; A: Z ← 0]
Limitations of previous approaches:
- insert the whole loop;
- add branches not from the program; or
- do not preserve behavior
This algorithm:
- preserves behavior
- yields a sub-program
- one version may turn conditional branches into unconditional ones (“rhetorization”)
6
Data-Flow Path Compression
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7        ; spill R7 to memory
       …                 ; code that uses all registers
       R7 := Temp        ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
- The result is too large
- The value of R7 does not depend on the loop
- Previous syntax-preserving algorithms insert the loop and the assignments inside it
7
Control-Flow Path Compression
    if (x<11) goto A4
    x := x+1
    goto A2
A4: if (x<9) goto A3
    x := x-1
A1: if (y<T) goto A2
    y := y-1
    …
A3: x := x+2
    …
A2: print(x)
[Figure: flowchart of the program]
8
Compute the ‘Traditional’ Slice
    if (x<11) goto A4
    x := x+1
    goto A2
A4: if (x<9) goto A3
    x := x-1
A1: if (y<T) goto A2
    y := y-1
    …
A3: x := x+2
    …
A2: print(x)
[Figure: flowchart of the program]
9
Completing Control Flow Paths: Main Lemma All paths from the same point in the slice enter the slice at a single point • precisely identifies the possible sets of branches that may be added to the slice • any path in the original program can be chosen • optimizations can be performed 10
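Under one reading of the lemma, every path that leaves the slice at a given point re-enters it at the same slice node. The sketch below checks that property on a small control-flow graph. The graph encoding, the node names `s1`, `s2`, `s3` (standing for the elided code of the test-X / test-Y fragment), and the chosen slice are assumptions made for illustration, not the authors' implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SliceEntryPoints {

    /** Hypothetical CFG for the test-X / test-Y fragment; s1, s2, s3 stand for elided code. */
    static Map<String, List<String>> exampleCfg() {
        return Map.of(
            "testX", List.of("s1", "A"),
            "s1", List.of("L"),
            "L", List.of("s2", "B"),
            "s2", List.of("L"),
            "A", List.of("s3"),
            "s3", List.of("B"),
            "B", List.of());
    }

    /**
     * Returns the slice nodes first reached from `from` along paths whose
     * intermediate nodes all lie outside the slice.
     */
    static Set<String> firstSliceNodes(Map<String, List<String>> cfg,
                                       Set<String> slice, String from) {
        Set<String> entries = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(cfg.getOrDefault(from, List.of()));
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!visited.add(node)) {
                continue;                       // already explored
            }
            if (slice.contains(node)) {
                entries.add(node);              // the path re-enters the slice here
            } else {
                work.addAll(cfg.getOrDefault(node, List.of()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Set<String> slice = Set.of("testX", "A", "B"); // assumed slice, for illustration only
        System.out.println(firstSliceNodes(exampleCfg(), slice, "s1")); // prints [B]
    }
}
```

When the returned set has exactly one element, any path from the starting node to that element could serve as the branch to add; picking a shortest one keeps the slice small.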
Compute the Necessary Branches
    if (x<11) goto A4
    x := x+1
    goto A2
A4: if (x<9) goto A3
    x := x-1
A1: if (y<T) goto A2
    y := y-1
    …
A3: x := x+2
    …
A2: print(x)
[Figure: flowchart of the program]
11
Data-Flow Path Compression
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7        ; spill R7 to memory
       …                 ; code that uses all registers
       R7 := Temp        ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
[Figure: flow graph of the program]
12
Data-Flow Path Compression
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7        ; spill R7 to memory
       …                 ; code that uses all registers
       R7 := Temp        ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
• The Plan Calculus: The Programmer’s Apprentice, Rich and Waters, 1990
• R7, Temp carry the value of exp1
• Use data edges instead of variables
[Figure: plan of the program; annotations: the in data port holds the next value, the out data port holds the last value]
13
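The observation that R7 and Temp both carry the value of exp1 can be illustrated with a small value-tracking pass. This is a deliberately simplified sketch for straight-line code only. The slides work on plans with data edges and loops, which this example does not model, and the `Assign` class and `valueOrigins` method are hypothetical names introduced here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CopyChains {

    /** One assignment `target := source`; isCopy means the source is a plain variable. */
    static final class Assign {
        final String target;
        final String source;
        final boolean isCopy;

        Assign(String target, String source, boolean isCopy) {
            this.target = target;
            this.source = source;
            this.isCopy = isCopy;
        }
    }

    /** Maps each variable to the non-copy expression whose value it currently carries. */
    static Map<String, String> valueOrigins(List<Assign> code) {
        Map<String, String> origin = new LinkedHashMap<>();
        for (Assign a : code) {
            String value = a.isCopy
                    ? origin.getOrDefault(a.source, a.source) // follow the copy to its origin
                    : a.source;                               // a fresh value is computed
            origin.put(a.target, value);
        }
        return origin;
    }

    /** The spill/restore sequence from the slide, flattened to straight-line code. */
    static List<Assign> example() {
        return List.of(
            new Assign("R7", "exp1", false),
            new Assign("Temp", "R7", true),    // spill
            new Assign("R7", "Temp", true),    // restore
            new Assign("R0", "R7 + 1", false));
    }

    public static void main(String[] args) {
        System.out.println(valueOrigins(example()));
    }
}
```

Because both copies resolve to exp1, a slice on R0 needs only the assignment of exp1 and the final addition; the spill and restore add nothing to the value.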
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7        ; spill R7 to memory
       …                 ; code that uses all registers
       R7 := Temp        ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
[Figure: plan of the program, annotated with R7 := exp1 and R0 := R7 + 1]
14
Decompression
Start: R2 := 0
       R7 := exp1
Loop:  R2 := R2 + 1
       compare R2, R9
       if-not-less-go-to Out
       use R7
       Temp := R7        ; spill R7 to memory
       …                 ; code that uses all registers
       R7 := Temp        ; restore R7
       go-to Loop
Out:   R0 := R7 + 1
[Figure: plan of the program, annotated with R7 := exp1, go-to Out, and Out: R0 := R7 + 1]
15
Properties of the Slices
• Syntax preserving, possibly rhetorizing
• Behavior preserving
• Executable
• For structured programs
  – At least as accurate as previous algorithms
  – Strictly smaller in interesting cases
• For unstructured programs
  – Empirically shown to be superior
  – Modification of the algorithm guaranteed at least as accurate
16
Implementation
• A family of slicing algorithms
  – rhetorizing (*RB, *RM)
  – strictly syntax-preserving (*PB, *PM)
  – amorphous (*AB, *AM)
    • adds new branches (not from the program)
[Figure: example code fragments, including the branches go-to C and go-to exit]
17
Empirical Study • Corpus of 15 manually written assembly-language modules from a large mainframe product • 8578 non-comment source lines • Computed slices from all lines • 5801 non-empty slices 18
Empirical Results Effect of %slices better %average decrease %slices worse %average decrease Rhetorization 17 7.5 Lenient BH 30 17 Strict BH 94 65 implemented 1 24 8 15 modified Control path compression Data path compression 19
Related Work
Behavior:
  Preserve behavior: BH, CF1, Ag, HLB, *P, *R, *A
  May add infinite loops: HLB, HD
  Not executable: KH
Subset of the original program (for flat languages):
  Syntax-preserving: BH, CF1, Ag, HD, HLB, *P
  Rhetorizing: *R
  Amorphous: HLB, CF, *A
Comparison to traditional algorithm on structured programs:
  Smaller than traditional: *P, *R, *A
  Equal to traditional / Larger than traditional: BH, CF1, Ag, HD, KH, HLB, CF2
BH: Ball & Horwitz 1993; CF: Choi & Ferrante 1994; Ag: Agrawal 1994; KH: Kumar & Horwitz 2002; HD: Harman & Danicic 1998; HLB: Harman, Lakhotia & Binkley 2006
20
Conclusions • Two techniques for reducing slice size – Control-Flow Path Compression • Precise identification of all correct solutions • Shortest paths significantly improve slice accuracy – 17-22% improvement for 30-37% of the cases – Data-Flow Path Compression • Eliminates copy assignments • Yields significant improvement in a few cases – 24% improvement for 1% of the slices computed • Strictly smaller even for structured programs 21
Fine Slicing for Program Transformation 22
Refactoring’s Rubicon: Extract Method • Automating Extract Method is Refactoring’s Rubicon (Fowler*) – The one that demonstrates “serious tool support” – Precondition for many other transformations • This Rubicon has not yet been crossed – Getting it right requires more analysis capability than is available in current IDEs * http://www.martinfowler.com/articles/refactoringRubicon.html 23
Fowler’s Example (website)

Before:
void printOwing() {
    printBanner();
    //print details
    System.out.println("name: " + _name);
    System.out.println("amount " + getOutstanding());
}

After:
void printOwing() {
    printBanner();
    printDetails(getOutstanding());
}

void printDetails(double outstanding) {
    System.out.println("name: " + _name);
    System.out.println("amount " + outstanding);
}
24
A Case Study in Enterprise Refactoring • Converted a Java Servlet to use the MVC pattern* • Used as much automated support as available – The whole conversion could be described as a series of cataloged (“small”) refactorings – Most steps were inadequately supported by the IDE – Some were not supported at all * Based on Alex Chaffee’s “Refactoring to Model-View-Controller” article (http://www.purpletech.com/articles/mvc/refactoring-to-mvc.html) 25
Case-Study: Automation (1)
Fully Supported Refactorings                      Uses
Extract Method                                    3
Extract Temp                                      3
(Self) Encapsulate Field                          2
Replace Magic Number with Symbolic Constant       1
Inline Temp                                       1
Extract Superclass                                1
Delete Methods                                    1
Move Method                                       1
Total                                             13
26
Case-Study: Automation (2)
Partial (*) or No (**) Support                    Uses
Extract Method *                                  10
Substitute Expression **                          5
Replace Temp with Query *                         3
Replace Method with Method Object **              2
Substitute Statement **                           1
Extract Class *                                   1
Move Statement (or Swap Statements) **            1
Total                                             23
27
Currently Unsupported Cases of Extract Method (a) Extract multiple fragments (b) Extract a partial fragment – select sub-expressions as parameters (c) Extract loop with partial body – loop duplication with data flow (d) Extract code with conditional exits Program slicing pulls related code together! 28
slice (v. ): to cut with or as if with a knife slice (n. ): a thin flat piece cut from something Merriam-Webster 29
A (backward) slice of a given program with respect to selected “interesting” variables is a subprogram that computes the same values as the original program for the selected variables.

A (backward) fine slice of a given program with respect to selected “interesting” variables and other “oracle” variables is a subprogram that computes the same values as the original program for the selected variables, given values for the oracle variables.
30
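The difference between the two definitions can be made concrete with a small program. In the sketch below, `sliceOnSum` keeps everything that `sum` depends on, while `fineSliceOnSum` cuts the dependences on `start` and `end` and receives them as oracle values instead. The program and all method names are hypothetical and chosen only for illustration.

```java
class FineSliceDemo {

    /** Original program: computes both `end` and `sum`. */
    static int[] original(int page, int size) {
        int start = page * 20;
        int end = Math.min(start + 20, size);
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += i;
        }
        return new int[] { end, sum };
    }

    /** Backward slice on `sum`: keeps every statement that `sum` depends on. */
    static int sliceOnSum(int page, int size) {
        int start = page * 20;
        int end = Math.min(start + 20, size);
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += i;
        }
        return sum;
    }

    /**
     * Fine slice on `sum` that ignores the data dependences on `start` and `end`.
     * Both become oracle variables: their values are supplied from outside.
     */
    static int fineSliceOnSum(int start, int end) {
        int sum = 0;
        for (int i = start; i < end; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] result = original(1, 35);
        // With the oracle values taken from the original run, all three agree on `sum`.
        System.out.println(result[1] + " " + sliceOnSum(1, 35) + " " + fineSliceOnSum(20, result[0]));
    }
}
```

The fine slice is smaller than the traditional slice, and it is still executable once the oracle values are provided.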
Fine Slicing
• A generalization of traditional program slicing
• Fine slices can be precisely bounded
  – Slicing criteria include a set of data and control dependences to ignore
• Fine slices are executable and extractable
• Complement slices (co-slices) are also fine slices
• Oracle-based semantics for fine slices
• Algorithm for computing a data structure representing the oracle
• Forward fine slices are executable, may be slightly larger than traditional forward slices
• Confines generalize blocks for unstructured programs
31
Extract Computation
• A new refactoring
• Extracts a fine slice into contiguous code
• Computes the co-slice
• Computation can then be extracted into a separate method using Extract Method
• Passes necessary “oracle” variables between slice and co-slice
• Generates new containers if series of values need to be passed
32
(a) Extract multiple fragments
User user = getCurrentUser(request);
if (user == null) {
    response.sendRedirect(LOGIN_PAGE_URL);
    return;
}
response.setContentType("text/html");
disableCache(response);
String albumName = request.getParameter("album");
PrintWriter out = response.getWriter();
33
(b) Extract a partial fragment
out.println(DOCTYPE_HTML);
out.println("<html>");
out.println("<head>");
out.println("<title>Error</title>");
out.println("</head>");
out.print("<body><p class='error'>");
out.print("Could not load album '" + albumName + "'");
out.println("</p></body>");
out.println("</html>");
34
(c) Extract loop with partial body
 1  out.println("<table border=0>");
 2  int start = page * 20;
 3  int end = start + 20;
 4  end = Math.min(end,
 5                 album.getPictures().size());
 6  for (int i = start; i < end; i++) {
 7      Picture picture = album.getPicture(i);
 8      printPicture(out, picture);
 9  }
10  out.println("</table>");
35
 2  int start = page * 20;
 3  int end = start + 20;
 4  end = Math.min(end,
 5                 album.getPictures().size());
*** Queue<Picture> pictures = new LinkedList<Picture>();
 6  for (int i = start; i < end; i++) {
 7      Picture picture = album.getPicture(i);
***     pictures.add(picture);
 9  }
 1  out.println("<table border=0>");
 6  for (int i = start; i < end; i++)
 8      printPicture(out, pictures.remove());
10  out.println("</table>");
36
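The loop split above can be tried out in a self-contained form. The sketch below replaces the servlet types with stand-ins: the album is assumed to be a `List<String>`, output goes to a `StringBuilder`, and `printPicture` is reduced to appending an `<img>` tag. These substitutions and the class name are assumptions made for illustration only.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class LoopSplitDemo {

    /** Original form: one loop both fetches and prints each picture. */
    static String original(List<String> album, int page) {
        StringBuilder out = new StringBuilder();
        out.append("<table border=0>\n");
        int start = page * 20;
        int end = Math.min(start + 20, album.size());
        for (int i = start; i < end; i++) {
            String picture = album.get(i);
            out.append("<img src='").append(picture).append("'>\n");
        }
        out.append("</table>\n");
        return out.toString();
    }

    /** Co-slice: computes start and end and collects the pictures into a container. */
    static String split(List<String> album, int page) {
        int start = page * 20;
        int end = Math.min(start + 20, album.size());
        Queue<String> pictures = new LinkedList<>();
        for (int i = start; i < end; i++) {
            pictures.add(album.get(i));
        }
        return display(start, end, pictures);
    }

    /** Extracted fine slice: consumes the queued pictures in the same order. */
    static String display(int start, int end, Queue<String> pictures) {
        StringBuilder out = new StringBuilder();
        out.append("<table border=0>\n");
        for (int i = start; i < end; i++) {
            out.append("<img src='").append(pictures.remove()).append("'>\n");
        }
        out.append("</table>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> album = List.of("a.jpg", "b.jpg", "c.jpg");
        System.out.println(original(album, 0).equals(split(album, 0)));
    }
}
```

The queue preserves the order in which pictures are fetched, which is why the second loop can replay the values with `remove()`.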
(d) Extract code with conditional exits
if (album == null) {
    new ErrorPage("Could not load album '"
            + album.getName() + "'").printMessage(out);
    return;
}
//...
37
if (invalidAlbum(album, out))
    return;
//...

boolean invalidAlbum(Album album, PrintWriter out) {
    boolean invalid = album == null;
    if (invalid) {
        new ErrorPage("Could not load album '"
                + album.getName() + "'").printMessage(out);
    }
    return invalid;
}
38
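The pattern of turning a conditional exit into a boolean result can be exercised in isolation. The sketch below is a simplified stand-in: the album is represented by its name only, output goes to a `StringBuilder`, and the returned strings merely record which path was taken. All names are hypothetical.

```java
class ConditionalExitDemo {

    /** Before: the guard and its early return sit inline in the caller. */
    static String before(String albumName, StringBuilder out) {
        if (albumName == null) {
            out.append("Could not load album");
            return "aborted";
        }
        out.append("Album: ").append(albumName);
        return "shown";
    }

    /** Extracted guard: reports through its result whether the caller must exit. */
    static boolean invalidAlbum(String albumName, StringBuilder out) {
        boolean invalid = albumName == null;
        if (invalid) {
            out.append("Could not load album");
        }
        return invalid;
    }

    /** After: the caller keeps only the exit itself. */
    static String after(String albumName, StringBuilder out) {
        if (invalidAlbum(albumName, out)) {
            return "aborted";
        }
        out.append("Album: ").append(albumName);
        return "shown";
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder();
        StringBuilder b = new StringBuilder();
        System.out.println(before(null, a).equals(after(null, b)) && a.toString().equals(b.toString()));
    }
}
```

The `return` statement cannot move into the extracted method, so the method returns the condition and the caller performs the exit.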
Token Semantics
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    printPicture(out, picture);
}
out.println("</table>");
[Figure: plan of the code above, from entry to exit, with tokens p1 and p2]
39
Fine Slicing
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    printPicture(out, picture);
}
out.println("</table>");
[Figure: plan of the code above, from entry to exit]
40
The Fine Slice
out.println("<table border=0>");
for (int i = start; i < end; i++) {
    printPicture(out, picture);
}
out.println("</table>");
[Figure: plan of the fine slice, with inputs start, end, and picture]
41
Co-Slicing
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    printPicture(out, picture);
}
out.println("</table>");
[Figure: plan of the code above, from entry to exit]
42
The Co-Slice
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
}
[Figure: plan of the co-slice]
43
Co-slice and fine slice
[Figure: plans of the co-slice and the fine slice, side by side]
44
Adding a Container
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
Queue<Picture> pictures = new LinkedList<Picture>();
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    pictures.add(picture);
    printPicture(out, pictures.remove());
}
out.println("</table>");
[Figure: plan of the code above, including the pictures container]
45
The Fine Slice
void display(PrintStream out, int start, int end,
             Queue<Picture> pictures) {
    out.println("<table border=0>");
    for (int i = start; i < end; i++) {
        printPicture(out, pictures.remove());
    }
    out.println("</table>");
}
[Figure: plan of the fine slice, with inputs start, end, and pictures]
46
Program with Container
out.println("<table border=0>");
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
Queue<Picture> pictures = new LinkedList<Picture>();
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    pictures.add(picture);
    printPicture(out, pictures.remove());
}
out.println("</table>");
[Figure: plan of the code above, including the pictures container]
47
The Co-Slice
int start = page * 20;
int end = start + 20;
end = Math.min(end, album.getPictures().size());
Queue<Picture> pictures = new LinkedList<Picture>();
for (int i = start; i < end; i++) {
    Picture picture = album.getPicture(i);
    pictures.add(picture);
}
display(out, start, end, pictures);
[Figure: plan of the co-slice, ending in the call to display]
48
Conclusions • Fine slicing algorithm yields executable slices whose boundaries can be precisely controlled • Can be used to make any subset of a program executable by adding some control structures but not the data on which they depend – including forward slices, thin slices, barrier slices, chops, and barrier chops – Conjecture: the size of these executable programs will not be substantially larger 49
Conclusions • New Extract Computation refactoring is an important step towards the automation of Extract Method in difficult cases – Enables the automation of big refactorings from smaller building blocks • Uses new fine-slicing algorithm • Automatically computes complement slices (co-slices) • Automatically generates containers to pass series of values if necessary 50
Related Work (I): Non-Executable Slices • Traditional backward slicing (e.g., Weiser [ICSE 81] or Ottenstein & Ottenstein [PSDE 84]), when applied to unstructured code – Solved by path-completion stage in plan-based slicing (Abadi, Ettinger & Feldman [FSE 09]) • Forward slicing (Horwitz, Reps & Binkley [TOPLAS 90]) • Barrier slicing (Krinke [SCAM 03]) • Chopping (Jackson & Rollins [FSE 94]) and Barrier Chopping (Krinke [SCAM 03]) • Thin slicing (Sridharan, Fink & Bodik [PLDI 07]) • All the above can be made executable with an appropriate oracle, by adding the required control structure 51
Related Work (II): Executable Slices with Reduced Scope or Size • Block-based slicing (Maruyama [SSR 01]): structured code only, no correctness proof • Co-slicing (Ettinger's thesis, Oxford 2006): limited to slicing from the end and oracle of final values only; proof on toy language • Parametric slicing (Field, Ramalingam & Tip [POPL 95]): an executable generalization of static and dynamic slices; like oracle semantics, they formalize programs with holes; however, their holes stand for expressions whose values are irrelevant, while our holes stand for significant (oracle) values • Some forms of dynamic and forward slicing are executable (Binkley et al. [SCAM 04]): forward slices made excessively large through the addition of backward slices 52
Related Work (III): Behavior-Preserving Procedure Extraction • Contiguous code – Bill Opdyke's thesis (UIUC 1992): for C++ – Griswold and Notkin [ToSE 93]: for Scheme • Arbitrary selections – Tucking (Lakhotia & Deprez [IST 98]): the complement is a slice too; no dataflow from the extracted slice to its complement yields over-duplication; strong preconditions (e.g., no global variables involved, and no live-on-exit variable defined in both the slice and complement) – Semantics-Preserving Procedure Extraction (Komondoor & Horwitz [POPL 00]): considers all permutations of selected and surrounding statements; no duplication allowed; not practical (exponential time complexity); very strong preconditions – Effective Automatic Procedure Extraction (Komondoor & Horwitz [IWPC 03]): improves on their previous algorithm by improving complexity (cubic time and space), allowing some duplication (of conditionals and jumps); might miss some correct permutations; no duplication of assignments or loops; allows dataflow from complement to extracted code and from extracted code to (the second portion of the) complement; supports extraction of returns – Extraction of block-based slices (Maruyama [SSR 01]): extracts a slice of one variable only; restricted to structured code; no proof given – Ettinger's thesis (Oxford 2006): sliding transformation sequentially composes a slice and its complement, allowing dataflow from the former to the latter; supports loop untangling and duplication of assignments; restricted to slicing from the end, and only final values from the extracted slice can be reused in the complement; proof for toy language 53