Program Slicing on Java Bytecode for Locating Functional Concerns
Takashi Ishio†, Ryusuke Niitani†, Gail Murphy‡, Katsuro Inoue†
† Osaka University, Japan
‡ University of British Columbia, Canada
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Concern Location
- A functional concern is the code that helps fulfill a functional requirement.
  - A software maintenance task usually focuses on a functional concern.
- Concern location comprises "search and explore".
  - Search: grep or other feature location tools.
  - Explore: the "interesting" methods and the interaction among the methods, using the call graph, class hierarchy tree and cross references.
Example: Autosave function in jEdit
- The function periodically saves the contents of the text area.
  - A user can specify the frequency.
- We can easily find the Autosave class, the Buffer.autosave() method and the BufferIORequest.autosave() method.
- How do these classes and methods interact?
Exploring Interaction among Methods
- Important information: control flow and data flow.
  - Which method triggers the autosave function?
  - Which class holds the necessary data (e.g. the file name)?
  - How does a method save the contents to a text file?
- We have to read the following classes: Autosave, BufferIORequest, PerspectiveManager, VFSManager, FileVFS, …
Automated Concern Location
- We are trying to extract a concern graph from code fragments specified by a developer.
  - Our approach is based on program slicing.
  - Our tool is based on Soot, a Java bytecode analysis framework.
- Workflow: code fragments related to a functionality → program slicing with heuristics → a program slice → slice-to-concern-graph translation → a concern graph
Autosave concern graph
- Input = Autosave.*(), Buffer.autosave(), BufferIORequest.autosave()
Program Slicing
- Slicing extracts the statements related to criteria statements specified by a user.
  1. A program P is converted to a program dependence graph (PDG).
     - vertices: statements in P
     - edges: control/data dependence relations
  2. A user specifies "slicing criteria" statements in P.
     - The statements are translated into "criteria vertices" in the PDG.
  3. A program slice, a set of statements that affect or depend on the criteria, is extracted by graph traversal from the criteria vertices.
- Example, with slicing criterion <3, i>:
    1  i = 3;
    2  if (a > 0) {
    3    print i;
    4  }
  - Data dependence: statement 3 uses the definition of i at statement 1.
  - Control dependence: statement 3 depends on the condition at statement 2.
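The three steps above can be sketched on the four-line example from this slide. This is a minimal illustration, not the authors' implementation: the class name PdgSliceSketch and its methods are hypothetical, statements are identified by their line numbers, and only the backward direction ("statements that affect the criterion") is shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical minimal PDG: vertices are statement numbers, edges are dependences. */
final class PdgSliceSketch {
    // dependsOn.get(v) lists the vertices that v is control- or data-dependent on.
    private final Map<Integer, List<Integer>> dependsOn = new HashMap<>();

    /** Records that 'dependent' depends on 'source' (control or data dependence). */
    void addDependence(int source, int dependent) {
        dependsOn.computeIfAbsent(dependent, k -> new ArrayList<>()).add(source);
    }

    /** Backward slice: every vertex reachable from the criterion against the edge direction. */
    Set<Integer> backwardSlice(int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            int v = worklist.pop();
            if (slice.add(v)) { // first visit of v
                for (int source : dependsOn.getOrDefault(v, Collections.emptyList())) {
                    worklist.push(source);
                }
            }
        }
        return slice;
    }

    /** PDG of the slide example: 1 "i = 3;", 2 "if (a > 0)", 3 "print i;". */
    static PdgSliceSketch slideExample() {
        PdgSliceSketch pdg = new PdgSliceSketch();
        pdg.addDependence(1, 3); // data dependence: statement 3 uses i defined at 1
        pdg.addDependence(2, 3); // control dependence: statement 3 is guarded by 2
        return pdg;
    }
}
```

Calling PdgSliceSketch.slideExample().backwardSlice(3) collects statements 1, 2 and 3, since both the definition of i and the guarding condition affect the criterion.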
Slice including unrelated concerns
- Slicing usually extracts many statements.
  - A functional unit is connected to other units by control flow and data flow.
  - Slices contain 28% of the program on average in C programs†.
[Figure: slicing from Autosave; labels: activate, reset, set/reset autosave_dirty flag, UndoManager, set, CompleteWord]
† Binkley, D., Gold, N. and Harman, M.: An Empirical Study of Static Program Slice Size. ACM TOSEM, Vol. 16, No. 2, Article 8, April 2007.
Slicing with Barriers
- A barrier is a vertex or an edge that terminates graph traversal†.
  - A barrier blocks graph traversal.
[Figure: slicing from Autosave with a barrier; labels: activate, reset, set/reset autosave_dirty flag, UndoManager, set, CompleteWord]
† Krinke, J.: Slicing, Chopping, and Path Conditions with Barriers. Software Quality Journal, Vol. 12, No. 4, pp. 339-360, December 2004.
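A barrier can be added to a worklist traversal with one extra check. The sketch below is illustrative only: BarrierSliceSketch and the tiny dependence graph are hypothetical (the vertex names are borrowed from the figure labels, the edges are invented), only vertex barriers are handled, and excluding the barrier vertex itself from the slice is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of backward slicing that stops at barrier vertices. */
final class BarrierSliceSketch {
    /** dependsOn.get(v) lists the vertices that v depends on. */
    static Set<String> slice(Map<String, List<String>> dependsOn,
                             String criterion,
                             Set<String> barriers) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String v = worklist.pop();
            // A barrier vertex is neither added to the slice nor traversed further.
            if (barriers.contains(v) || !visited.add(v)) {
                continue;
            }
            for (String source : dependsOn.getOrDefault(v, Collections.emptyList())) {
                worklist.push(source);
            }
        }
        return visited;
    }

    /** Invented chain of dependences: Autosave <- autosave_dirty <- UndoManager <- CompleteWord. */
    static Map<String, List<String>> hypotheticalGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("Autosave", Arrays.asList("autosave_dirty"));
        g.put("autosave_dirty", Arrays.asList("UndoManager"));
        g.put("UndoManager", Arrays.asList("CompleteWord"));
        return g;
    }
}
```

With no barriers the traversal from "Autosave" reaches all four vertices; with "UndoManager" as a barrier it stops after "autosave_dirty", so "CompleteWord" is never reached.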
Similarity-based Barrier
- The key idea is the following: if two methods contribute to the same functionality, the methods use similar methods, fields and classes.
- Name set NS(m) = the set of types, classes, methods and fields referred to in m.
  - A long name is "tokenized", e.g. "java.io.File" → "java", "io", "File", "java.io.File"
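The tokenization of a long name can be sketched as follows. The slides do not give the exact rules, so this is an assumption: the full name is kept, it is split at dots, and each segment is split at camel-case boundaries and lower-cased (the lower-casing follows the IntegerArray example on the next slide, where "IntegerArray" yields "integer" and "array"). NameTokenizerSketch is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical tokenizer for building a name set NS(m). */
final class NameTokenizerSketch {
    /** Returns the full name plus its lower-cased dot- and camel-case-separated words. */
    static Set<String> tokenize(String name) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(name); // the long name itself stays in the set
        for (String segment : name.split("\\.")) {
            // Zero-width split between a lower-case letter or digit and an upper-case letter.
            for (String word : segment.split("(?<=[a-z0-9])(?=[A-Z])")) {
                if (!word.isEmpty()) {
                    tokens.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }
}
```

Under these assumptions, tokenize("org.gjt.sp.util.IntegerArray") contributes the full name plus org, gjt, sp, util, integer and array, and tokenize("getSize") contributes getSize, get and size.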
Example of Similarity

    package org.gjt.sp.util;
    class IntegerArray {
      private int[] array;
      private int len;
      public void add(int num) {
        if(len >= array.length) {
          int[] arrayN = new int[len * 2];
          System.arraycopy(array, 0, arrayN, 0, len);
          array = arrayN;
        }
        array[len++] = num;
      }
      public final int getSize() {
        return len;
      }
      public final void setSize(int len) {
        this.len = len;
      }
    }

- NS(IntegerArray.add) = org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, void, add, int, len, int[], java.lang.System, java, lang, system, arraycopy
- NS(IntegerArray.getSize) = org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, getSize, get, size, int, len
- NS(IntegerArray.setSize) = org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, len, setSize, set, size, void, int
- sim = 0.639 (annotated between NS(add) and NS(getSize))
- sim = 0.801 (annotated between NS(getSize) and NS(setSize))
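The slides do not state how sim is computed. As one plausible sketch, the code below uses cosine similarity over name sets, which is an assumption and not the authors' confirmed formula; NameSetSimilaritySketch and its members are hypothetical names. For the getSize and setSize name sets listed above it gives 10 / sqrt(12 × 13) ≈ 0.801, while for add and getSize it gives about 0.630 rather than 0.639, so the measure actually used may differ, for example by weighting tokens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical set-based cosine similarity between two name sets. */
final class NameSetSimilaritySketch {
    /** |a ∩ b| / sqrt(|a| * |b|); an assumed measure, not taken from the slides. */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size() / Math.sqrt((double) a.size() * b.size());
    }

    static Set<String> nameSet(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Name sets copied from the slide.
    static final Set<String> GET_SIZE = nameSet(
            "org.gjt.sp.util.IntegerArray", "org", "gjt", "sp", "util",
            "integer", "array", "getSize", "get", "size", "int", "len");

    static final Set<String> SET_SIZE = nameSet(
            "org.gjt.sp.util.IntegerArray", "org", "gjt", "sp", "util",
            "integer", "array", "len", "setSize", "set", "size", "void", "int");
}
```

NameSetSimilaritySketch.cosine(GET_SIZE, SET_SIZE) is high because the two accessors share the class name tokens, "len", "size" and "int", and differ only in a few words.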
Identifying Barriers
- Program slicing is blocked at a method m if m is not related to the slicing criteria:
    Similarity(m, C) ≤ threshold
  where C = the set of methods that contain slicing criteria vertices.
  - A method m is related to the slicing criteria if the criteria include a method n such that m is similar to n.
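The barrier rule can be sketched as a filter over methods. Two points are assumptions here: Similarity(m, C) is taken as the maximum pairwise similarity between m and any method in C (one reading of "includes a method n such that m is similar to n"), and the pairwise measure is a set-based cosine. BarrierIdentificationSketch and its parameters are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical barrier identification from name sets and a threshold. */
final class BarrierIdentificationSketch {
    /** Assumed pairwise measure: |a ∩ b| / sqrt(|a| * |b|). */
    static double cosine(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size() / Math.sqrt((double) a.size() * b.size());
    }

    /**
     * Returns the methods m with Similarity(m, C) <= threshold, where
     * Similarity(m, C) is assumed to be the best match against any criteria method.
     * Criteria methods must have an entry in nameSets.
     */
    static Set<String> barriers(Map<String, Set<String>> nameSets,
                                Set<String> criteriaMethods,
                                double threshold) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : nameSets.entrySet()) {
            if (criteriaMethods.contains(entry.getKey())) {
                continue; // a criteria method is never a barrier
            }
            double best = 0.0;
            for (String n : criteriaMethods) {
                best = Math.max(best, cosine(entry.getValue(), nameSets.get(n)));
            }
            if (best <= threshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

The resulting method set can then be handed to the slicer, which stops traversal whenever it would enter one of these methods.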
Slicing algorithm
- Slicing with summary edges and barriers
  - defined by Horwitz
  - extended by Krinke
- PDG based on Jimple code
  - "Jimple" is an intermediate representation for bytecode.
    - 3-address code
    - Simple control flow: "if" and "goto"
    - Independent of JVM stack operations
- Workflow: code fragments related to a functionality → calculate similarity for each method → identify barriers → slicing with barriers → a program slice
Visualizing a slice as a concern graph
- Concern graph
  - A vertex is a class, a method or a field.
  - An edge represents a relation between two vertices: call, create, check, read, write, superclass, …
- We applied rule-based translation†.
  - Slice: a call or parameter edge from v1 in m1 to v2 in m2 → Concern graph: m1 call m2
  - Slice: v1 in m1 is a field READ of obj.field → Concern graph: m1 read field
† Kameda, D. and Takimoto, M.: Building Concern Graph Based on Program Slicing. IPSJ Transactions on Programming, Vol. 46, No. 11 (PRO 26), pp. 45-56. In Japanese.
A graphical output with Graphviz
- We omit intra-class edges in the graphical format.
  - Details are provided in a textual format, e.g. "Autosave.setInterval(interval) calls new Timer(interval, Autosave)."
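Omitting intra-class edges when emitting Graphviz input can be sketched as below. This is an illustration under assumptions, not the tool's actual output code: ConcernGraphDotSketch is a hypothetical name, members are written as "Class.member", an edge is a {from, relation, to} triple, and the example member names "Timer.<init>" and "Autosave.timer" are invented for the demonstration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical DOT writer that drops edges whose endpoints belong to the same class. */
final class ConcernGraphDotSketch {
    /** Class part of a "Class.member" name (everything before the last dot). */
    static String classOf(String member) {
        int dot = member.lastIndexOf('.');
        return dot < 0 ? member : member.substring(0, dot);
    }

    /** Each edge is {from, relation, to}; intra-class edges are skipped. */
    static String toDot(List<String[]> edges) {
        StringBuilder dot = new StringBuilder("digraph concern {\n");
        for (String[] e : edges) {
            if (classOf(e[0]).equals(classOf(e[2]))) {
                continue; // intra-class edge: left to the textual report
            }
            dot.append("  \"").append(e[0]).append("\" -> \"").append(e[2])
               .append("\" [label=\"").append(e[1]).append("\"];\n");
        }
        return dot.append("}\n").toString();
    }

    /** Invented edges loosely modeled on the Autosave example. */
    static String example() {
        return toDot(Arrays.asList(
                new String[] {"Autosave.setInterval", "create", "Timer.<init>"},
                new String[] {"Autosave.setInterval", "write", "Autosave.timer"}));
    }
}
```

The string returned by example() contains only the edge from Autosave.setInterval to Timer.<init> and can be passed to the dot command for rendering.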
The effectiveness of barriers
- Barriers reduced the concern graph size:
  - 1000 methods → 20 methods
  - Printable on A3- or A4-sized paper
- Comparison of the extracted graphs with hand-made concern graphs is not finished yet.
[Figure: concern graph size on 6 maintenance tasks on jEdit and our slicer]
- Our previous experiment is reported in: Niitani, R., Ishio, T. and Inoue, K.: Proposal and Implementation of a Method for Extracting Functional Concerns Using Program Slicing. PPL 2007. In Japanese.
Information extracted from Java program
- To construct a dependence graph
  - Control dependence relation
  - Data dependence relation
  - Call graph (with dynamic binding information)
- To identify barriers
  - A set of types, methods and fields referred to in each method m
- To slice the dependence graph
  - Mapping from source code to vertices
Slicing Tool Overview
- Built on the Soot Framework (http://www.sable.mcgill.ca/soot/)
- Pipeline shown in the diagram:
  - Java class files → Jimple translator → Jimple (3-address code)
  - SPARK points-to set analysis and control-flow/data-flow analysis → call graph and points-to sets (annotated Jimple)
  - Annotated Jimple → PDG constructor → PDG
  - PDG and slicing criteria → slicer → concern graph
Our effort to implement the system
- The program size
  - PDG construction: 2731 LOC (without comments)
  - Slicing: 9296 LOC (without comments)
    - slicing algorithms, heuristic functions and concern graph translation
- We could implement the PDG construction phase in two weeks:
  - One week to understand how Soot works.
  - The other week to implement the code.
- Soot enabled us to focus on the essential part of the research idea.
Advantage of Soot
✓ A rich analysis toolkit
  - Soot provides control flow and data flow for each method.
  - Jimple is simpler than source code and bytecode.
    - Complex Java statements are simplified during compilation.
- Class diagram: ExceptionalUnitGraph (control flow) and SmartLocalDefs (data flow) use a method Body; a Body contains Units (1 to n) and a Unit contains Values (1 to n); Stmt (Jimple code) is a Unit; Expr and Local are Values.
Limitation of Soot
✗ Soot is a bytecode optimization framework rather than a dedicated program analysis framework.
  - Soot keeps all data in memory in order to compile Jimple code to bytecode after optimization.
    - Soot requires 2-4 GB of RAM to analyze jEdit and the JDK.
  - Soot supports only a simple workflow: whole-program analysis (call graph construction) followed by local program analysis.
    - We cannot implement a statistics tool (whole-program analysis) that uses the results of method-local analysis.
Summary
- Concern location based on program slicing
  - We introduced heuristics in order to extract the functional concern of interest to a developer.
    - The input is the same as for traditional program slicing.
  - Most of the graphs can be printed on A3-sized paper.
- The Soot framework reduced the implementation effort.
  - Soot is a good framework, but we hope for a framework specialized for program analysis:
    - easy to learn, extensible and scalable