
Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis
IEEE/ACM ICSE 2016
Sergey Mechtaev, Jooyong Yi, Abhik Roychoudhury
National University of Singapore
Software Engineering Laboratory, Dept. of Computer Science
G 201792004 Youngjun Jeong

Contents
1 Introduction
2 Motivating Example
3 Background
4 Methodology
5 Experimental Results
6 Conclusion

1.0 Introduction - Angelix
• A novel semantic-based repair method
• More scalable than previously proposed semantic-based repair methods such as SemFix and DirectFix
• In the authors' experiments, Angelix generated repairs for large-scale real-world software such as wireshark and php, and these repairs include multi-location repairs

1.1 Introduction – Two Main Repair Types
• Search-based methodology
  • GenProg, PAR, SPR
  • Also known as generate-and-validate methodology
• Semantic-based methodology
  • SemFix, Nopol, DirectFix
  • Via symbolic execution and constraint solving

1.1 Introduction – Two Main Repair Types
• GenProg has been shown to scale to large real-world software such as php and wireshark
• SemFix [26], the first semantic-based repair tool, has been shown to be more efficient than GenProg
• Attributes to consider
  • Scalability: should scale to large real-world programs
  • Repairability: should repair a large number of defects, possibly by covering many defect classes
  • Quality: should produce repairs that make fewer changes

1.1 Introduction – Two Main Repair Types
• Search-based repair: high scalability and low quality
• Semantic-based repair: low scalability and high quality
• Semantic-based repair methods often work by extracting a repair constraint
  • The repair constraint guides program synthesis
  • In Angelix, the repair constraint is the angelic forest

1.2 Scalability of Angelix
• The angelic forest is automatically extracted via symbolic execution
• Compared to the repair constraints used in previous work [24, 26], the angelic forest is simpler, and its size is independent of the size of the program under repair
• The angelic forest contains enough semantic information to enable multi-location bug fixes
  • SPR does not support multi-line fixes
  • GenProg can change multiple locations of the program

1.2 Scalability of Angelix
• The number of repairs generated by Angelix (28) is larger than that of GenProg (11) and generally comparable to that of SPR (31)
  • In one subject, libtiff, Angelix generated more repairs than SPR; in another subject, php, SPR generated more repairs
• In libtiff, the percentage of functionality-deleting repairs produced by the SPR tool [23] goes up to an alarming 80%
• Angelix produced functionality-deleting repairs significantly less frequently when the same tests were used (21%)

1.2 Scalability of Angelix
(1) Scalability: repairing large programs
(2) Repairability: repairing a large number of defects
(3) Patch quality: changing the functionality of the program in a way developers would agree with, instead of simply deleting functionality

2 Motivating Example

2 Motivating Example
• The call to xzalloc (line 4), which allocates a block of memory, causes a segmentation fault (red box)
• A fix involves adding an if conditional before the problematic call to xzalloc (line 5) (blue box)
• The fix also requires removing an existing if statement (lines 1-2)
• This example demonstrates the complexity of multiline repairs that fix multiple buggy locations

2 Motivating Example
• The key difficulty is that a change made in one location can also change the rest of the program execution, which still has to be repaired
• State-of-the-art search-based repair algorithms such as SPR [23] are currently restricted to fixing a single location
• DirectFix already supports multi-location fixes
• SemFix is more scalable and applies one-line fixes, while DirectFix is less scalable but can produce multi-line fixes

2 Motivating Example
[Figure omitted; slide caption: "Is equivalent"]

2 Motivating Example - Transformation

2 Motivating Example – Symbolic Variables
• Replacement of suspicious expressions with symbolic variables
  • Conditional expressions
  • Right-hand sides of assignments
  • Function parameters
• The repair algorithm then runs symbolic execution over the program with the provided tests to collect the semantic information

2.1 Concise Semantic Signature for Repair
• The Angelix repair algorithm detects such test-passing paths via controlled symbolic execution
• For each such path, we need to know the value (angelic value) and the program state (angelic state)
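The signature described above can be pictured as plain data. The following is a minimal sketch, assuming that each visit to a suspicious expression on a test-passing path is recorded as an angelic value together with the angelic state at that point. The class name `AngelicVisit`, its fields, the expression label `cond@line5`, and the helper `consistent` are illustrative assumptions, not Angelix's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AngelicVisit:
    """One visit to a suspicious expression on a test-passing path (hypothetical)."""
    expression_id: str                           # illustrative label for the expression
    angelic_value: int                           # value that lets the test pass
    angelic_state: Tuple[Tuple[str, int], ...]   # visible variables at that point


# An angelic path is the sequence of visits along one test-passing execution.
AngelicPath = List[AngelicVisit]


def consistent(path: AngelicPath, candidate: Callable[[Dict[str, int]], int]) -> bool:
    """Return True if the candidate expression reproduces every angelic value."""
    return all(candidate(dict(v.angelic_state)) == v.angelic_value for v in path)


path = [
    AngelicVisit("cond@line5", 1, (("x", 0), ("y", 3))),
    AngelicVisit("cond@line5", 0, (("x", 2), ("y", 3))),
]

print(consistent(path, lambda s: int(s["x"] == 0)))  # True
print(consistent(path, lambda s: int(s["y"] > 0)))   # False
```

The point of the sketch is that the signature only mentions the suspicious expression and the state around it, so its size does not grow with the rest of the program.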

2.2 Reasons for Scalability (1)
• Lightweight semantic signature
  • The lightweight semantic signature reduces the burden on the repair synthesizer
  • DirectFix maintains every expression appearing in the program, so its semantic information becomes longer and more complex

2.2 Reasons for Scalability (2)
• Controlled symbolic execution
  • Symbols are introduced only for a few selected suspicious expressions, instead of the usual symbolic input
  • Only a restricted number of feasible execution paths are explored
  • Symbolic execution is initially performed with only a subset of the provided test suite
  • Only when some of the remaining tests fail with the synthesized repair is additional symbolic execution performed with these failing tests

2.2 Reasons for Scalability (3)
• Angelic forest
  • The Angelix algorithm initiates repair synthesis only when an angelic forest exists
  • The repair algorithm does not waste resources trying to synthesize a repair if there is no angelic forest

3 Background


4 Methodology
(1) Program transformation
(2) Fault localization
(3) Constraint extraction
(4) Patch synthesis

4.0 Program Transformation and Fault Localization
• Semantics-preserving program transformation
  • E.g., “if(1)” can be added before each unguarded statement
• Statistical fault localization
  • Uses the Jaccard formula [7]
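As a rough illustration of the fault localization step, the sketch below ranks statements with the Jaccard suspiciousness score, failed(s) / (total_failed + passed(s)), where failed(s) and passed(s) count the failing and passing tests that execute statement s. The coverage data, test names, and function names are made up for the example; this is not the Angelix implementation.

```python
def jaccard_suspiciousness(failed_cover, passed_cover, total_failed):
    """Jaccard score: failed(s) / (total_failed + passed(s))."""
    denominator = total_failed + passed_cover
    return failed_cover / denominator if denominator else 0.0


def rank(coverage):
    """Rank statements by suspiciousness, most suspicious first.

    coverage maps a test name to (set of executed statement ids, passed?).
    """
    total_failed = sum(1 for _, ok in coverage.values() if not ok)
    statements = set().union(*(covered for covered, _ in coverage.values()))
    scores = {}
    for s in statements:
        failed = sum(1 for covered, ok in coverage.values() if s in covered and not ok)
        passed = sum(1 for covered, ok in coverage.values() if s in covered and ok)
        scores[s] = jaccard_suspiciousness(failed, passed, total_failed)
    # Sort by descending score; break ties by statement id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical coverage: statement 3 runs in the failing test and one passing test.
coverage = {
    "t1": ({1, 2, 3}, True),
    "t2": ({1, 3}, False),
    "t3": ({1, 2}, True),
}
ranking = rank(coverage)
print(ranking[0])  # (3, 0.5)
```

The top-ranked statements would then be the candidates whose expressions are treated as suspicious in the later steps.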

4.0 Definitions for Angelic Forest

4.1 Angelic Forest Extraction
• The angelic forest is extracted via controlled, custom symbolic execution over suspicious expressions chosen based on a statistical fault localization result

4.1 Angelic Forest Extraction
• Symbols are installed during symbolic execution
  • By replacing the value of each instance of a suspicious expression with a fresh symbol (line 7)
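To make the idea of installing a fresh symbol at each visit concrete, here is a toy sketch. It deliberately swaps symbolic execution and constraint solving for brute-force enumeration over a tiny value domain, so it only illustrates what an angelic path records, not how Angelix finds it. The toy `program`, the `hole` helper, and the domain `(0, 1)` are assumptions made for the example.

```python
from itertools import product


def program(x, hole_values):
    """Toy program whose suspicious condition is replaced by a hole.

    hole_values supplies the value used at each visit to the hole, in order.
    Returns the program result and the recorded (value, state) visits.
    """
    visits = []

    def hole(state):
        value = hole_values[len(visits)]
        visits.append((value, dict(state)))
        return value

    result = 0
    for i in range(2):
        if hole({"x": x, "i": i}):  # suspicious condition, now a hole
            result += x
    return result, visits


def angelic_paths(x, expected, domain=(0, 1), max_visits=2):
    """Enumerate hole values; keep the visit sequences that make the test pass."""
    paths = []
    for values in product(domain, repeat=max_visits):
        result, visits = program(x, values)
        if result == expected:
            paths.append(visits)
    return paths


paths = angelic_paths(x=5, expected=5)
for p in paths:
    print([value for value, _ in p])
# [0, 1]
# [1, 0]
```

Each printed list is one test-passing assignment of values to the two visits; together with the recorded states, these play the role of the angelic paths for that test.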

4.2 Patch Synthesis
• Once an angelic forest is obtained, it is fed to the repair synthesizer as a synthesis specification
• The synthesized repair follows one of the angelic paths

4.2 Patch Synthesis
• The goal of CBRS (component-based repair synthesis): to search for connections between components that (1) satisfy the given specification and (2) minimally differ from the connections of the buggy program
• The specification of CBRS is provided in the form of an angelic forest extracted by Algorithm 1
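The two goals above (satisfy the specification, differ minimally from the buggy expression) can be sketched with a small enumerative search. This is a simplification under stated assumptions: expressions are restricted to the form (operand, operator, operand), candidates are tried in order of how many positions differ from the original, and brute-force enumeration stands in for the constraint-based search that CBRS performs. The component lists and all function names are hypothetical.

```python
import itertools

# Hypothetical component library for the toy synthesizer.
VARS = ["x", "y"]
CONSTS = [0, 1]
OPS = {
    ">": lambda a, b: int(a > b),
    ">=": lambda a, b: int(a >= b),
    "==": lambda a, b: int(a == b),
    "!=": lambda a, b: int(a != b),
}


def evaluate(expr, state):
    """Evaluate a (left, op, right) expression in a variable state."""
    left, op, right = expr

    def operand(term):
        return state[term] if isinstance(term, str) else term

    return OPS[op](operand(left), operand(right))


def distance(a, b):
    """Number of positions in which two expressions differ."""
    return sum(p != q for p, q in zip(a, b))


def synthesize(original, forest):
    """Return the closest expression that follows one angelic path per test.

    forest holds, for each test, a list of angelic paths; each path is a
    list of (angelic value, angelic state) pairs.
    """
    operands = VARS + CONSTS
    candidates = itertools.product(operands, OPS, operands)
    for cand in sorted(candidates, key=lambda c: distance(c, original)):
        follows_some_path_per_test = all(
            any(all(evaluate(cand, s) == v for v, s in path) for path in paths)
            for paths in forest
        )
        if follows_some_path_per_test:
            return cand
    return None


forest = [
    [[(1, {"x": 0, "y": 1})]],   # test 1: the condition must be true when x == 0
    [[(0, {"x": -1, "y": 1})]],  # test 2: the condition must be false when x == -1
]
patch = synthesize(("x", ">", 0), forest)
print(patch)  # ('x', '>=', 0)
```

In this example the buggy condition `x > 0` is repaired by changing a single component, the operator, which is the smallest change that satisfies both tests.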


4.3 Optimization
• Repair starts from a small subset of the test suite that provides the highest coverage of the suspicious locations
• If the generated patch causes a regression on the whole test suite, the counter-example test is added to the subset
• The semantic-based method finds a repair in one or a small number of trials, so the cost of rebuilding and retesting is significantly smaller
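The test-selection loop described above can be sketched as follows. The callables `synthesize_patch` and `passes` are hypothetical stand-ins for the repair pipeline and the test runner, and the threshold example at the bottom is invented purely to exercise the loop.

```python
def repair_with_test_selection(all_tests, initial_subset, synthesize_patch, passes):
    """Synthesize against a small test subset, growing it with counter-examples.

    synthesize_patch(tests) returns a patch or None; passes(patch, test)
    returns a bool. Both are assumed interfaces, not part of Angelix.
    """
    working = list(initial_subset)
    while True:
        patch = synthesize_patch(working)
        if patch is None:
            return None  # no repair exists for the current subset
        failing = [t for t in all_tests if not passes(patch, t)]
        if not failing:
            return patch  # no regression on the whole suite
        new_tests = [t for t in failing if t not in working]
        if not new_tests:
            return None  # the patch violates a test it was synthesized against
        working.append(new_tests[0])  # add one counter-example and retry


# Toy instance: a patch is a threshold k for the condition "x >= k".
def passes(k, test):
    x, expected = test
    return (x >= k) == expected


def synthesize_patch(tests):
    for k in range(5):
        if all(passes(k, t) for t in tests):
            return k
    return None


all_tests = [(3, True), (1, False), (2, False)]
result = repair_with_test_selection(all_tests, [(3, True)], synthesize_patch, passes)
print(result)  # 3
```

Because every iteration adds a test that was not yet in the subset, the loop runs at most once per test in the suite.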

4.4 Soundness and Completeness
• While the size of an angelic forest is independent of the size of the program, the angelic forest also under-approximates the fix space
• The repair method based on an angelic forest is sound in the sense that any repair it produces indeed passes all the provided tests
• The repair method is incomplete in the sense that, due to the under-approximation of angelic values used in an angelic forest, it may fail to produce some repairs that could otherwise be synthesized

5 Experimental Results
• RQ1. Can our repair method generate repairs for large-scale real-world software?
• RQ2. Can our repair method fix multi-location bugs?

5.1 Experimental Subjects
• GenProg ICSE 2012 benchmark
  • These subjects have also been used in the literature to evaluate other repair tools such as GenProg [14] and SPR [23]
  • Three subjects of the benchmark (python, lighttpd, and fbc) are omitted because the authors could not run them on KLEE [4]

5.2 Experimental Configurations
• Maximum number of suspicious locations: 1-10
• Three kinds of suspicious expressions
  • Conditional expressions
  • Right-hand-side expressions of assignments
  • Function parameters

5.3 Defect Class
• The defect class of the repair algorithm can easily be defined in terms of the fixes that can be synthesized
• Angelix can synthesize side-effect-free, function-call-free expressions composed of Boolean, arithmetic, and relational operators
• The “W/I Our Defect Class” column of Table 2 shows the number of defects of each subject that fall within this defect class

5.4 Results - Repairability
• Angelix shows higher repairability than SPR in libtiff (10 vs 5) and lower repairability in php (10 vs 18)

5.4 Results - Quality