Automatic Inference of Code Transforms for Patch Generation

  • Slides: 48

Paper: Fan Long, Peter Amidon, and Martin Rinard, ACM ESEC/FSE 2017
Slides: Youngjun Jeong (G 201792004), Software Engineering Laboratory, Dept. of Computer Science

Contents
• 1 Introduction
• 2 Transform Inference
• 3 Inferred Transforms
• 4 Inference System
• 5 Implementation
• 6 Experimental Results
• 7 Conclusion

1 Introduction
• Automatic patch generation systems [30, 33-35, 37, 38, 40, 48, 56, 61, 62] hold out the promise of significantly reducing the human effort required to diagnose, debug, and fix software defects
• The standard generate and validate approach starts with a set of test cases, at least one of which exposes the defect
• All previous generate and validate systems work with a set of manually crafted transforms [33-35, 37, 38, 48, 56, 61, 62], so they can patch only bugs that fall within the scope of those transforms
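The generate and validate loop can be sketched as follows. This is a minimal illustration under stated assumptions, not how Genesis or any cited system is implemented: the "program" is a toy Python function, and `passes_all`, `generate_and_validate`, `buggy_abs`, and `candidate_patches` are hypothetical names.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Program = Callable[[int], int]   # toy stand-in for a (patched) program
TestCase = Tuple[int, int]       # (input, expected output)


def passes_all(program: Program, tests: Iterable[TestCase]) -> bool:
    """Validate step: accept a candidate only if every test case passes."""
    for arg, expected in tests:
        try:
            if program(arg) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failing test
    return True


def generate_and_validate(candidates: Iterable[Program],
                          tests: List[TestCase]) -> Optional[Program]:
    """Return the first candidate patch that passes all tests, or None."""
    for candidate in candidates:
        if passes_all(candidate, tests):
            return candidate
    return None


def buggy_abs(x: int) -> int:
    return x  # defect: wrong for negative inputs


def candidate_patches() -> Iterable[Program]:
    # Hypothetical search space; a real system derives it from transforms.
    yield lambda x: x + 1
    yield lambda x: -x
    yield lambda x: x if x >= 0 else -x


# The test case (-2, 2) exposes the defect in buggy_abs.
TESTS: List[TestCase] = [(3, 3), (0, 0), (-2, 2)]
patch = generate_and_validate(candidate_patches(), TESTS)
```

Passing the test suite only validates a candidate against the available tests; it does not by itself show that the patch is correct.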

1.1 Genesis
• A novel system that infers code transforms for automatic patch generation systems [36]
• Genesis generalizes subsets of patches to infer transforms that together generate a productive search space of candidate patches
• To the best of the authors' knowledge, Genesis is the first system to automatically infer patch generation transforms or candidate patch search spaces from successful patches

1.1 Genesis
• Transforms: each Genesis transform has two template ASTs, one matching the original code and one specifying the replacement code
• Generators: introduce new code and logic, and are essential to enabling Genesis to generate correct patches for previously unseen applications
• Search space inference with an ILP (Integer Linear Program)
• A key challenge in patch search space design is navigating an inherent tradeoff between coverage and tractability [39]
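To make the roles of the two template ASTs and the generator concrete, here is a small sketch. It assumes a toy AST encoding (nested tuples with the node label first) and a single hypothetical transform that conjoins a generated null check to an `if` condition; `ORIGINAL`, `REPLACEMENT`, `null_check_generator`, and `apply_transform` are illustrative names, not Genesis APIs.

```python
# Toy AST encoding (an assumption): a node is a tuple (label, child, ...).
# Strings starting with "?" are template variables.
ORIGINAL = ("if", "?cond", "?body")                      # matches original code
REPLACEMENT = ("if", ("&&", "?gen", "?cond"), "?body")   # replacement code


def substitute(template, env):
    """Instantiate a template by replacing template variables with bindings."""
    if isinstance(template, str):
        return env.get(template, template)
    return tuple(substitute(child, env) for child in template)


def null_check_generator(variables):
    """Generator: enumerates new code (here, one null check per variable)."""
    for name in variables:
        yield ("!=", ("var", name), ("null",))


def apply_transform(stmt, variables):
    """Yield one candidate patch per generated subexpression."""
    same_shape = (isinstance(stmt, tuple)
                  and len(stmt) == len(ORIGINAL)
                  and stmt[0] == ORIGINAL[0])
    if not same_shape:
        return
    env = dict(zip(ORIGINAL[1:], stmt[1:]))  # bind ?cond and ?body
    for generated in null_check_generator(variables):
        yield substitute(REPLACEMENT, {**env, "?gen": generated})


condition = (">", ("call", ("var", "s"), "length"), ("int", "0"))
stmt = ("if", condition, ("return",))
patches = list(apply_transform(stmt, ["s", "t"]))
```

The generator is what lets the transform introduce code (`s != null`) that appears nowhere in the original statement.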

1.2 Experimental Results
• The training set includes 483 NP (Null Pointer) patches, 199 OOB (Out of Bounds) patches, and 287 CC (Class Cast) patches drawn from 356 open source applications
• Genesis infers a search space generated by 108 transforms
• The authors compare Genesis with PAR [33, 44], a previous patch generation system for Java that works with manually defined patch templates

1.3 Contributions
• Transforms with Template ASTs and Generators
• These transforms enable Genesis to abstract away patch- and application-specific details to capture common patch patterns and strategies
• Generators enable Genesis to synthesize new code and logic
• Patch Generalization
• Genesis automatically derives a transform that captures the common patch generation pattern present in the patches

1.3 Contributions
• Search Space Inference
• Starts with a set of training patches
• Infers a set of transforms that balances the tradeoff between coverage and tractability
• Complete System and Experimental Results
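The coverage versus tractability tradeoff in search space inference can be illustrated with a toy selection problem. The slides state that Genesis formulates this as an integer linear program; the sketch below swaps in brute-force enumeration over subsets, and its objective, budget constraint, transform names, and numbers are all simplifying assumptions.

```python
from itertools import combinations


def select_transforms(candidates, budget):
    """Pick the subset of transforms that covers the most patches.

    candidates: dict name -> (set of covered patch ids, search space size)
    budget: maximum total search space size (tractability bound)
    Ties on coverage are broken in favor of the smaller search space.
    """
    best_subset, best_key = (), (0, 0)
    names = sorted(candidates)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            size = sum(candidates[n][1] for n in subset)
            if size > budget:
                continue  # intractable: too many candidate patches
            covered = set().union(*(candidates[n][0] for n in subset))
            key = (len(covered), -size)
            if key > best_key:
                best_subset, best_key = subset, key
    return best_subset, best_key[0], -best_key[1]


# Hypothetical candidate transforms with made-up numbers.
CANDIDATES = {
    "add_null_guard": ({1, 2, 3}, 400),
    "replace_any_expr": ({1, 2, 3, 4, 5}, 50000),
    "off_by_one": ({4}, 60),
    "cast_guard": ({5}, 300),
}

chosen, covered_count, total_size = select_transforms(CANDIDATES, budget=1000)
```

Here the very general `replace_any_expr` covers every patch on its own but exceeds the budget, so three targeted transforms are chosen instead.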

2 Transform Inference (1)
• Patch Sampling and Generalization
• The Genesis inference algorithm works with sampled subsets of patches from the training set
• For each subset, it applies a generalization algorithm to infer a transform that it can apply to generate candidate patches
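One way to picture generalization is anti-unification: keep the structure two patches share and abstract the parts that differ into template variables. The sketch below does this for two toy ASTs encoded as nested tuples. It is a simplified stand-in, not the Genesis generalization algorithm (which also infers generators and constraints); the `generalize` function and the `?V0`-style variable names are assumptions.

```python
def generalize(a, b, bindings=None):
    """Anti-unify two ASTs into a template.

    Shared structure is kept; each distinct pair of differing subtrees is
    replaced by a template variable, reused wherever that pair recurs.
    """
    if bindings is None:
        bindings = {}
    if a == b:
        return a
    same_node = (isinstance(a, tuple) and isinstance(b, tuple)
                 and len(a) == len(b) and a[0] == b[0])
    if same_node:
        children = tuple(generalize(x, y, bindings)
                         for x, y in zip(a[1:], b[1:]))
        return (a[0],) + children
    key = (a, b)
    if key not in bindings:
        bindings[key] = f"?V{len(bindings)}"
    return bindings[key]


# Replacement code from two hypothetical patches that each add a null guard.
patch1 = ("if", ("!=", ("var", "user"), ("null",)),
          ("call", ("var", "user"), "save"))
patch2 = ("if", ("!=", ("var", "conn"), ("null",)),
          ("call", ("var", "conn"), "close"))

template = generalize(patch1, patch2)
```

The resulting template keeps the `if (x != null)` shape and uses the same variable for the guarded object in both positions, which is the kind of common pattern a transform is meant to capture.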

2 Transform Inference (2) - Example

2 Transform Inference (2)

2 Transform Inference (3)

2 Transform Inference (4)
• Candidate Transforms
• Genesis repeatedly samples training patches to obtain the candidate transforms

2 Transform Inference (5)

2 Transform Inference (6)

3 Inferred Transforms (1)
• Transforms That Target Boolean Expressions
• Conjoin or disjoin a generated subexpression to a boolean condition in the original program
• Conditional Execution
• Conditionally execute existing matched code

3 Inferred Transforms (2)
• Inserted If Then Else
• Wrap existing code in an ‘if-then-else’ statement
• Inserted If Then

3 Inferred Transforms (3)
• Replace Code
• Replace existing code with newly generated code
• The transforms differ in the form of the code they replace, the form of the code they generate, and their generator constraints

3 Inferred Transforms (4)
• Try/Catch/Continue
• Wraps existing code in a try construct with an empty catch block
• For Loop Off By One
• Corrects off-by-one errors in for loops, specifically by enumerating combinations of starting values and loop termination conditions
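The off-by-one strategy, enumerating combinations of starting values and termination conditions, can be sketched like this. The loop model `for (i = start; i op bound; i++)`, the candidate ranges, and all function names are assumptions made for illustration; `checked_get` mimics a Java out-of-bounds failure, since Python would otherwise accept negative indices.

```python
from itertools import product


def loop_indices(start, op, bound):
    """Indices visited by `for (i = start; i op bound; i++)`."""
    indices, i = [], start
    while (i < bound) if op == "<" else (i <= bound):
        indices.append(i)
        i += 1
    return indices


def checked_get(data, i):
    """Array access that fails on any out-of-range index, as in Java."""
    if not 0 <= i < len(data):
        raise IndexError(i)
    return data[i]


def sum_with_loop(data, start, op):
    return sum(checked_get(data, i)
               for i in loop_indices(start, op, len(data)))


def off_by_one_candidates(start, op):
    """Enumerate nearby start values combined with termination operators."""
    for s, o in product((start - 1, start, start + 1), ("<", "<=")):
        if (s, o) != (start, op):  # skip the original, defective loop
            yield s, o


def first_valid(start, op, data, expected):
    """Return the first candidate loop header that passes the test."""
    for s, o in off_by_one_candidates(start, op):
        try:
            if sum_with_loop(data, s, o) == expected:
                return s, o
        except IndexError:
            continue
    return None


# Defective loop: for (i = 0; i <= data.length; i++)
fix = first_valid(0, "<=", [4, 5, 6], expected=15)
```

Because the transform targets one narrow defect class, the candidate set stays tiny (five loop headers here).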

3 Inferred Transforms (5)
• Change Declared Type
• Changes the declared type of a variable declaration (including its initializer)
• Other Transforms
• E.g., null check insertion

3 Inferred Transforms - Discussion
• Compared with manually crafted transforms, Genesis transforms are more numerous and more diverse, and they target a wider range of defects more precisely and tractably
• Some transforms target specific defect classes such as off-by-one defects in for loops
• Other transforms apply general templates, with the generator constraints controlling the enumeration to deliver a tractable search space

4 Inference System

4.1 Definition 4.1 - CFG

4.1 Definition 4.2 - AST

4.1 Definitions 4.3, 4.4, 4.5 - about ASTs

4.1 Notation and Utility Functions

4.2.1 Template AST Forest

4.2.1 Template AST Forest
• The first rule corresponds to the simple case of a single terminal node
• The second and third rules correspond to the cases of a single non-terminal node or a list of nodes

4.2.1 Template AST Forest
• The fourth and fifth rules correspond to the case of a single template variable node in the template AST forest
• The fourth rule matches the template variable against a forest
• The fifth rule matches the template variable against a tree
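The matching cases listed above (terminal node, non-terminal node, list of nodes, template variable against a forest, template variable against a tree) can be mirrored in a small recursive matcher. The formal rules themselves are not reproduced in these slides, so this is only a sketch under assumed conventions: ASTs are nested tuples, `?x` binds a single tree, and `*x` binds a forest and may appear only as the last child.

```python
def match(template, tree, env):
    """Return True if `tree` is an instance of `template`.

    Bindings are recorded in `env`; after a failed match `env` may
    contain partial bindings and should be discarded.
    """
    if isinstance(template, str):
        if template.startswith("?"):
            # Template variable matched against a single tree.
            return env.setdefault(template, tree) == tree
        return template == tree  # terminal node
    if not isinstance(tree, tuple) or template[0] != tree[0]:
        return False             # non-terminal nodes need the same label
    return match_children(template[1:], tree[1:], env)


def match_children(templates, trees, env):
    """Match a list of template nodes against a list of trees."""
    last = templates[-1] if templates else None
    if isinstance(last, str) and last.startswith("*"):
        # Template variable matched against a forest (remaining siblings).
        head = templates[:-1]
        if len(trees) < len(head):
            return False
        rest = trees[len(head):]
        if env.setdefault(last, rest) != rest:
            return False
        templates, trees = head, trees[:len(head)]
    if len(templates) != len(trees):
        return False
    return all(match(t, s, env) for t, s in zip(templates, trees))


template = ("if", ("!=", "?x", ("null",)), "*body")
tree = ("if", ("!=", ("var", "p"), ("null",)),
        ("call", "p", "run"), ("return",))

env = {}
matched = match(template, tree, env)
```

After a successful match, `env` maps `?x` to the guarded variable node and `*body` to the two statements in the branch.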

4.2.2 Generators

4.2.3 Transforms

4.3 Transform Generalization
• There are many possible transforms
• How to select useful transforms?
• Evaluate coverage and tractability
• Coverage: does the transform generate the correct patch?
• Tractability: how many candidate patches does the transform generate in total?
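The two evaluation criteria can be measured directly once a transform is modeled as a function that enumerates candidate patches. The sketch below compares a narrow and a broad transform on two made-up validation cases; conditions are plain strings, and `narrow_transform`, `broad_transform`, and `evaluate` are hypothetical names.

```python
from itertools import product


def narrow_transform(cond, variables):
    """Conjoin a null check on one in-scope variable."""
    for v in variables:
        yield f"{v} != null && ({cond})"


def broad_transform(cond, variables):
    """Conjoin any comparison of a variable against a few constants."""
    for v, op, const in product(variables, ("!=", "=="), ("null", "0", "1")):
        yield f"{v} {op} {const} && ({cond})"


def evaluate(transform, cases):
    """Return (coverage, tractability) over the validation cases.

    coverage: number of cases whose correct patch is generated
    tractability: total number of candidate patches generated
    """
    covered = total = 0
    for cond, variables, correct in cases:
        candidates = list(transform(cond, variables))
        total += len(candidates)
        covered += correct in candidates
    return covered, total


# Made-up validation cases: (original condition, variables, correct patch).
CASES = [
    ("x.size() > 0", ("x", "y"), "x != null && (x.size() > 0)"),
    ("a.length > i", ("a", "b"), "a != null && (a.length > i)"),
]

narrow_result = evaluate(narrow_transform, CASES)
broad_result = evaluate(broad_transform, CASES)
```

On these cases both transforms reach the same coverage, but the broad one generates six times as many candidates, which is the kind of difference a selection step would weigh.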

4.3 Transform Generalization

4.4 Sampling Algorithm

4.5 Search Space Inference Algorithm

5 Implementation
• The authors use the Spoon library [46] to parse Java programs
• The current implementation supports any Java application that uses the Maven project management system [5] and the JUnit [19] testing framework
• Genesis applies its inferred transforms to each of the suspicious statements in the ranked defect localization list
• For each transform, Genesis computes a cost score, which is the average number of candidate patches the transform needs to generate to cover a validation case
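The cost score can be read as total candidates generated divided by the number of validation cases covered. That reading, the treatment of transforms that cover nothing, and the use of the score to order transforms are assumptions of this sketch; the transform names and counts are invented.

```python
def cost_score(candidates_per_case, covered):
    """Average number of candidates generated per covered validation case.

    candidates_per_case: candidate patch counts, one per validation case
    covered: parallel list of booleans (was the correct patch generated?)
    A transform that covers no case gets an infinite cost (assumption).
    """
    covered_cases = sum(covered)
    if covered_cases == 0:
        return float("inf")
    return sum(candidates_per_case) / covered_cases


# Hypothetical measurements for three transforms on three validation cases.
MEASUREMENTS = {
    "null_guard": ([40, 60, 20], [True, True, False]),
    "replace_expr": ([900, 1100, 1000], [True, True, True]),
    "never_covers": ([10, 10, 10], [False, False, False]),
}

scores = {name: cost_score(counts, hits)
          for name, (counts, hits) in MEASUREMENTS.items()}

# One plausible use: try cheaper transforms first.
ranked = sorted(scores, key=scores.get)
```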

6 Experimental Results

7 Conclusion
• Previous generate and validate patch generation systems work with a fixed set of transforms defined by their human developers
• By automatically inferring transforms from successful human patches, Genesis makes it possible to leverage the combined expertise and patch generation strategies of developers worldwide to automatically patch bugs in new applications