

VUzzer: Application-aware Evolutionary Fuzzing
Sanjay Rawat, Vivek Jain, Ashish Kumar, Lucian Cojocar, Cristiano Giuffrida, Herbert Bos

What we achieved…
• An application-aware fuzzer that understands the application and formulates its fuzzing strategies by learning:
  – Important offsets in the input
  – Important values at certain offsets (magic bytes)
  – Path prioritization
• A fuzzer that outperforms fuzzers based on advanced techniques (e.g., Symbex), needing an order of magnitude fewer inputs to trigger bugs.
• A fuzzer that shows consistent performance across various applications (DARPA CGC, LAVA, other applications).

Introduction
• Fuzzing is a simple yet powerful testing technique.
• There are very effective fuzzers, such as AFL.
• However, they are mostly useful for discovering low-hanging bugs.
• Why?

Problem with Traditional Fuzzing
Blackbox fuzzing: aiming with luck!

Problem with Traditional Fuzzing
Smart fuzzing: aiming with an educated guess!

Problem Exemplified…
Figure: a branch guarded by the check a == 0xffd8. Where is 'a' in the input? What values satisfy the check? Hard-to-reach paths lead to deeply buried bugs; easy (superficial) paths lead to error code.

Issues identified…
• For a smart, code-coverage-based fuzzer, it is important to know:
  – Where (at which input offsets) to apply mutations
  – What values to substitute
  – How to avoid traps (paths leading to error-handling code)
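To make the "where" and "what values" questions concrete, here is a toy Python sketch of a magic-value check like the a == 0xffd8 example above (our reading of the constant). The target function, the offset 4, and the helper names are all hypothetical. A blind fuzzer that guesses a random 2-byte value passes the check with probability 1/256², while a fuzzer that already knows the offset and the compared value passes it in a single step.

```python
MAGIC = b"\xff\xd8"   # value that 'a' is compared against (0xffd8)
MAGIC_OFFSET = 4      # hypothetical: input offset that 'a' is read from


def reaches_deep_path(data: bytes) -> bool:
    """Toy target: the buried code runs only if the 2-byte check passes."""
    return data[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] == MAGIC


def blind_success_probability(width: int) -> float:
    """Chance that one uniformly random `width`-byte value equals a fixed constant."""
    return 1 / 256 ** width


def informed_mutation(data: bytes, offset: int, value: bytes) -> bytes:
    """Write a known-interesting value at a known-interesting offset."""
    buf = bytearray(data)
    buf[offset:offset + len(value)] = value
    return bytes(buf)


seed = bytes(16)
print(reaches_deep_path(seed))                                           # False
print(blind_success_probability(len(MAGIC)))                             # 1.52587890625e-05
print(reaches_deep_path(informed_mutation(seed, MAGIC_OFFSET, MAGIC)))  # True
```

The gap grows by a factor of 256 for every extra byte in the constant, which is why knowing the offset and value matters more than raw execution speed for such checks.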

Fuzzing+Symbex
• Symbolic/concolic execution can answer such questions.
• But... scalability?

Recent Observations on Fuzzing
• LAVA: "Large-scale automated vulnerability addition," in Proc. IEEE S&P '16. IEEE Press, 2016.
  – Quickly and automatically injects large numbers of realistic bugs into program source code.
  – The results are not very encouraging for fuzzing!

Recent Observations on Fuzzing+Symbex
• "Experience Report: How is Dynamic Symbolic Execution Different from Manual Testing? A Study on KLEE," in ISSTA '15.
  – Manually developed test suites perform better than KLEE-based test suites at covering hard-to-cover code…
  – KLEE-based test suites are less effective at exploring some meaningful paths and at generating structurally valid string inputs that get through the input parser.

Concrete results (From LAVA paper)

Evolving Our Solution: VUzzer
• Let's start with something we know: AFL
Figure (AFL loop): pick an input from queue Q → mutate at offset X (bitflip, replacement, arithmetic) → execute and monitor edges (basic blocks) → new edge? Yes: add the input to Q. No: (perhaps) try more mutations.
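The AFL loop above can be sketched in a few lines of Python. This is a simplified illustration, not AFL's implementation: the mutation operators only loosely mirror AFL's bitflip/replace/arithmetic stages, and `toy_target` is a hypothetical instrumented program that returns the set of edges one run covered.

```python
import random
from typing import Callable, List, Set, Tuple

Edge = Tuple[int, int]  # (source basic block, destination basic block)


def mutate(data: bytes, rng: random.Random) -> bytes:
    """AFL-flavoured single mutation at a random offset X."""
    buf = bytearray(data)
    x = rng.randrange(len(buf))
    op = rng.choice(("bitflip", "replace", "arith"))
    if op == "bitflip":
        buf[x] ^= 1 << rng.randrange(8)
    elif op == "replace":
        buf[x] = rng.randrange(256)
    else:  # small add/subtract, wrapping around within one byte
        buf[x] = (buf[x] + rng.randint(-35, 35)) % 256
    return bytes(buf)


def coverage_guided_fuzz(run: Callable[[bytes], Set[Edge]],
                         seeds: List[bytes], iterations: int,
                         rng_seed: int = 0) -> List[bytes]:
    rng = random.Random(rng_seed)
    queue = list(seeds)                      # Q
    seen: Set[Edge] = set()
    for s in seeds:
        seen |= run(s)
    for _ in range(iterations):
        child = mutate(rng.choice(queue), rng)
        edges = run(child)                   # execute and monitor edges
        if edges - seen:                     # new edge? yes: add input to Q
            seen |= edges
            queue.append(child)
        # no: drop the child and keep mutating
    return queue


def toy_target(data: bytes) -> Set[Edge]:
    """Hypothetical instrumented program: returns the edges one run covered."""
    edges = {(0, 1)}
    if data[0] == ord("F"):
        edges.add((1, 2))
        if data[1] == ord("U"):
            edges.add((2, 3))
    return edges


queue = coverage_guided_fuzz(toy_target, [b"AAAA"], iterations=20000)
print([q[:2] for q in queue])
```

Note that the loop has no idea which offsets or values matter; it finds the one-byte checks here only because they are cheap to guess by chance.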

Evolving Our Solution: VUzzer
• Moving to VUzzer…
Figure (VUzzer loop): pick inputs from queue Q, with input preference via path prioritization (static analysis) → mutate only interesting offsets, and with interesting values (magic bytes), using bitflip, replacement, arithmetic → execute and monitor edges (basic blocks), and also perform taint-flow analysis to determine interesting offsets/values (O/V) → new edge? Yes: add the input to Q, unless it is an error-handling BB (if so, it is not interesting). No: (perhaps) try more mutations.
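The two changes VUzzer makes to the loop (mutating only interesting offsets with interesting values, and ignoring coverage gains that only reach error-handling blocks) can be sketched as follows. This is a hypothetical simplification, not VUzzer's code: `interesting_offsets`, `magic`, and `error_bbs` are assumed to come from the taint-flow and error-code analyses described later.

```python
import random
from typing import Dict, List, Set


def vuzzer_style_mutate(data: bytes, interesting_offsets: List[int],
                        magic: Dict[int, bytes], rng: random.Random) -> bytes:
    buf = bytearray(data)
    x = rng.choice(interesting_offsets)      # mutate only interesting offsets
    buf[x] = rng.randrange(256)
    for off, value in magic.items():         # keep learned magic bytes in place
        buf[off:off + len(value)] = value
    return bytes(buf)


def is_interesting(trace: Set[int], seen: Set[int], error_bbs: Set[int]) -> bool:
    """Keep an input only if it reaches new basic blocks that are not error handlers."""
    return bool((trace - seen) - error_bbs)


rng = random.Random(1)
child = vuzzer_style_mutate(bytes(8), [5, 6], {0: b"\xff\xd8"}, rng)
print(child.hex())
```

Re-applying the magic bytes after each mutation is one simple way to keep inputs past the header checks; it is a design choice of this sketch.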

Our Solution: VUzzer

VUzzer: main insights
• Leverage the application's control- and data-flow features to infer input properties: the application is designed to work with that input!
• Prioritize and deprioritize paths: certain paths are difficult to execute because they are guarded by constraints (nested conditions)!
• VUzzer puts emphasis on learning these properties.

Control-flow features
• Used for path preference
• Basic block weights (static analysis)
  – CFG as a Markov chain (enumerating all paths is infeasible!)
  – Nested blocks are hard to reach -> lower probabilities -> higher weights
  – These weights are used in the fitness function to raise/lower an input's score.
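The weighting idea can be illustrated with a minimal Python sketch, assuming an acyclic CFG in which every outgoing edge of a block is taken with equal probability; loops are not handled here and would need extra treatment. The weight of a block is the inverse of its reachability probability. The `fitness` function is a deliberately simplified stand-in (sum of the weights of the distinct blocks an input executed), not VUzzer's exact formula. Requires Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def block_weights(cfg: Dict[str, List[str]], entry: str) -> Dict[str, float]:
    """cfg maps every basic block to its successors (leaf blocks map to [])."""
    preds: Dict[str, List[str]] = defaultdict(list)
    for block, succs in cfg.items():
        for s in succs:
            preds[s].append(block)
    # TopologicalSorter expects node -> predecessors; raises CycleError on loops.
    order = TopologicalSorter({b: preds[b] for b in cfg}).static_order()
    prob = {b: 0.0 for b in cfg}
    prob[entry] = 1.0
    for b in order:
        if b != entry:
            # each outgoing edge of a predecessor is taken with equal probability
            prob[b] = sum(prob[p] / len(cfg[p]) for p in preds[b])
    return {b: 1.0 / p for b, p in prob.items() if p > 0}


def fitness(trace: Iterable[str], weights: Dict[str, float]) -> float:
    """Simplified score: sum of the weights of the distinct blocks an input executed."""
    return sum(weights.get(b, 0.0) for b in set(trace))


# A -> B -> C -> D are nested conditions; every "else" falls through to E.
cfg = {"A": ["B", "E"], "B": ["C", "E"], "C": ["D", "E"], "D": ["E"], "E": []}
w = block_weights(cfg, "A")
print(w)                                      # {'A': 1.0, 'B': 2.0, 'C': 4.0, 'D': 8.0, 'E': 1.0}
print(fitness(["A", "E"], w))                 # 2.0
print(fitness(["A", "B", "C", "D", "E"], w))  # 16.0
```

Each level of nesting halves the probability and doubles the weight, so an input that penetrates the nested conditions scores far higher than one that falls straight through to E.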

Control-flow features
• Error-code detection (dynamic analysis)
  – Fuzzing often produces invalid inputs, driving execution towards error-handling code.
  – Deprioritizing such paths makes fuzzing effort more productive.
  – VUzzer detects them by comparing the execution traces of valid and invalid inputs.
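One plausible way to compare traces is sketched below, as an assumption rather than VUzzer's exact heuristic: treat as error-handling any basic block that every invalid input executes but no valid input does. Traces are modeled as sets of basic-block IDs; the example data is made up.

```python
from typing import List, Set


def detect_error_blocks(valid_traces: List[Set[int]],
                        invalid_traces: List[Set[int]]) -> Set[int]:
    """Blocks that every invalid input executes but no valid input does."""
    valid_cover: Set[int] = set().union(*valid_traces)
    if not invalid_traces:
        return set()
    common_invalid = set.intersection(*invalid_traces)
    return common_invalid - valid_cover


valid = [{1, 2, 3, 5}, {1, 2, 4, 5}]        # traces of well-formed seed inputs
invalid = [{1, 2, 9, 10}, {1, 9, 10, 11}]   # traces of (e.g. random) malformed inputs
print(sorted(detect_error_blocks(valid, invalid)))  # [9, 10]
```

Intersecting the invalid traces (rather than taking their union) is a conservative choice: a block that only one malformed input happened to hit, such as 11 here, is not flagged.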

Data-flow features
• Used for inferring the input properties that control execution.
• Dynamic taint-flow analysis
  – Important offsets (cmp, lea) -> where to focus mutation
  – Values (branch constraints) (cmp) -> magic-byte detection
• Static analysis
  – Constant bytes (branch constraints?)
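Given the output of a dynamic taint tracker, turning it into interesting offsets and magic bytes is straightforward bookkeeping. The sketch below assumes a hypothetical `TaintEvent` record per tainted `cmp`/`lea` instruction (the real tracker and its output format are not described on the slide) and assumes the constant operand is already in input byte order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TaintEvent:
    """Hypothetical record of one instruction whose operand was tainted by input bytes."""
    opcode: str                        # "cmp" or "lea"
    offsets: Tuple[int, ...]           # input offsets that taint the operand
    constant: Optional[bytes] = None   # untainted cmp operand, in input byte order


def infer_offsets_and_magic(events: List[TaintEvent]) -> Tuple[Set[int], Dict[int, bytes]]:
    interesting: Set[int] = set()
    magic: Dict[int, bytes] = {}
    for ev in events:
        interesting.update(ev.offsets)             # cmp/lea operands -> where to mutate
        if ev.opcode != "cmp" or ev.constant is None or not ev.offsets:
            continue
        start = min(ev.offsets)
        contiguous = sorted(ev.offsets) == list(range(start, start + len(ev.offsets)))
        if contiguous and len(ev.constant) == len(ev.offsets):
            magic[start] = ev.constant             # cmp against a constant -> magic bytes
    return interesting, magic


events = [
    TaintEvent("cmp", (0, 1), b"\xff\xd8"),  # e.g. a header check
    TaintEvent("lea", (8,)),                  # offset used in address arithmetic
    TaintEvent("cmp", (12,)),                 # compared against another tainted value
]
offsets, magic = infer_offsets_and_magic(events)
print(sorted(offsets), magic)  # [0, 1, 8, 12] {0: b'\xff\xd8'}
```

Only comparisons against a constant yield magic bytes; a tainted-vs-tainted comparison still marks its offsets as worth mutating.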

Evaluation
• DARPA CGC binaries
• Various applications with binary input formats, as used in other work (VA)
• A set of buggy binaries recently proposed in LAVA

Results • DARPA CGC binaries – 29/23

Results • LAVA

Results • VA dataset

Results: Time

Crash Triage
• !exploitable (not very conclusive)
• Our heuristics based on library calls
• Manual analysis:
  – tcpdump (out-of-bounds read, fixed)
  – mpg321 (SIGSEGV, not fixed; double free, not fixed)
  – tcptrace (out-of-bounds read, not fixed)
  – gif2png (out-of-bounds read, not fixed)

Conclusions
• Evolutionary fuzzing is promising.
• It is worth spending time on analysis to create each new generation of inputs, rather than executing millions of inputs per second.
• Symbex-based analysis is promising, but scalability is an issue.
• We show that taint-flow analysis, along with other lightweight analyses, is a viable option for intelligent fuzzing.
• We developed a fully functional fuzzer that is able to fuzz a variety of applications.
• VUzzer site: https://www.vusec.net/projects/fuzzing/