DeepXplore: Automated Whitebox Testing of Deep Learning Systems
Kexin Pei, Junfeng Yang, Yinzhi Cao, Suman Jana
Motivation
- Deep learning (DL) techniques are now deployed in safety-critical and time-critical domains
  - Self-driving cars, malware detection
- Existing DL testing fails to expose erroneous behaviors for rare inputs
  - A Google self-driving car crashed into a bus that "should" have yielded
  - A Tesla car in Autopilot mode did not recognize a trailer as an obstacle due to its "white color over bright sky" and "high ride height"
Proposed solution (I)
- Adversarial testing
  - Start with an existing image
  - Add minor changes that would fool the DL models but not the human eye
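The idea of a small change that flips a model's answer can be sketched on a toy linear classifier. This is a minimal illustration, not the method used by DeepXplore: the 4-pixel "image", the weights, the bias, and the helper names `score`, `predict`, and `perturb` are all hypothetical.

```python
def score(weights, bias, pixels):
    """Linear score; the toy model predicts class 1 when the score is positive."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def predict(weights, bias, pixels):
    return 1 if score(weights, bias, pixels) > 0 else 0


def perturb(weights, pixels, epsilon):
    """Move every pixel by at most epsilon in the direction that lowers the score."""
    # For a linear model, the gradient of the score w.r.t. pixel i is weights[i].
    adversarial = []
    for w, p in zip(weights, pixels):
        step = epsilon if w < 0 else -epsilon if w > 0 else 0.0
        adversarial.append(min(1.0, max(0.0, p + step)))  # keep pixels in [0, 1]
    return adversarial


# Hypothetical 4-pixel image and hand-picked model parameters (assumptions).
weights = [0.9, -0.6, 0.4, -0.2]
bias = -0.3
image = [0.5, 0.3, 0.4, 0.5]

adversarial_image = perturb(weights, image, epsilon=0.05)

print(predict(weights, bias, image))              # prediction on the original
print(predict(weights, bias, adversarial_image))  # prediction after the change
```

No pixel moves by more than 0.05, yet the predicted class changes because the original input sat close to the decision boundary.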
Proposed solution (II)
- Use neuron coverage to measure the parts of a DL system exercised by test inputs
  - Code coverage does not work: a DNN's rules are learned from data, not written as code paths
- Run multiple DL systems over the same images to detect odd behaviors
  - An output that disagrees with the majority is most likely to be incorrect
    - (Condorcet's jury theorem)
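Both ideas on this slide can be sketched in a few lines. The sketch assumes one common reading of neuron coverage (the fraction of neurons whose activation exceeds a threshold for at least one test input) and a simple majority vote across models. The function names, the threshold, and the sample data are hypothetical.

```python
from collections import Counter


def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons whose activation exceeds threshold for at least one input."""
    num_neurons = len(activations_per_input[0])
    covered = set()
    for activations in activations_per_input:
        for index, value in enumerate(activations):
            if value > threshold:
                covered.add(index)
    return len(covered) / num_neurons


def find_disagreements(predictions_per_model):
    """Return (input_index, model_index) pairs where a model departs from the majority."""
    suspects = []
    num_inputs = len(predictions_per_model[0])
    for i in range(num_inputs):
        votes = [model_predictions[i] for model_predictions in predictions_per_model]
        majority_label, count = Counter(votes).most_common(1)[0]
        if count <= len(votes) / 2:
            continue  # no strict majority, so no single model can be singled out
        for m, label in enumerate(votes):
            if label != majority_label:
                suspects.append((i, m))
    return suspects


# Hypothetical activations of 4 neurons recorded for 3 test inputs.
activations = [
    [0.7, 0.0, 0.0, 0.2],
    [0.9, 0.0, 0.1, 0.0],
    [0.3, 0.0, 0.0, 0.5],
]
coverage = neuron_coverage(activations)  # neuron 1 never fires

# Hypothetical labels from 3 models on the same 3 inputs.
predictions = [
    ["car", "bus", "car"],
    ["car", "bus", "truck"],
    ["car", "bus", "car"],
]
suspects = find_disagreements(predictions)  # model 1 is the odd one out on input 2
```

The majority vote needs no manual labels: the other models act as the oracle, which is the appeal of differential testing.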
Proposed solution (III)
- Best test inputs for a DL system
  - Trigger many differential behaviors and achieve high neuron coverage
- Selecting them
  - Can be represented as a joint optimization problem
  - Can use gradient-based search techniques
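A joint objective of this kind can be sketched as "disagreement between models plus a weighted term for a neuron we want to activate", maximized by gradient ascent on the input. This is a simplified sketch under stated assumptions, not the objective from the DeepXplore paper: the two models, the hidden neuron, the weight `lam`, and the use of numerical rather than analytic gradients are all illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical single-output models that are supposed to agree.
def model_a(x):
    return sigmoid(2.0 * x[0] - 1.0 * x[1])


def model_b(x):
    return sigmoid(1.5 * x[0] - 1.8 * x[1] + 0.2)


# A hypothetical hidden neuron whose activation we would like to raise.
def hidden_neuron(x):
    return sigmoid(3.0 * x[1] - 2.0)


def objective(x, lam):
    """Joint objective: model disagreement plus lam times the neuron activation."""
    return (model_a(x) - model_b(x)) + lam * hidden_neuron(x)


def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def search(seed, steps=50, step_size=0.05, lam=0.5):
    """Gradient ascent on the input, keeping every component in [0, 1]."""
    def f(x):
        return objective(x, lam)

    x = list(seed)
    for _ in range(steps):
        grad = numerical_gradient(f, x)
        x = [min(1.0, max(0.0, xi + step_size * gi)) for xi, gi in zip(x, grad)]
    return x


seed = [0.5, 0.5]
found = search(seed)
print(objective(seed, 0.5), objective(found, 0.5))
```

A real system would obtain gradients by backpropagation through the networks; finite differences are used here only to keep the sketch dependency-free.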
DL Systems
- Include at least one Deep Neural Network (DNN) component
- DNN components learn their rules directly from data
  - A DNN's rules are mostly unknown to its developers
DNN architecture
- Multiple layers of neurons
  - Input layer
  - One or more hidden layers
  - Output layer
The neuron
- Individual computing unit/mathematical function
- Multiple inputs $I_1, I_2, \ldots$ with distinct weights $w_1, w_2, \ldots$
- Output is a function of the weighted sum of inputs: $\sigma\left(\sum_i w_i I_i\right)$
  - Often a step function
  - O
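The neuron formula above translates directly into code. A minimal sketch, with hypothetical input and weight values; the sigmoid is included only as a common smooth alternative to the step function.

```python
import math


def step(z):
    """Step activation: fires (1.0) when the weighted sum is non-negative."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Smooth alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, activation):
    """Apply the activation to the weighted sum of the inputs."""
    weighted_sum = sum(w * i for w, i in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical values: the weighted sum is 0.4 + 0.3 - 0.2 = 0.5.
inputs = [1.0, 0.5, -1.0]
weights = [0.4, 0.6, 0.2]

print(neuron_output(inputs, weights, step))
print(neuron_output(inputs, weights, sigmoid))
```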
How the layers work together
- Each layer transforms the information contained in its input into a higher-level representation of the data
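Layer-by-layer transformation can be sketched as a loop in which each layer's output becomes the next layer's input. The network shape, the weights, and the choice of ReLU as activation are assumptions made for illustration.

```python
def relu(z):
    return max(0.0, z)


def layer_forward(inputs, weight_rows, activation):
    """One layer: each row of weights defines one neuron of the layer."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)))
        for row in weight_rows
    ]


def forward(inputs, layers):
    """Feed the input through every layer, recording each representation."""
    values = inputs
    trace = [values]
    for weight_rows in layers:
        values = layer_forward(values, weight_rows, relu)
        trace.append(values)
    return trace


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 2 outputs.
hidden_weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
output_weights = [[1.0, 1.0, 1.0], [0.0, 2.0, -1.0]]

trace = forward([1.0, 2.0], [hidden_weights, output_weights])
for depth, representation in enumerate(trace):
    print(depth, representation)
```

The recorded per-layer activations in `trace` are exactly what a whitebox tester inspects to decide which neurons an input exercised.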
Limitations of existing DNN testing (I)
- Low test coverage
  - No attempt to cover the rules the DNN has learned
  - Standard procedure is to divide the whole data set randomly into a training part and a testing part
  - Sometimes includes adversarial inputs
    - Not enough
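The standard procedure mentioned above is a random split, sketched below. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions; the point is that the test part is drawn from the same pool as the training part, so rare inputs are as scarce in one as in the other.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into (train, test) parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical data set of 100 sample IDs.
dataset = list(range(100))
train_set, test_set = train_test_split(dataset)

print(len(train_set), len(test_set))
```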
Limitations of existing DNN testing (II)
- Problems with low-coverage DNN tests
  - Same as low-coverage tests of conventional software
    - Software is not tested for rare conditions
  - Some behaviors of the DNN are left unexplored
    - What if a nose is detected and its dominant color is red?
DeepXplore workflow