The Limitations of Deep Learning in Adversarial Settings

Introduction • DNNs achieved high accuracy on image classification tasks – Res. Net achieved

Introduction • An adversarial sample is an input crafted to cause DNNs to misclassify.

Previous Works • “Deep neural networks are easily fooled: High confidence predictions for unrecognizable

What is Deep Learning? Training the networks Training dataset

What is Deep Learning? Inference process Test dataset “ Silky Terrier ”

Training vs Inference • Training – Training data, labels – Feedforward and backpropagation –

Natural Language Processing < Speech Recognition > < Machine Translation >

GAN (Unsupervised Learning) Training Images Generated Images

Soft-max Function • One of activation functions • Fully connected last layer • Sum

Adversarial Goals 1) Confidence reduction 2) Misclassification 3) Targeted misclassification 4) Source/target misclassification

Adversarial Capabilities • Adversary has knowledge of network architecture and its parameter values. •

Threat Model Taxonomy • Only know the network architecture (layers, activations, trained weights, biases,

Approach • Original input Original output Add perturbation Adversarial input target output

How to craft adversarial samples? • After training 100 epochs

How to craft adversarial samples? • “Forward derivative” is defined as the Jacobian matrix

How to craft adversarial samples? • Small input variations can lead to extreme variations

Steps for crafting adversarial samples •

How to compute forward derivative? • We can compute derivatives using gradients as doing

Adversarial Saliency Maps • Saliency maps indicate which input features should the adversary perturb.

Goals of Experiment (1) Can we exploit any sample? (2) How can we identify

Crafting Adversarial Samples • 30, 000 samples from MNIST dataset • 9 adversarial samples

Crafting Adversarial Samples Decreasing pixel intensity Increasing pixel intensity

Results on Larger Datasets • Average success rate = 97. 10% • Average distortion

Class Pair Study 0, 8 : Easy to succeed 0, 8 : small distortion

Class Pair Study 1 : Hard to succeed 1 : huge distortion

Human Perception Study • Ask ‘is that sample a numeric digit’? , ‘What digit

Defenses 1) Adversarial sample detection • The sum of squared difference between each pair

Related Works • NIPS 2017 Defense against adversarial attack • “Practical Black-Box Attacks against

Conclusion • This paper introduces a new algorithm to craft adversarial samples based on

Slides: 42

Download presentation

The Limitations of Deep Learning in Adversarial Settings Nicolas Papernot, Patrick Mc. Daniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami Accepted to the 1 st IEEE European Symposium on Security & Privacy, IEEE 2016. Saarbrucken, Germany. Presented by YOUNGMIN CHOI, EE 515 2019 Fall

Introduction

Introduction • DNNs achieved high accuracy on image classification tasks – Res. Net achieved 96. 43% accuracy on Image. Net test set. • New threats appear in DL area – Ex) DL to identify traffic signs Misclassify!! ‘Go’ Sign

Introduction • An adversarial sample is an input crafted to cause DNNs to misclassify. • This paper proposed a new algorithm to craft an adversarial sample against DNNs Misclassification perturb Original MNIST dataset Adversarial Sample

Previous Works • “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images”, Nguyen et al. , CVPR’ 15 – Compute gradients to update input, not weight • “Explaining and harnessing adversarial examples”, Ian J. Goodfellow et al. , ICLR’ 15 – Explain the reason why adversarial attacks work

Background

What is Deep Learning? Training the networks Training dataset

What is Deep Learning? Inference process Test dataset “ Silky Terrier ”

Training vs Inference • Training – Training data, labels – Feedforward and backpropagation – Updating parameters • Inference – Test data – Feedforward – No updates of parameters This paper Attacks on inference time

Computer Vision

Natural Language Processing < Speech Recognition > < Machine Translation >

GAN (Unsupervised Learning) Training Images Generated Images

Soft-max Function • One of activation functions • Fully connected last layer • Sum of outputs = 1 • It can be interpreted as probability

Threat Model

Adversarial Goals 1) Confidence reduction 2) Misclassification 3) Targeted misclassification 4) Source/target misclassification

Adversarial Capabilities • Adversary has knowledge of network architecture and its parameter values. • The layers and activation functions • Weights and biases after training phase • It is possible to simulate the network with this information

Threat Model Taxonomy • Only know the network architecture (layers, activations, trained weights, biases, etc) • Source class → target class Ex) 2 → 8

Approach

Approach • Original input Original output Add perturbation Adversarial input target output

How to craft adversarial samples? •

How to craft adversarial samples? • After training 100 epochs

How to craft adversarial samples? •

How to craft adversarial samples? • “Forward derivative” is defined as the Jacobian matrix of function.

How to craft adversarial samples? •

How to craft adversarial samples? • Small input variations can lead to extreme variations of the output. • Not all regions of input domain are conducive to find adversarial samples. • The forward derivative reduces the adversarial-sample search space.

Steps for crafting adversarial samples •

How to compute forward derivative? • We can compute derivatives using gradients as doing in backpropagation. • Take the derivative of the network, rather than on cost function. • Propagate gradients forward to find input that changes outputs a lot.

Adversarial Saliency Maps • Saliency maps indicate which input features should the adversary perturb. • Increase the probability of target class decrease the probability of other classes

Steps for crafting adversarial samples •

Evaluation

Goals of Experiment (1) Can we exploit any sample? (2) How can we identify samples more vulnerable than others? (3) How do humans perceive adversarial samples compared to DNNs?

Crafting Adversarial Samples • 30, 000 samples from MNIST dataset • 9 adversarial samples were generated for each sample. • Total 270, 000 adversarial samples

Crafting Adversarial Samples Decreasing pixel intensity Increasing pixel intensity

Results on Larger Datasets • Average success rate = 97. 10% • Average distortion of successful samples = 4. 44%

Class Pair Study 0, 8 : Easy to succeed 0, 8 : small distortion

Class Pair Study 1 : Hard to succeed 1 : huge distortion

Human Perception Study • Ask ‘is that sample a numeric digit’? , ‘What digit is it? ’

Defenses 1) Adversarial sample detection • The sum of squared difference between each pair of neighboring pixels is always higher for adversarial samples. 2) Improvements of DNN robustness • Generate adversarial samples using GAN • Train models with adversarial samples Adversarial samples generated using DCGAN

Related Works • NIPS 2017 Defense against adversarial attack • “Practical Black-Box Attacks against Machine Learning” Nicolas paper not et al. , 2017 – Shows the black-box attacks resulting from ‘transferability’ between architectures

Conclusion • This paper introduces a new algorithm to craft adversarial samples based on forward derivatives. • This algorithm can generate adversarial samples correctly classified by human but misclassified in specific targets by a DNN with a 97% success rate while modifying only 4. 02% of input features.

Thank you