Image recognition Defense adversarial attacks using Generative Adversarial

Outline § Adversarial attacks § Generative Adversarial Network (GAN) § How to defense attacks

Adversarial Attacks Adversarial examples for a classifier C(): § A pair of input x

Adversarial Attacks Why does classifier become fool for these examples? 7/25

Adversarial Attacks Why does classifier become fool for these examples? An intuition from the

Generative adversarial network (GAN) § x and x’ have similar PDF § G() has

Generative adversarial network (GAN) § Convergence state: pdata(x)=p. G(x) § Green solid line: probability

Generative adversarial network (GAN) 11/25

How to defense attacks using GAN § G() is pre-trained and has learned the

Numerical results First-order classifier attacks for handwritten digit classification 14/25

Numerical results First-order classifier attacks for handwritten digit classification 15/25

Numerical results First-order classifier attacks for handwritten digit classification 16/25

Numerical results First-order classifier attacks for gender classification 17/25

Numerical results First-order classifier attacks for gender classification 18/25

Numerical results Substitute model attacks Results from Invert and Classify 19/25

Numerical results Comparison between Invert and Classify and Enhanced Invert and Classify 20/25

Thinking § GAN for regression problems? § GAN versus other neural networks? § One

Slides: 25

Download presentation

Image recognition: Defense adversarial attacks using Generative Adversarial Network (GAN) Speaker: Guofei Pang Division of Applied Mathematics Brown University Presentation after reading the paper: Ilyas, Andrew, et al. "The Robust Manifold Defense: Adversarial Training using Generative Models. " ar. Xiv preprint ar. Xiv: 1712. 09196 (2017).

Outline § Adversarial attacks § Generative Adversarial Network (GAN) § How to defense attacks using GAN § Numerical results 2/25

Adversarial Attacks 3/25

Adversarial Attacks 4/25

Adversarial Attacks n = m n*m 5/25

Adversarial Attacks Adversarial examples for a classifier C(): § A pair of input x 1 and x 2 § A person says they are of the same class § But a classifier will they are completely different! 6/25

Adversarial Attacks Why does classifier become fool for these examples? 7/25

Adversarial Attacks Why does classifier become fool for these examples? An intuition from the authors: § Natural image: Low-dimensional manifold § Noisy image: High-dimensional manifold § High dimensionality is tough for classifier. 8/25

Generative adversarial network (GAN) § x and x’ have similar PDF § G() has learned the underlying distribution of image dataset after training GAN § The DNN G() is a nonlinear mapping from lowdimensional space, z, to high-dimensional space, x’ Original image x GAN Synthetic image /Generative model x’=G(z) Generator G(z) Noisy input z, say, z – N(0, I) 9/25

Generative adversarial network (GAN) § Convergence state: pdata(x)=p. G(x) § Green solid line: probability density function (PDF) of the generator G() § Black dotted line: PDF of original image x, i. e. , pdata(x) § Blue dash line: PDF of discriminator D() 10/25

Generative adversarial network (GAN) 11/25

How to defense attacks using GAN § G() is pre-trained and has learned the underlying distribution of the training (image) dataset after training GAN Synthetic image x’=G(z*) (Preserve low-dimensional manifold) Invert and Classify Classifier C() Original image x (Could include highdimensional manifold when noise enters) 12/25

How to defense attacks using GAN § G() is pre-trained and has learned the underlying distribution of the training (image) dataset after training GAN Enhanced Invert and Classify Synthetic image x’=G(z*) (Preserve low-dimensional manifold) Upper bound of attack magnitude Classifier C() (retrain the classifier) Classification loss 13/25

Numerical results First-order classifier attacks for handwritten digit classification 14/25

Numerical results First-order classifier attacks for handwritten digit classification 15/25

Numerical results First-order classifier attacks for handwritten digit classification 16/25

Numerical results First-order classifier attacks for gender classification 17/25

Numerical results First-order classifier attacks for gender classification 18/25

Numerical results Substitute model attacks Results from Invert and Classify 19/25

Numerical results Comparison between Invert and Classify and Enhanced Invert and Classify 20/25

Numerical results 21/25

Numerical results 22/25

Numerical results 23/25

Numerical results 24/25

Thinking § GAN for regression problems? § GAN versus other neural networks? § One defense strategy for all types of attacks? 25/25