Generative Adversarial Network 李宏毅 Hung-yi Lee

Three Categories of GAN
1. Typical GAN: the Generator maps a random vector to an image.
2. Conditional GAN: the Generator maps text such as "Girl with red hair" to an image; it is trained on paired data (e.g., images annotated with "blue eyes, red hair, short hair").
3. Unsupervised Conditional GAN: the Generator maps domain x to domain y (e.g., a photo to Vincent van Gogh's style); it is trained on unpaired data.

Generative Adversarial Network (GAN) • Anime face generation as example: the Generator maps a high-dimensional vector to an image, and the Discriminator maps an image to a score. A larger score means the image looks real; a smaller score means it looks fake. (Both networks are sketched in code below.)
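To make the two roles concrete, here is a minimal PyTorch sketch (my choice of framework, not the slides'); the 100-dim input vector, the flattened 64x64 RGB image, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a random vector to a (flattened) image."""
    def __init__(self, z_dim=100, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),   # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps an image to one scalar score: larger = more real."""
    def __init__(self, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),      # score in (0, 1)
        )

    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
print(D(G(torch.randn(1, 100))))                  # score for one generated image
```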

Algorithm • Initialize the generator G and the discriminator D. • In each training iteration, Step 1: fix the generator G and update the discriminator D. Sample real objects from a database and feed randomly sampled vectors to G to obtain generated objects. The discriminator learns to assign high scores (1) to the real objects and low scores (0) to the generated ones. (This step is sketched in code below.)
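A sketch of Step 1, continuing the code above; the batch size, the learning rate, and the random `real_batch` stand-in for database samples are assumptions.

```python
import torch.nn.functional as F

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real_batch = torch.rand(32, 64 * 64 * 3) * 2 - 1  # stand-in for sampled real images
fake_batch = G(torch.randn(32, 100)).detach()     # detach(): G stays fixed here

# D learns: real -> 1, generated -> 0
loss_D = F.binary_cross_entropy(D(real_batch), torch.ones(32, 1)) + \
         F.binary_cross_entropy(D(fake_batch), torch.zeros(32, 1))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()
```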

Algorithm • In each training iteration, Step 2: fix the discriminator D and update the generator G. The generator learns to "fool" the discriminator: stacking the generator (an NN with hidden layers) and the fixed discriminator gives one large network; a vector is fed in, and backpropagation updates only the generator so that the discriminator's score for the generated image (e.g., 0.13) rises toward 1. (Sketched in code below.)
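Step 2, continuing the same sketch. Gradients flow backward through D, but only G's parameters are stepped, so D is effectively fixed.

```python
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

score = D(G(torch.randn(32, 100)))                # no detach: gradients reach G
loss_G = F.binary_cross_entropy(score, torch.ones(32, 1))  # push score toward 1
opt_G.zero_grad()
loss_G.backward()                                 # backprop through the fixed D...
opt_G.step()                                      # ...but only G is updated
```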

Algorithm • Putting the steps together, in each training iteration: Learning D: sample some real objects and generate some fake objects from random vectors; with G fixed, update D toward outputting 1 for the real objects and 0 for the generated ones. Learning G: with D fixed, update G so that D outputs 1 for the generated images. (The full loop is sketched below.)
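The whole iteration, alternating the two updates above; `data_loader` (yielding flattened real-image batches) and `num_epochs` are assumed names, and everything else continues the earlier sketch.

```python
for epoch in range(num_epochs):
    for real_batch in data_loader:
        n = real_batch.size(0)
        ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

        # Learning D: G fixed (detach); real -> 1, generated -> 0
        fake = G(torch.randn(n, 100)).detach()
        loss_D = F.binary_cross_entropy(D(real_batch), ones) + \
                 F.binary_cross_entropy(D(fake), zeros)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # Learning G: D fixed (only opt_G steps); generated -> 1
        loss_G = F.binary_cross_entropy(D(G(torch.randn(n, 100))), ones)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```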

https://crypko.ai/#/

GAN is hard to train... • There is a saying... (I found this joke on 陳柏文's Facebook.)

Three Categories of GAN (recap): 1. Typical GAN, 2. Conditional GAN, 3. Unsupervised Conditional GAN. Next: Conditional GAN.

Text-to-Image • Traditional supervised approach: train an NN on text-image pairs (c1: "a dog is running", "a bird is flying", ...), with the output image as close as possible to the target. But a single text such as "train" is paired with many different images, so the NN learns to output something close to their average: a blurry image! (A toy demonstration follows.)
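A toy demonstration of why regression blurs: when one condition has several distinct target images, the MSE-optimal output is their pixel-wise mean. The 4-pixel "images" below are made up for illustration.

```python
import torch

targets = torch.tensor([[1.0, 0.0, 1.0, 0.0],    # "train" photo A
                        [0.0, 1.0, 0.0, 1.0]])   # "train" photo B

out = torch.zeros(4, requires_grad=True)         # the NN's output for "train"
opt = torch.optim.SGD([out], lr=0.5)
for _ in range(200):
    loss = ((out - targets) ** 2).mean()         # MSE against both targets
    opt.zero_grad(); loss.backward(); opt.step()

print(out)  # ~[0.5, 0.5, 0.5, 0.5]: the blurry pixel-wise average
```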

[Scott Reed, et al., ICML, 2016] Conditional GAN • Condition c: "train". G takes c and a vector z sampled from a normal distribution and outputs an image x = G(c, z). With the original discriminator, D only scores whether x is a real image or not (real images: 1, generated images: 0). The generator will then learn to generate realistic images but completely ignore the input condition.

[Scott Reed, et al., ICML, 2016] Conditional GAN • A better discriminator takes both the condition c and the image x = G(c, z) and outputs a scalar covering two criteria: x is realistic or not, and c and x are matched or not. True text-image pairs such as (train, real train image) get 1; a matched text with a generated image, (train, G(train, z)), gets 0; and a mismatched pair such as (cat, real train image) also gets 0. (Sketched in code below.)
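A sketch of this conditional discriminator and its three training terms; the text-embedding dimension, the shuffled-condition trick for building mismatched pairs, and all stand-in tensors are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondD(nn.Module):
    """Scores a (condition, image) pair: realistic AND matched -> high."""
    def __init__(self, c_dim=128, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(c_dim + img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, c, x):
        return self.net(torch.cat([c, x], dim=-1))

D_cond = CondD()
n = 8
c_real = torch.randn(n, 128)            # stand-in text embeddings
x_real = torch.randn(n, 64 * 64 * 3)    # matching real images
c_wrong = c_real[torch.randperm(n)]     # mismatched text for the same images
x_fake = torch.randn(n, 64 * 64 * 3)    # stand-in for G(c_real, z)
ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

# (matched, real) -> 1; (mismatched, real) -> 0; (matched, generated) -> 0
loss_D = F.binary_cross_entropy(D_cond(c_real, x_real), ones)   + \
         F.binary_cross_entropy(D_cond(c_wrong, x_real), zeros) + \
         F.binary_cross_entropy(D_cond(c_real, x_fake), zeros)
```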

Conditional GAN - Sound-to-image • Condition c: a sound, e.g., a dog barking sound; G maps the sound to an image. Training data collection: video, which naturally pairs audio with frames.

Conditional GAN - Sound-to-image • Audio-to-image: the demo shows the generated image changing as the input sound is made louder. The images are generated by Chia-Hung Wan and Shun-Po Chuang. https://wjohn1483.github.io/audio_to_scene/index.html

Conditional GAN - Image-to-label • A multi-label image classifier can be seen as a conditional generator: the input image is the condition, and the labels are the generated output.

Conditional GAN - Image-to-label • The classifiers can have different architectures; they are trained as conditional GANs, and conditional GAN outperforms other models designed for multi-label classification (Att-RNN, RLSD). [Tsai, et al., submitted to ICASSP 2019]

F1 scores:
                    MS-COCO   NUS-WIDE
VGG-16                56.0      33.9
VGG-16 + GAN          60.4      41.2
Inception             62.4      53.5
Inception + GAN       63.8      55.8
Resnet-101            62.8      53.1
Resnet-101 + GAN      64.0      55.4
Resnet-152            63.3      52.1
Resnet-152 + GAN      63.9      54.7
Att-RNN               62.1      -
RLSD                  62.0      46.9

Talking Head https://arxiv.org/abs/1905.08233

Three Categories of GAN (recap): 1. Typical GAN, 2. Conditional GAN, 3. Unsupervised Conditional GAN. Next: Unsupervised Conditional GAN.

Cycle GAN • A generator transforms images from Domain X so that they become similar to Domain Y. A discriminator outputs a scalar: does the input image belong to Domain Y or not?

Cycle GAN • With the discriminator alone, the generator can satisfy "become similar to Domain Y" while ignoring its input entirely, producing Domain Y images unrelated to the input. Not what we want!

[Jun-Yan Zhu, et al., ICCV, 2017] Cycle GAN • Cycle consistency: a second generator maps the output back toward Domain X, and the reconstruction must be as close as possible to the original input. A generator that ignores its input leaves a lack of information for reconstruction, so the cycle constraint forces the output to stay related to the input.

Cycle GAN • Training runs in both directions: X → Y → X with a discriminator scoring "belongs to Domain Y or not", and Y → X → Y with a discriminator scoring "belongs to Domain X or not"; each reconstruction should be as close as possible to its input. (Sketched in code below.)
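A sketch of the two-direction objective; the linear stand-in networks and the cycle weight of 10 (the value I believe the CycleGAN paper uses) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64 * 64 * 3
G_xy, G_yx = nn.Linear(dim, dim), nn.Linear(dim, dim)  # X->Y and Y->X (stand-ins)
D_y = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())   # belongs to domain Y or not
D_x = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())   # belongs to domain X or not

x, y = torch.randn(8, dim), torch.randn(8, dim)        # unpaired batches
ones = torch.ones(8, 1)

# Adversarial terms: each translated image should fool the other domain's D.
adv = F.binary_cross_entropy(D_y(G_xy(x)), ones) + \
      F.binary_cross_entropy(D_x(G_yx(y)), ones)

# Cycle terms: X -> Y -> X and Y -> X -> Y, "as close as possible" to the input.
cyc = F.l1_loss(G_yx(G_xy(x)), x) + F.l1_loss(G_xy(G_yx(y)), y)

loss_G = adv + 10.0 * cyc
```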

Cycle GAN • The same idea works on text: a negative-to-positive generator maps "It is bad." to "It is good.", checked by a "positive sentence?" discriminator; a positive-to-negative generator maps back (e.g., "I love you." → "I hate you." → "I love you."), checked by a "negative sentence?" discriminator, with each reconstruction as close as possible to the input.

Discrete Issue • The seq2seq generator ends in a hidden layer followed by discrete output (words), e.g., mapping "It is bad." to "It is good.". Viewing the generator plus the fixed "positive sentence?" discriminator as one large network, we would like to update the generator by backpropagation, but gradients cannot flow through the discrete output step.

Three Categories of Solutions
• Gumbel-softmax: [Matt J. Kusner, et al., arXiv, 2016]
• Continuous input for discriminator: [Sai Rajeswar, et al., arXiv, 2017] [Ofir Press, et al., ICML workshop, 2017] [Zhen Xu, et al., EMNLP, 2017] [Alex Lamb, et al., NIPS, 2016] [Yizhe Zhang, et al., ICML, 2017]
• "Reinforcement learning": [Yu, et al., AAAI, 2017] [Li, et al., EMNLP, 2017] [Tong Che, et al., arXiv, 2017] [Jiaxian Guo, et al., AAAI, 2018] [Kevin Lin, et al., NIPS, 2017] [William Fedus, et al., ICLR, 2018]
(The first category is sketched in code below.)
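A minimal sketch of the Gumbel-softmax idea: replace the non-differentiable argmax over word logits with a temperature-controlled soft sample, so gradients can flow from the discriminator back into the generator. The 5-word vocabulary and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 5, requires_grad=True)  # generator scores over 5 words

hard = logits.argmax(dim=-1)                    # discrete choice: blocks gradients

soft = F.gumbel_softmax(logits, tau=0.5)        # differentiable soft one-hot sample
soft.sum().backward()                           # stand-in for a discriminator loss
print(hard, logits.grad is not None)            # gradients now reach the logits
```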

Sentence Rewriting (文句改寫) Thanks to 王耀賢 for providing the experimental results. Negative sentence to positive sentence:
it's a crappy day -> it's a great day
i wish you could be here -> you could be here
it's not a good idea -> it's good idea
i miss you -> i love you
i don't love you -> i love you
i can't do that -> i can do that
i feel so sad -> i happy
it's a bad day -> it's a good day
it's a dummy day -> it's a great day
sorry for doing such a horrible thing -> thanks for doing a great thing
my doggy is sick -> my doggy is my doggy
my little doggy is sick -> my little doggy is my little doggy

Speech Recognition • Supervised learning: a human teacher annotates utterances ("This utterance is 'good morning'."), and the machine can do speech recognition after teaching. • Supervised learning needs lots of annotated speech, but most languages are low-resourced.

Speech Recognition • Unsupervised learning: can the machine learn speech recognition automatically, by listening to humans talking and reading text on the Internet, without any annotation?

Acoustic Token Discovery • Acoustic tokens can be discovered from an audio collection without text annotation. Acoustic tokens: chunks of acoustically similar audio segments with token IDs (e.g., Token 1, Token 2, Token 3, Token 4). [Zhang & Glass, ASRU 09] [Huijbregts, ICASSP 11] [Chan & Lee, Interspeech 11]

[Wang, et al., ICASSP, 2018] Acoustic Token Discovery • Phonetic-level acoustic tokens are obtained by a segmental sequence-to-sequence autoencoder.

Unsupervised Speech Recognition • Phone-level acoustic pattern discovery turns each utterance into a token sequence (p1 p2 p3 p4, p1 p3 p2, p1 p4 p3 p5, p1 p5 p4 p3, ...), while text provides phoneme sequences (AY L AH V Y UW "I love you", G UH D B AY "good bye", HH AW AA R Y UW "how are you", AY M F AY N "I'm fine", T AY W AA N "Taiwan"). A Cycle GAN then learns the mapping between the two collections (e.g., p1 = "AY"). [Liu, et al., INTERSPEECH, 2018] [Chen, et al., arXiv, 2018]

Model

Experimental Results

Accuracy • Unsupervised learning today (2019) is as good as supervised learning 30 years ago (figure: phone recognition accuracy on TIMIT over time, showing the progress of supervised learning). The image is modified from: Lopes, C. and Perdigão, F., 2011. Phone recognition on the TIMIT database. Speech Technologies, Vol 1, pp. 285-302.

Three Categories of GAN (recap): 1. Typical GAN, 2. Conditional GAN, 3. Unsupervised Conditional GAN.

To Learn More … You can learn more from the YouTube channel.