Unsupervised Learning: Deep Auto-encoder

Unsupervised Learning

“We expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object.” – LeCun, Bengio, Hinton, Nature 2015

“As I've said in previous statements: most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.” – Yann LeCun, March 14, 2016 (Facebook)

Linear method - PCA. Reduce to 1-D: project the data onto the direction of large variance, discarding the direction of small variance. [Figure: 2-D points with the large-variance and small-variance projection directions.]

PCA. Reduce to 1-D: the projection is z = Wx, where W is an orthogonal matrix whose rows are the projection directions.

PCA • These projection components are eigenvectors of the covariance matrix • They are orthogonal and ranked by their associated eigenvalues (spread) • Details can be found in the 165B lecture videos on unsupervised learning (a numpy sketch follows)
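
To make this concrete, here is a minimal numpy sketch of PCA as described above: center the data, take the eigenvectors of the covariance matrix, and keep the top-k directions ranked by eigenvalue. The random data and dimension choices are placeholders.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)            # PCA assumes zero-mean data
    cov = np.cov(X_centered, rowvar=False)     # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]          # rank by eigenvalue (spread)
    W = eigvecs[:, order[:k]]                  # D x k, columns are orthonormal
    return X_centered @ W, W

X = np.random.randn(100, 784)                  # placeholder data
codes, W = pca(X, 1)                           # reduce to 1-D, as on the slide
print(np.allclose(W.T @ W, np.eye(1)))         # True: components are orthogonal
```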

Auto-encoder: an NN encoder maps the input (28 x 28 = 784 pixels) to a code, usually < 784 dims, which is a compact representation of the input object; an NN decoder can reconstruct the original object from the code. The two are learned together.
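
A minimal sketch of this encoder/decoder pair in PyTorch (an implementation choice, not from the slides); the code dimension, activations, and random batch are placeholder assumptions:

```python
import torch
import torch.nn as nn

code_dim = 32                                  # "usually < 784": a narrow bottleneck

encoder = nn.Sequential(nn.Linear(784, code_dim), nn.ReLU())
decoder = nn.Sequential(nn.Linear(code_dim, 784), nn.Sigmoid())
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

x = torch.rand(64, 784)                        # stand-in for 28 x 28 images
code = encoder(x)                              # compact representation of the input
x_hat = decoder(code)                          # reconstruction of the original object
loss = nn.functional.mse_loss(x_hat, x)        # reconstruct "as close as possible"
opt.zero_grad()
loss.backward()                                # encoder and decoder learn together
opt.step()
```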

Recap: PCA as a linear auto-encoder. Minimize the reconstruction error, i.e. make the output as close as possible to the input. Encode: c = Wx (input layer → hidden layer, linear); decode: x̂ = Wᵀc (hidden layer → output layer). The hidden layer is the bottleneck layer, and its output is the code.

Deep Auto-encoder • Of course, the auto-encoder can be deep: input layer → … → bottleneck (code) layer → … → output layer, trained so the output is as close as possible to the input. A symmetric architecture is not necessary. Initialize by RBM, layer-by-layer. Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the dimensionality of data with neural networks.” Science 313.5786 (2006): 504-507.

Deep Auto-encoder vs PCA (30 dims): PCA compresses 784 → 30 → 784, while the deep auto-encoder uses 784 → 1000 → 500 → 250 → 30 → 250 → 500 → 1000 → 784. [Figure: original images alongside PCA and deep auto-encoder reconstructions.]
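
A sketch of that 784-1000-500-250-30 stack in PyTorch; unlike the 2006 paper, which pre-trained with RBMs, this version would simply be trained end-to-end with a reconstruction loss (the activation choices are assumptions):

```python
import torch.nn as nn

encoder = nn.Sequential(
    nn.Linear(784, 1000), nn.ReLU(),
    nn.Linear(1000, 500), nn.ReLU(),
    nn.Linear(500, 250), nn.ReLU(),
    nn.Linear(250, 30),                        # the 30-dim bottleneck code
)
decoder = nn.Sequential(
    nn.Linear(30, 250), nn.ReLU(),
    nn.Linear(250, 500), nn.ReLU(),
    nn.Linear(500, 1000), nn.ReLU(),
    nn.Linear(1000, 784), nn.Sigmoid(),        # back to pixel space
)
```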

Deep Auto-encoder vs PCA (2 dims): PCA is 784 → 2 → 784; the deep auto-encoder is 784 → 1000 → 500 → 250 → 2 → 250 → 500 → 1000 → 784.

Deep Auto-encoder - Example: visualize the NN encoder's codes. For comparison, PCA reduces the raw pixels to 32-dim, which is then embedded with t-SNE (Pixel → t-SNE).

Auto-encoder – Text Retrieval. Vector Space Model: represent each document (and the query) as a point in word space. Bag-of-words for the word string “This is an apple”: this = 1, is = 1, a = 0, an = 1, apple = 1, pen = 0, … Semantics are not considered.

Auto-encoder – Text Retrieval. Documents talking about the same thing will have close codes. Architecture: bag-of-words (document or query, 2000-dim) → 500 → 250 → 125 → 2-dim code. For comparison, LSA projects documents onto 2 latent topics. A sketch of the pieces follows.
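
A hedged sketch of the two pieces this slide assumes: building the bag-of-words vector (word order and semantics ignored) and retrieving documents whose codes are near the query's code. The tiny vocabulary, and the idea that `doc_codes` comes from a trained encoder, are illustrative assumptions.

```python
import numpy as np

vocab = {"this": 0, "is": 1, "a": 2, "an": 3, "apple": 4, "pen": 5}

def bag_of_words(text):
    """Binary bag-of-words vector; word order and semantics are ignored."""
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] = 1.0
    return vec

print(bag_of_words("This is an apple"))        # [1. 1. 0. 1. 1. 0.]

def retrieve(query_code, doc_codes, k=5):
    """Documents about the same topic should have nearby codes."""
    dists = np.linalg.norm(doc_codes - query_code, axis=1)
    return np.argsort(dists)[:k]
```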

Auto-encoder – Similar Image Search. Retrieved using Euclidean distance in pixel intensity space (images from Hinton's slides on Coursera). Reference: Krizhevsky, Alex, and Geoffrey E. Hinton. “Using very deep autoencoders for content-based image retrieval.” ESANN. 2011.

Auto-encoder – Similar Image Search: learn the code on millions of images crawled from the Internet. Architecture: 32 x 32 image → 8192 → 4096 → 2048 → 1024 → 512 → 256-dim code.

[Figure: images retrieved using Euclidean distance in pixel intensity space vs. images retrieved using the 256-dim codes.]
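
A sketch of the comparison: nearest-neighbour search with Euclidean distance, run once on raw pixel intensities and once on encoder codes. All data here are random stand-ins; the codes would come from the trained encoder above.

```python
import numpy as np

def nearest(query_vec, database, k=5):
    """Indices of the k rows of `database` closest to `query_vec`."""
    dists = np.linalg.norm(database - query_vec, axis=1)
    return np.argsort(dists)[:k]

pixels = np.random.rand(1000, 32 * 32)         # raw intensities (stand-in)
codes = np.random.rand(1000, 256)              # assumed encoder outputs

by_pixels = nearest(pixels[0], pixels)         # matches low-level intensity
by_codes = nearest(codes[0], codes)            # matches high-level content
```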

More auto-encoders • Contractive auto-encoder • De-noising auto-encoder: add noise to the input, encode and decode it, and make the output as close as possible to the original clean input. Refs: Rifai, Salah, et al. “Contractive auto-encoders: Explicit invariance during feature extraction.” Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011. Vincent, Pascal, et al. “Extracting and composing robust features with denoising autoencoders.” ICML, 2008.
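
A minimal sketch of one de-noising auto-encoder training step, following Vincent et al.: corrupt the input, but score the reconstruction against the clean input. The network sizes and noise level are assumptions.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())    # assumed sizes
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(64, 784)                        # clean batch (stand-in data)
x_noisy = x + 0.3 * torch.randn_like(x)        # add noise before encoding
x_hat = decoder(encoder(x_noisy))              # encode, then decode
loss = nn.functional.mse_loss(x_hat, x)        # compare to the *clean* input
loss.backward()
```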

Auto-encoder for CNN: the encoder alternates convolution and pooling down to the code (convolution → pooling → convolution → pooling → code); the decoder mirrors it with deconvolution and unpooling (deconvolution → unpooling → deconvolution → unpooling → deconvolution), trained so the output is as close as possible to the input.

CNN - Unpooling: expand 14 x 14 → 28 x 28 by putting each pooled value back at the location remembered during max-pooling (zeros elsewhere). Alternative: simply repeat the values. Source of image: https://leonardoaraujosantos.gitbooks.io/artificialinteligence/content/image_segmentation.html
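
A sketch of both unpooling variants in PyTorch: max-unpooling, which puts each value back at the location remembered by pooling (zeros elsewhere), and the slide's alternative of simply repeating the values. The tiny feature map is a stand-in.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 4, 4)                     # stand-in feature map

# Max-unpooling: pooling remembers where each max came from, and unpooling
# puts the value back at that location (zeros everywhere else).
pool = nn.MaxPool2d(2, return_indices=True)
pooled, indices = pool(x)                      # 4x4 -> 2x2, plus locations
unpooled = nn.MaxUnpool2d(2)(pooled, indices)  # 2x2 -> 4x4, sparse

# Alternative from the slide: simply repeat the values.
repeated = nn.Upsample(scale_factor=2, mode='nearest')(pooled)
```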

CNN - Deconvolution: actually, deconvolution is convolution. Padding the input with zeros and convolving with flipped weights produces exactly the same result as a transposed convolution.
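
A small PyTorch sketch checking this claim in 1-D: a stride-1 transposed convolution gives the same output as an ordinary convolution on a zero-padded input with flipped weights. The signal length and kernel size are arbitrary choices.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 5)                        # batch of one 1-D signal

deconv = nn.ConvTranspose1d(1, 1, kernel_size=3, bias=False)
y1 = deconv(x)                                 # length 5 -> 7

# Ordinary convolution: pad the input with k-1 zeros on each side and
# convolve with the flipped kernel -- same arithmetic, same result.
conv = nn.Conv1d(1, 1, kernel_size=3, padding=2, bias=False)
with torch.no_grad():
    conv.weight.copy_(deconv.weight.flip(-1))
y2 = conv(x)

print(torch.allclose(y1, y2))                  # True
```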

Auto-encoder – Pre-training DNN • Greedy layer-wise pre-training. Target network: input 784 → W1 → 1000 → W2 → 1000 → W3 → 500 → W4 → output 10. Step 1: train an auto-encoder 784 → W1 → 1000 → W1' → 784 so the output matches the input.

• Step 2: fix W1; train a second auto-encoder on the 1000-dim hidden outputs: 1000 → W2 → 1000 → W2' → 1000.

• Step 3: fix W1 and W2; train a third auto-encoder on their outputs: 1000 → W3 → 500 → W3' → 1000.

• Step 4: randomly initialize the output layer W4 (500 → 10), then fine-tune the whole network (W1, W2, W3, W4) by backpropagation. A sketch of all four steps follows.
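
A hypothetical end-to-end sketch of these four steps in PyTorch (layer sizes from the slides; the optimizer, activation, and iteration count are assumptions):

```python
import torch
import torch.nn as nn

sizes = [784, 1000, 1000, 500]                 # W1, W2, W3 from the slides
data = torch.rand(256, 784)                    # stand-in training set
layers = []

h = data
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    W = nn.Linear(d_in, d_out)                 # the layer we keep (W1, W2, W3)
    W_prime = nn.Linear(d_out, d_in)           # throwaway decoder (W1', W2', W3')
    opt = torch.optim.Adam(list(W.parameters()) + list(W_prime.parameters()))
    for _ in range(100):                       # train this layer as an auto-encoder
        h_hat = W_prime(torch.relu(W(h)))
        loss = nn.functional.mse_loss(h_hat, h)
        opt.zero_grad()
        loss.backward()
        opt.step()
    layers.append(W)
    h = torch.relu(W(h)).detach()              # fix the layer; its codes feed the next

# Step 4: random-init output layer W4, then fine-tune everything by backprop.
model = nn.Sequential(*[m for W in layers for m in (W, nn.ReLU())],
                      nn.Linear(sizes[-1], 10))
```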

Learning More - Restricted Boltzmann Machine
• Neural networks [5.1]: Restricted Boltzmann machine - definition
https://www.youtube.com/watch?v=p4Vh_zMwHQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
• Neural networks [5.2]: Restricted Boltzmann machine - inference
https://www.youtube.com/watch?v=lekCh_i32iE&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH&index=37
• Neural networks [5.3]: Restricted Boltzmann machine - free energy
https://www.youtube.com/watch?v=e0Ts_7Y6hZU&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH&index=38

Learning More - Deep Belief Network
• Neural networks [7.7]: Deep learning - deep belief network
https://www.youtube.com/watch?v=vkb6AWYXZ5I&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH&index=57
• Neural networks [7.8]: Deep learning - variational bound
https://www.youtube.com/watch?v=pStDscJh2Wo&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH&index=58
• Neural networks [7.9]: Deep learning - DBN pre-training
https://www.youtube.com/watch?v=35MUlYCColk&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH&index=59

Next: code → NN Decoder • Can we use the decoder to generate something from a code?

Not that simple
• Interpolation in the latent space != interpolation in the image space
• Generation != interpolation (see the sketch below)
• More sophisticated generation schemes use PixelRNN, the variational auto-encoder, GAN, etc.
• Next lectures and lectures on GAN
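
A sketch of what naive latent interpolation looks like, with untrained stand-in networks; the point is that nothing in a plain auto-encoder's training objective makes the intermediate decodes resemble real images:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())    # untrained stand-ins
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x1, x2 = torch.rand(1, 784), torch.rand(1, 784)
with torch.no_grad():
    c1, c2 = encoder(x1), encoder(x2)
    for alpha in torch.linspace(0, 1, 8):
        x_interp = decoder((1 - alpha) * c1 + alpha * c2)
        # Nothing guarantees x_interp lies on the image manifold; VAEs and
        # GANs impose structure on the latent space so that it does.
```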

Appendix

Pokémon
• http://140.112.21.35:2880/~tlkagk/pokemon/pca.html
• http://140.112.21.35:2880/~tlkagk/pokemon/auto.html
• The code is modified from http://jkunst.com/r/pokemon-visualize-em-all/

Add: Ladder Network
• http://rinuboney.github.io/2016/01/19/laddernetwork.html
• https://mycourses.aalto.fi/pluginfile.php/146701/mod_resource/content/1/08%20semisup%20ladder.pdf
• https://arxiv.org/abs/1507.02672