CS 4803 7643 Deep Learning Topics Unsupervised Learning
- Slides: 55
CS 4803 / 7643: Deep Learning Topics: – Unsupervised Learning – Generative Models (PixelRNNs, VAEs) Dhruv Batra Georgia Tech
Administrativia • HW 1 and HW 2 solutions released – https://gatech.instructure.com/courses/28059/files/ • HW 3 out – Due: 11/06, 11:55 pm (C) Dhruv Batra
Overview ● Unsupervised Learning ● Generative Models ○ PixelRNN and PixelCNN ○ Variational Autoencoders (VAE) ○ Generative Adversarial Networks (GAN) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Supervised vs Unsupervised Learning
Supervised Learning
• Data: (x, y), where x is data and y is label
• Goal: Learn a function to map x -> y
• Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
– Classification: Cat
– Object Detection: DOG, CAT
– Semantic Segmentation: GRASS, CAT, TREE, SKY
– Image captioning: A cat sitting on a suitcase on the floor (caption generated using neuraltalk2)
Supervised vs Unsupervised Learning
Unsupervised Learning
• Data: x. Just data, no labels!
• Goal: Learn some underlying hidden structure of the data
• Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
– K-means clustering
– Principal Component Analysis (dimensionality reduction), 3-d to 2-d (figure from Matthias Scholz)
– 1-d and 2-d density estimation (1-d figure copyright Ian Goodfellow, 2016. Reproduced with permission.)
Tasks
Supervised Learning
• Classification: x -> y (discrete)
• Regression: x -> y (continuous)
Unsupervised Learning
• Clustering: x -> c (discrete)
• Dimensionality Reduction: x -> z (continuous)
• Density Estimation: x -> p(x) (on simplex)
Some Data Slide Credit: Carlos Guestrin
K-means
1. Ask user how many clusters they’d like (e.g. k=5).
2. Randomly guess k cluster Center locations.
3. Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints.)
4. Each Center finds the centroid of the points it owns…
5. …and jumps there.
6. …Repeat until terminated!
K-means
• Randomly initialize k centers
– $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$
• Assign:
– Assign each point $i \in \{1, \ldots, n\}$ to nearest center:
– $C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2$
• Recenter:
– $\mu_j$ becomes centroid of its points
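The assign/recenter loop above can be sketched in a few lines. This is a minimal illustration in plain Python, not the course's reference code: the function name `kmeans`, the choice to draw initial centers from the data, and the toy `points` are all assumptions made for the example.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (centers, assignments)."""
    rng = random.Random(seed)
    # Random initial centers, here drawn from the data itself (one common choice).
    centers = [tuple(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assign step: each point picks its nearest center (squared Euclidean distance).
        new_assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_assign == assign:  # no assignment changed, so the loop has converged
            break
        assign = new_assign
        # Recenter step: each center jumps to the centroid of the points it owns.
        for j in range(k):
            owned = [p for p, c in zip(points, assign) if c == j]
            if owned:  # a center that owns nothing stays where it is
                centers[j] = tuple(sum(col) / len(owned) for col in zip(*owned))
    return centers, assign

# Hypothetical toy data: two well-separated blobs of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assign = kmeans(points, k=2)
```

On this toy data the two blobs end up in different clusters; which blob gets label 0 depends on the random initialization.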
K-means • Demo – http://stanford.edu/class/ee103/visualizations/kmeans.html
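What is K-means optimizing?
• Objective $F(\mu, C)$: function of centers $\mu$ and point allocations $C$:
– $F(\mu, C) = \sum_{i=1}^{n} \|x_i - \mu_{C(i)}\|^2$
– 1-of-k encoding: $F(\mu, a) = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$
• Optimal K-means:
– $\min_\mu \min_a F(\mu, a)$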
Coordinate descent algorithms
• Want: $\min_a \min_b F(a, b)$
• Coordinate descent:
– fix a, minimize over b
– fix b, minimize over a
– repeat
• Converges!!!
– if F is bounded below
– to a (often good) local optimum
• as we saw in applet (play with it!)
• K-means is a coordinate descent algorithm!
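To make the fix-one, minimize-the-other loop concrete, here is a small sketch on a hand-picked convex quadratic. The function `F` and its closed-form coordinate updates are illustrative assumptions, not something from the slides; the point is only that F never increases from one step to the next.

```python
def F(a, b):
    # Hypothetical objective: a convex quadratic with a coupling term a*b.
    return (a - 1) ** 2 + (b + 2) ** 2 + a * b

def coordinate_descent(a=0.0, b=0.0, steps=50):
    history = [F(a, b)]
    for _ in range(steps):
        # Fix a, minimize over b: dF/db = 2(b + 2) + a = 0
        b = -2 - a / 2
        history.append(F(a, b))
        # Fix b, minimize over a: dF/da = 2(a - 1) + b = 0
        a = 1 - b / 2
        history.append(F(a, b))
    return a, b, history

a_star, b_star, history = coordinate_descent()
```

Setting both partial derivatives to zero gives the joint minimizer a = 8/3, b = -10/3, which is what the iterates approach here. For a non-convex F the same loop would only be guaranteed to reach a local optimum.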
K-means as Co-ordinate Descent
• Optimize objective function: $\min_\mu \min_a F(\mu, a) = \min_\mu \min_a \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$
• Fix $\mu$, optimize $a$ (or $C$): this is the Assign step
• Fix $a$ (or $C$), optimize $\mu$: this is the Recenter step
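The two coordinate updates can be written out explicitly. What follows is a sketch of the standard derivation, assuming the squared-Euclidean objective with 1-of-k indicators $a_{ij}$ (the index names are chosen here for illustration):

```latex
\begin{align*}
F(\mu, a) &= \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} \, \|x_i - \mu_j\|^2,
  \qquad a_{ij} \in \{0, 1\}, \quad \sum_{j} a_{ij} = 1 \\
\text{Fix } \mu:\quad a_{ij} &= 1 \text{ if } j = \arg\min_{j'} \|x_i - \mu_{j'}\|^2, \text{ else } 0 \\
\text{Fix } a:\quad \frac{\partial F}{\partial \mu_j} &= -2 \sum_{i} a_{ij} (x_i - \mu_j) = 0
  \;\Longrightarrow\; \mu_j = \frac{\sum_i a_{ij} \, x_i}{\sum_i a_{ij}}
\end{align*}
```

Neither update can increase $F$, and $F \ge 0$, so the objective values converge.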
Supervised vs Unsupervised Learning
Supervised Learning
• Data: (x, y), where x is data and y is label
• Goal: Learn a function to map x -> y
• Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.
Unsupervised Learning
• Training data is cheap
• Data: x. Just data, no labels!
• Goal: Learn some underlying hidden structure of the data
• Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
• Holy grail: Solve unsupervised learning => understand structure of visual world
Generative Models Given training data, generate new samples from same distribution. Training data ~ $p_{\text{data}}(x)$; generated samples ~ $p_{\text{model}}(x)$. Want to learn $p_{\text{model}}(x)$ similar to $p_{\text{data}}(x)$.
Generative Classification vs Discriminative Classification vs Density Estimation
• Generative Classification
– Model p(x, y); estimate p(x|y) and p(y)
– Use Bayes Rule to predict y
– E.g. Naïve Bayes
• Discriminative Classification
– Estimate p(y|x) directly
– E.g. Logistic Regression
• Density Estimation
– Model p(x)
– E.g. VAEs
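As a worked illustration of the generative route, the sketch below predicts y from p(x|y) and p(y) via Bayes rule. The class names, the single discrete feature, and all probability values are made up for the example.

```python
def posterior(x, prior, likelihood):
    """Return p(y | x) for every class y, computed from p(y) and p(x | y)."""
    joint = {y: prior[y] * likelihood[y][x] for y in prior}  # p(x, y) = p(x | y) p(y)
    evidence = sum(joint.values())                           # p(x) = sum_y p(x, y)
    return {y: joint[y] / evidence for y in joint}

# Hypothetical model: two classes and one discrete feature.
prior = {"cat": 0.5, "dog": 0.5}
likelihood = {
    "cat": {"whiskers": 0.8, "no_whiskers": 0.2},
    "dog": {"whiskers": 0.4, "no_whiskers": 0.6},
}
post = posterior("whiskers", prior, likelihood)
prediction = max(post, key=post.get)
```

The intermediate `evidence` term is p(x), the quantity a density estimator models directly. A discriminative classifier would skip both p(x|y) and p(x) and fit p(y|x) itself.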
Generative Models
• Addresses density estimation, a core problem in unsupervised learning
• Several flavors:
– Explicit density estimation: explicitly define and solve for $p_{\text{model}}(x)$
– Implicit density estimation: learn model that can sample from $p_{\text{model}}(x)$ w/o explicitly defining it
Why Generative Models?
- Realistic samples for artwork, super-resolution, colorization, etc.
- Generative models of time-series data can be used for simulation and planning (reinforcement learning applications!)
- Training generative models can also enable inference of latent representations that can be useful as general features
Figures from L-R are copyright: (1) Alec Radford et al. 2016; (2) David Berthelot et al. 2017; (3) Phillip Isola et al. 2017. Reproduced with authors' permission.
Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density
    - Fully Visible Belief Nets: NADE, MADE, PixelRNN/CNN
    - Change of variables models (nonlinear ICA)
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
We will discuss the 3 most popular types of generative models (PixelRNN/CNN, Variational Autoencoder, and GAN, as listed in the Overview).
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
PixelRNN and PixelCNN
Fully Observable Model
Explicit density model
Use chain rule to decompose likelihood of an image x into product of 1-d distributions:
$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$
Here $p(x)$ is the likelihood of image x, and each factor is the probability of the i’th pixel value given all previous pixels.
Then maximize likelihood of training data.
Complex distribution over pixel values => Express using a neural network!
Will need to define ordering of “previous pixels”.
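The chain-rule factorization can be checked on a tiny example. In the sketch below, the conditional `cond_prob_one` is a hypothetical stand-in for the neural network (a fixed logistic function of the previous pixels) and the "images" are four binary pixels in a fixed order. It is only meant to show that summing per-pixel log-conditionals gives a properly normalized log p(x).

```python
import math
from itertools import product

def cond_prob_one(prev, w=0.5, b=-0.2):
    """Hypothetical p(x_i = 1 | x_1..x_{i-1}): logistic in the mean of the previous pixels."""
    ctx = sum(prev) / len(prev) if prev else 0.0
    return 1.0 / (1.0 + math.exp(-(w * ctx + b)))

def log_likelihood(x):
    """log p(x) = sum_i log p(x_i | x_1..x_{i-1}) for a sequence of binary pixels."""
    total = 0.0
    for i, xi in enumerate(x):
        p_one = cond_prob_one(x[:i])
        total += math.log(p_one if xi == 1 else 1.0 - p_one)
    return total

n = 4
# Because every conditional is normalized, the product is a valid distribution:
# the probabilities of all 2^n binary images sum to 1.
total_mass = sum(math.exp(log_likelihood(x)) for x in product([0, 1], repeat=n))
example_ll = log_likelihood((1, 0, 1, 1))
```

Maximum-likelihood training would adjust the parameters of the conditional (here `w` and `b`) to raise `log_likelihood` on the training images.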
Example: Character-level Language Model Vocabulary: [h, e, l, o] Example training sequence: “hello” Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231 n
PixelRNN [van den Oord et al. 2016] Generate image pixels starting from corner. Dependency on previous pixels modeled using an RNN (LSTM).
Test Time: Sample / Argmax / Beam Search
Example: Character-level Language Model Sampling
Vocabulary: [h, e, l, o]
Softmax outputs at successive steps: (.03, .13, .00, .84), (.25, .20, .05, .50), (.11, .17, .68, .03), (.11, .02, .08, .79)
Samples: “e”, “l”, “o”
At test-time sample characters one at a time, feed back to model
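The difference between sampling and argmax decoding can be seen with the softmax values from the slide. This sketch treats the four distributions as fixed, which is an assumption for illustration: in a real model each chosen character is fed back and changes the next distribution. The helper names are hypothetical.

```python
import random

vocab = ["h", "e", "l", "o"]
# Softmax outputs at four successive steps, read off the slide.
softmax_rows = [
    [0.03, 0.13, 0.00, 0.84],
    [0.25, 0.20, 0.05, 0.50],
    [0.11, 0.17, 0.68, 0.03],
    [0.11, 0.02, 0.08, 0.79],
]

def argmax_decode(probs):
    """Greedy choice: always take the most probable character."""
    return vocab[max(range(len(probs)), key=probs.__getitem__)]

def sample_decode(probs, rng):
    """Stochastic choice: draw a character in proportion to its probability."""
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(0)
greedy = [argmax_decode(row) for row in softmax_rows]
sampled = [sample_decode(row, rng) for row in softmax_rows]
```

Greedy decoding is deterministic, while sampling can pick a lower-probability character such as "e" at the first step (probability .13), which is how the slide's example can begin with "e" even though "o" is more likely there.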
PixelRNN [van den Oord et al. 2016] Drawback: sequential generation is slow!
PixelCNN [van den Oord et al. 2016] Still generate image pixels starting from corner. Dependency on previous pixels now modeled using a CNN over context region. Figure copyright van den Oord et al., 2016. Reproduced with permission.
Masked Convolutions • Apply masks so that a pixel does not see “future” pixels
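One way to build such a mask for a k x k kernel is sketched below, assuming raster-scan order (rows top to bottom, then left to right within a row). The function names and the `include_center` flag are illustrative choices, not taken from the slides.

```python
def causal_mask(k, include_center=False):
    """k x k mask of 0/1: 1 for positions earlier than the center in raster order."""
    c = k // 2
    mask = [[0] * k for _ in range(k)]
    for r in range(k):
        for col in range(k):
            # Keep all rows above the center and, in the center row, columns to its left.
            if r < c or (r == c and col < c):
                mask[r][col] = 1
    if include_center:
        mask[c][c] = 1  # optionally let a layer see the current position itself
    return mask

def apply_mask(kernel, mask):
    """Zero out the kernel weights that would look at 'future' pixels."""
    return [[w * m for w, m in zip(krow, mrow)] for krow, mrow in zip(kernel, mask)]

mask = causal_mask(3)
masked = apply_mask([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]], mask)
```

For a 3 x 3 kernel this keeps the top row and the left neighbour only, so the output at a pixel depends solely on pixels generated before it.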
PixelCNN [van den Oord et al. 2016] Training: maximize likelihood of training images, with a softmax loss at each pixel.
PixelCNN [van den Oord et al. 2016] Training is faster than PixelRNN (can parallelize convolutions since context region values known from training images). Generation must still proceed sequentially => still slow.
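The sequential bottleneck at generation time can be sketched with a toy loop. Here `toy_conditional` is a hypothetical stand-in for the network: it looks only at the left and upper neighbours, and the image is binary. What matters is the structure: one model evaluation per pixel, each depending on pixels already generated.

```python
import random

def toy_conditional(image, r, c):
    """Hypothetical p(pixel = 1 | already generated pixels), using left and upper neighbours."""
    context = [image[r][c - 1]] if c > 0 else []
    if r > 0:
        context.append(image[r - 1][c])
    # Favour 1 when at least half of the available neighbours are 1.
    return 0.8 if context and sum(context) * 2 >= len(context) else 0.2

def generate(height, width, seed=0):
    rng = random.Random(seed)
    image = [[0] * width for _ in range(height)]
    calls = 0
    for r in range(height):          # raster-scan order from the top-left corner
        for c in range(width):
            p_one = toy_conditional(image, r, c)  # needs every earlier pixel to be final
            calls += 1
            image[r][c] = 1 if rng.random() < p_one else 0
    return image, calls

image, calls = generate(4, 4)
```

The loop makes height x width dependent model calls. At training time all pixels of a training image are known, so the per-pixel conditionals can instead be computed together.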
Generation Samples: 32x32 CIFAR-10 and 32x32 ImageNet. Figures copyright Aaron van den Oord et al., 2016. Reproduced with permission.
Image Completion
Results from generating sounds • https://deepmind.com/blog/wavenet-generative-model-raw-audio/
PixelRNN and PixelCNN
Pros:
- Can explicitly compute likelihood p(x)
- Explicit likelihood of training data gives good evaluation metric
- Good samples
Con:
- Sequential generation => slow
Improving PixelCNN performance:
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc…
See:
- Van den Oord et al. NIPS 2016
- Salimans et al. 2017 (PixelCNN++)