CSCI 5922 Neural Networks and Deep Learning: Unsupervised Learning
Mike Mozer
Department of Computer Science and Institute of Cognitive Science, University of Colorado at Boulder
Discovering High-Level Features Via Unsupervised Learning (Le et al., 2012)
- Training set
  - 10M YouTube videos
  - single frame sampled from each
  - 200 x 200 pixels
  - fewer than 3% of frames contain faces (using the OpenCV face detector)
Sparse Deep Encoder Architecture
- 3 encoding layers
- each "layer" consists of three transforms:
  - spatially localized receptive fields to detect features
  - spatial pooling to get translation invariance
  - subtractive/divisive normalization to get sparse activation patterns
- not convolutional
- 1 billion weights
- trained with a version of sparse PCA
Some Neurons Become Face Detectors
- Look at all neurons in the final layer and find the best face detector
Some Neurons Become Cat and Body Detectors
How Fancy Does Unsupervised Learning Have To Be?
- Coates, Karpathy, Ng (2012)
  - K-means clustering to detect prototypical features
  - agglomerative clustering to pool together features that are similar under transformation
  - multiple stages
- Face detectors emerge
Autoencoders
- Self-supervised training procedure
- Given a set of input vectors (no target outputs), map the input back to itself via a hidden-layer bottleneck
- How to achieve the bottleneck?
  - Fewer neurons
  - Sparsity constraint
  - Information transmission constraint (e.g., add noise to a unit, or shut it off randomly, a.k.a. dropout)
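To make the bottleneck idea concrete, here is a minimal sketch (not from the slides) of a one-hidden-layer autoencoder trained by gradient descent in NumPy; linear hidden units are assumed for simplicity, and the data, sizes, and learning rate are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): 100 points lying in a 2-D subspace of R^8, so a
# 2-unit bottleneck can in principle reconstruct them perfectly.
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(100, 2)) @ basis

n_in, n_hid = 8, 2
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hid, n_in))   # decoder weights

lr = 0.01
errors = []
for epoch in range(1000):
    H = X @ W1                       # encode through the bottleneck (linear units)
    Y = H @ W2                       # decode: map the code back to input space
    E = Y - X                        # reconstruction error: the target is the input
    errors.append(np.mean(E ** 2))
    gW2 = H.T @ E / len(X)           # gradient for the decoder
    gW1 = X.T @ (E @ W2.T) / len(X)  # gradient for the encoder (backprop)
    W1 -= lr * gW1
    W2 -= lr * gW2

print(f"reconstruction MSE: {errors[0]:.3f} -> {errors[-1]:.6f}")
```

Because the data are truly low-dimensional, the reconstruction error drops toward zero; with full-rank data the bottleneck would force a lossy compression instead.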
Autoencoder Combines An Encoder And A Decoder
Stacked Autoencoders
- deep network
- Note that the decoders can be stacked to produce a generative model of the domain
Restricted Boltzmann Machines (RBMs)
Deep RBM Autoencoder (Hinton & Salakhutdinov, 2006)
Why Does Unsupervised Pretraining Help Deep Learning? (Erhan, Bengio, Courville, Manzagol, Vincent, & Bengio, 2010)
- "Unsupervised training guides the learning toward basins of attraction of minima that support better generalization from the training set"
  - More hidden layers increase the likelihood of poor local minima
  - The result is robust to the random initialization seed
Visualizing the Learning Trajectory
- Train 50 nets with and without pretraining on MNIST
  - No easy way to compare weights, but it is possible to compare functions
- Comparison procedure
  - At various points in training, form a giant vector of the outputs for every training example
  - Perform dimensionality reduction using ISOMAP
- Observations
  - Pretrained and non-pretrained models start and stay in different regions of function space
  - More variance (different local optima?) among the non-pretrained models
Using Autoencoders To Initialize Weights For Supervised Learning
- Effective pretraining of weights should act as a regularizer
  - Limits the region of weight space that will be explored by supervised learning
- The autoencoder must be learned well enough to move the weights away from the origin
The Danger Of Unsupervised Pretraining: Simple Autoencoder
[Network diagram: input x, hidden unit h via weight w1, output y via weight w2]
Simple Autoencoder: Weight Search
Simple Autoencoder: Error Surface
[Surface plot: log sum squared error over (w1, w2)]
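The flat valley in this error surface is easy to reproduce numerically. The sketch below (with my own toy inputs, not the slide's) evaluates the log sum-squared error of the two-weight linear chain h = w1·x, y = w2·h over a grid of weight pairs; every pair with w1·w2 = 1 is a perfect autoencoder, so the minimum is an entire hyperbola rather than a point.

```python
import numpy as np

# Two-weight linear chain from the slide: h = w1 * x, y = w2 * h.
x = np.array([0.5, -1.0, 2.0])   # toy inputs (assumed, not from the slide)
t = x                            # autoencoder target: reproduce the input

w1, w2 = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-3, 3, 61))
y = w2[..., None] * w1[..., None] * x          # outputs for every weight pair
E = np.log(np.sum((y - t) ** 2, axis=-1))      # log sum squared error surface

# The error is minimized along the whole hyperbola w1 * w2 = 1, which is why
# unsupervised pretraining can leave the weights anywhere in that valley.
```

This degeneracy is the "danger": pretraining selects one point on the valley floor, and that point need not be a good starting location for the subsequent supervised task.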
Simple Supervised Problem
[Network diagram: input x, hidden unit h via weight w1, output y via weight w2]
Simple Supervised Problem: Error Surface
[Surface plot: log sum squared error over (w1, w2); supervised and unsupervised solutions marked]
Variational Autoencoders For Pattern Synthesis
- Suppose we want to use an autoencoder to generate images.
[Diagram: Encoder → latent code → Decoder]
Problem With Using the Decoder For Generation
- Because we don't know the distribution of the data in the latent space, there is no assurance that a random activation pattern will produce a sensible output
Reinterpreting the Decoder As a Probabilistic Model
How Do We Train the Decoder?
Adding the Encoder
Optimization Problem
Intuition
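The formulas on these slides did not survive extraction; the standard VAE formulation they follow (Kingma & Welling) rests on two ingredients, sketched here as a hedged illustration with made-up toy values: the reparameterization trick, which makes sampling differentiable, and the closed-form KL term of the variational objective, which pulls the latent code toward the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: sample z ~ N(mu, sigma^2 I) as
    z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through
    mu and log_var while the randomness is pushed into eps."""
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2 I) || N(0, I) ): the regularizer in
    the variational objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# A code already matching the prior incurs zero penalty; anything else
# incurs a positive penalty.
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))            # 0.0
print(kl_to_standard_normal(np.array([1.0, -1.0]), np.zeros(2)))  # 1.0
```

Because every latent code is regularized toward N(0, I), sampling z from that prior and running the decoder produces sensible outputs, which fixes the generation problem raised two slides back.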
Complete Model (from Kingma, ICLR 2014 slides)
Random Samples From Generative Model Trained on Faces in the Wild (from Kingma, ICLR 2014 slides)
Examples
- VAE faces demo
- VAE MNIST
- VAE street addresses
Contractive Autoencoders
- A good hidden representation will be insensitive to most small changes in the input (while still preserving the information needed to reconstruct it)
- Objective function
  - penalizes the Frobenius norm of the Jacobian of the hidden representation with respect to the input
  - For logistic hidden units: tries to make the hidden units saturate, especially where the input-to-hidden weights are large
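A small sketch (notation assumed, not from the slides) of the contractive penalty for a logistic hidden layer, using the per-unit factorization of the Jacobian that the slide alludes to:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian dh/dx for a logistic hidden
    layer h = sigmoid(W @ x + b).  Row j of the Jacobian is
    h_j * (1 - h_j) * W[j, :], so the norm factorizes per hidden unit."""
    h = sigmoid(W @ x + b)
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # hypothetical input-to-hidden weights
b = np.zeros(4)
x = rng.normal(size=3)

# Driving the units into saturation (large bias) sends h * (1 - h) toward 0,
# shrinking the penalty: exactly the pressure described on the slide.
print(contractive_penalty(x, W, b), contractive_penalty(x, W, b + 50.0))
```

Minimizing this term alongside reconstruction error makes the code insensitive to small input perturbations without collapsing it entirely.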
Sparse Encodings
- We may wish to have few units active in the hidden bottleneck representation for any input
  - Biological motivation
- E.g., unsupervised training on natural image patches
[Figure: learned sparse features vs. PCA components; Olshausen & Field, 1996]
Sparse Encoding
- Loss function has two components
  - information preservation: input reconstruction in an autoencoder, or mutual information in a simple encoder
  - sparsity constraint
Examples of Sparsity Constraints
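The slide's examples did not survive extraction; two penalties commonly used for this purpose (supplied here as an assumption, not recovered from the slide) are an L1 penalty on the activations and a KL penalty pulling each unit's mean activation toward a small target:

```python
import numpy as np

def l1_sparsity(h, lam=1e-3):
    """L1 penalty on activations: drives many hidden units to (near) zero."""
    return lam * np.sum(np.abs(h))

def kl_sparsity(h, rho=0.05, beta=1.0):
    """KL penalty: push each unit's mean activation over a batch toward a
    small target rho, so each unit fires for only a few inputs."""
    rho_hat = np.clip(np.mean(h, axis=0), 1e-8, 1 - 1e-8)
    return beta * np.sum(rho * np.log(rho / rho_hat)
                         + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# A batch whose units already fire at the target rate incurs (near) zero
# KL penalty; units firing half the time incur a large one.
print(kl_sparsity(np.full((10, 4), 0.05)), kl_sparsity(np.full((10, 4), 0.5)))
```

Either term would be added to the information-preservation loss from the previous slide.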
Discovering Binary Codes
[Network diagram: encoder layers X → A → B → code layer C, decoder layers C → B' → A' → Y]
Discovering Binary Codes
Denoising Autoencoders
- Randomly set inputs to zero (like dropout on the inputs)
- Target output has the missing features filled in
- Visualization
  - related to attractor nets
- Removing features works better than corrupting features with additive Gaussian noise
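The corruption step can be sketched in a few lines (toy data and drop rate assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(X, drop_prob=0.3):
    """Denoising-autoencoder corruption: randomly zero out input features
    (like dropout applied to the inputs).  The training target remains the
    clean X, so the network must fill the missing features back in."""
    mask = rng.random(X.shape) >= drop_prob
    return X * mask

X = np.ones((1000, 20))   # toy data (assumed)
X_tilde = corrupt(X)
print(X_tilde.mean())     # roughly 0.7 of the mass survives
```

The autoencoder is then trained to map `X_tilde` back to `X`, which is what forces it to learn the dependencies among input features rather than the identity map.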
Attractor Networks For Pattern Completion
Training Attractor Networks
- Almeida/Pineda algorithm
Comments on the Almeida/Pineda Algorithm
- This method works only if the network reaches a fixed (stable) point in both the forward and backward phases
  - It can be shown that if the network is stable in the forward direction, it is also stable in the backward direction
  - It can be shown that the network will be stable in the forward direction if the weights are symmetric (w_ij = w_ji); we'll see this again with the Hopfield network
- The algorithm makes no guarantee about the network dynamics
  - i.e., you can't train the net to settle in a short amount of time
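A minimal sketch of the forward (settling) phase with symmetric weights, using made-up toy values; with small symmetric weights the update is a contraction, so the dynamics reach the fixed point the algorithm requires:

```python
import numpy as np

rng = np.random.default_rng(0)

def settle(W, ext, x0, tol=1e-8, max_iters=1000):
    """Iterate the attractor-net dynamics x <- tanh(W @ x + ext) until the
    state stops changing, i.e., until a fixed (stable) point is reached."""
    x = x0
    for _ in range(max_iters):
        x_new = np.tanh(W @ x + ext)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

A = rng.normal(scale=0.2, size=(5, 5))
W = (A + A.T) / 2            # symmetric weights (w_ij = w_ji), as required
np.fill_diagonal(W, 0.0)
ext = rng.normal(size=5)     # external input holding the net off the origin

x_star = settle(W, ext, np.zeros(5))
# x_star is (approximately) unchanged by one more update.
```

Note that nothing bounds how many iterations settling takes, which is the "no guarantee about dynamics" caveat on the slide.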
Aditya’s Image Superresolution
Recursive Neural Nets
- Pollack (1991): Recursive Auto-Associative Memory (RAAM)
  - Representing stacks
Recursive Neural Nets
- Pollack (1991): Recursive Auto-Associative Memory (RAAM)
  - Representing binary trees