Neural Networks Learning Cost function Machine Learning Neural
- Slides: 33
Neural Networks: Learning Cost function Machine Learning
Neural Network (Classification) total no. of layers in network no. of units (not counting bias unit) in layer Layer 1 Layer 2 Layer 3 Binary classification Layer 4 Multi-classification (K classes) E. g. , , , pedestrian car motorcycle truck 1 output unit K output units Andrew Ng
Cost function Logistic regression: Neural network: Andrew Ng
Neural Networks: Learning 역전파 알고리즘 (Backpropagation) Machine Learning
Gradient computation Need code to compute: -
Gradient computation 주어진 하나의 훈련자료( , ): 전방향전파(Forward propagation): Layer 1 Layer 2 Layer 3 Layer 4
Gradient computation: 역전파(Backpropagation) 알고리 즘 Intuition: “error” of node in layer. For each output unit (layer L = 4) Layer 1 Layer 2 Layer 3 Layer 4
Backpropagation algorithm Training set Set (for all ). For k (k Set ) Perform forward propagation to compute (k Using (k , compute ) ) Compute for
Neural Networks: Learning Backpropagation intuition Machine Learning
Forward Propagation
Forward Propagation Andrew Ng
What is backpropagation doing? Focusing on a single example and ignoring regularization ( , ), , the case of 1 output unit, (Think of ) I. e. how well is the network doing on example i? Andrew Ng
Forward Propagation “error” of cost for Formally, (unit in layer ). (for ), where Andrew Ng
Neural Networks: Learning Implementation note: Unrolling parameters Machine Learning
Advanced optimization function [j. Val, gradient] = cost. Function(theta) … opt. Theta = fminunc(@cost. Function, initial. Theta, options) Neural Network (L=4): “Unroll” into vectors - matrices (Theta 1, Theta 2, Theta 3) - matrices (D 1, D 2, D 3) Andrew Ng
Example theta. Vec = [ Theta 1(: ); Theta 2(: ); Theta 3(: )]; DVec = [D 1(: ); D 2(: ); D 3(: )]; Theta 1 = reshape(theta. Vec(1: 110), 10, 11); Theta 2 = reshape(theta. Vec(111: 220), 10, 11); Theta 3 = reshape(theta. Vec(221: 231), 1, 11); Andrew Ng
Learning Algorithm Have initial parameters. Unroll to get initial. Theta to pass to fminunc(@cost. Function, initial. Theta, options) function [jval, gradient. Vec] = cost. Function(theta. Vec) From theta. Vec, get. Use forward prop/back prop to compute Unroll to get gradient. Vec. and . Andrew Ng
Neural Networks: Learning Gradient checking Machine Learning
Numerical estimation of gradients Implement: grad. Approx EPSILON)) = (J(theta + EPSILON) – J(theta – /(2*EPSILON) Andrew Ng
Parameter vector (E. g. is “unrolled” version of ) Andrew Ng
for i = 1: n, theta. Plus = theta; theta. Plus(i) = theta. Plus(i) + EPSILON; theta. Minus = theta; theta. Minus(i) = theta. Minus(i) – EPSILON; grad. Approx(i) = (J(theta. Plus) – J(theta. Minus)) /(2*EPSILON); end; Check that grad. Approx ≈ DVec Andrew Ng
Implementation Note: - Implement backprop to compute DVec (unrolled ). - Implement numerical gradient check to compute grad. Approx. - Make sure they give similar values. - Turn off gradient checking. Using backprop code for learning. Important: - Be sure to disable your gradient checking code before training your classifier. If you run numerical gradient computation on every iteration of gradient descent (or in the inner loop of cost. Function(…))your code will be very slow. Andrew Ng
Neural Networks: Learning Random initialization Machine Learning
Initial value of For gradient descent and advanced optimization method, need initial value for. opt. Theta = fminunc(@cost. Function, initial. Theta, options) Consider gradient descent Set initial. Theta = zeros(n, 1) ? Andrew Ng
Zero initialization After each update, parameters corresponding to inputs going into each of two hidden units are identical. Andrew Ng
Random initialization: Symmetry breaking Initialize each (i. e. ) to a random value in E. g. Theta 1 = rand(10, 11)*(2*INIT_EPSILON) - INIT_EPSILON; Theta 2 = rand(1, 11)*(2*INIT_EPSILON) - INIT_EPSILON; Andrew Ng
Neural Networks: Learning Machine Learning Putting it together
Training a neural network Pick a network architecture (connectivity pattern between neurons) No. of input units: Dimension of features No. output units: Number of classes Reasonable default: 1 hidden layer, or if >1 hidden layer, have same no. of hidden units in every layer (usually the more the better) Andrew Ng
Training a neural network 1. Randomly initialize weights 2. Implement forward propagation to get for any 3. Implement code to compute cost function 4. Implement backprop to compute partial derivatives for i = 1: m Perform forward propagation and backpropagation using example (Get activations and delta terms for ). Andrew Ng
Training a neural network 5. Use gradient checking to compare computed using backpropagation vs. using numerical estimate of gradient of. Then disable gradient checking code. 6. Use gradient descent or advanced optimization method with backpropagation to try to minimize as a function of parameters Andrew Ng
Andrew Ng
Neural Networks: Learning Backpropagation example: Autonomous driving (optional) Machine Learning
[Courtesy of Dean Pomerleau]
- Cost function of linear regression
- Neural networks and learning machines 3rd edition
- Few shot learning with graph neural networks
- Neural networks and learning machines
- Backpropagation andrew ng
- Visualizing and understanding convolutional networks
- Vc dimension of neural networks
- Merzenich et al (1984) ib psychology
- Audio super resolution using neural networks
- Convolutional neural networks for visual recognition
- Leon gatys
- Efficient processing of deep neural networks pdf
- Mippers
- Convolutional neural network ppt
- Rnn
- Matlab neural network toolbox
- Neural networks for rf and microwave design
- 11-747 neural networks for nlp
- Neural networks simon haykin
- Sparse convolutional neural networks
- On the computational efficiency of training neural networks
- Threshold logic unit in neural network
- Neural networks and fuzzy logic
- Xooutput
- Convolutional neural networks
- Deep forest: towards an alternative to deep neural networks
- Convolutional neural networks
- Neuraltools neural networks
- Andrew ng recurrent neural networks
- Predicting nba games using neural networks
- The wake-sleep algorithm for unsupervised neural networks
- Audio super resolution
- Convolutional neural network alternatives
- Difference between datagram and virtual circuit network