Lecture 1: Batch Normalization, Learning Rate Decay, Multi-Label Classification
Alireza Akhavan Pour, CLASS.VISION, SRTTU
Saturday, 7 Mehr 1397 (September 29, 2018)
Batch Normalization [Ioffe and Szegedy, 2015]

"You want unit gaussian activations? Just make them so."

Consider a batch of activations at some layer. To make each dimension unit gaussian, apply:

\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}

This is a vanilla differentiable function.
Batch Normalization [Ioffe and Szegedy, 2015]

For a batch of activations X of shape N x D (N examples, D dimensions):

1. Compute the empirical mean and variance independently for each dimension:
   \mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad \sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N} (x_{ij} - \mu_j)^2
2. Normalize:
   \hat{x}_{ij} = \frac{x_{ij} - \mu_j}{\sqrt{\sigma_j^2 + \epsilon}}
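A minimal NumPy sketch of these two steps (the function name, the epsilon term, and the toy shapes are my own illustration, not from the slide):

```python
import numpy as np

def batchnorm_normalize(x, eps=1e-5):
    """x: activations of shape (N, D); returns per-dimension unit-gaussian x_hat."""
    mu = x.mean(axis=0)                     # step 1: empirical mean per dimension
    var = x.var(axis=0)                     # step 1: empirical variance per dimension
    x_hat = (x - mu) / np.sqrt(var + eps)   # step 2: normalize (eps avoids divide-by-zero)
    return x_hat, mu, var

x = np.random.randn(64, 100) * 3 + 5        # N=64 examples, D=100 dimensions
x_hat, _, _ = batchnorm_normalize(x)
print(x_hat.mean(axis=0)[:3], x_hat.std(axis=0)[:3])  # approximately 0 and 1
```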
Batch Normalization [Ioffe and Szegedy, 2015]

Usually inserted after Fully Connected (or Convolutional, as we'll see soon) layers, and before the nonlinearity:

... -> FC -> BN -> tanh -> FC -> BN -> tanh -> ...

Problem: do we necessarily want a unit gaussian input to a tanh layer?
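Since the course uses Keras, here is a hedged sketch of that ordering (layer sizes and input shape are made up for illustration):

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

# FC -> BN -> tanh, twice, then a classifier head; BN sits before each nonlinearity
model = Sequential([
    Dense(128, input_shape=(784,)),   # FC layer with no activation yet
    BatchNormalization(),
    Activation('tanh'),
    Dense(128),
    BatchNormalization(),
    Activation('tanh'),
    Dense(10, activation='softmax'),
])
model.summary()
```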
Batch Normalization [Ioffe and Szegedy, 2015]

Normalize:

\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}

And then allow the network to squash the range if it wants to:

y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}

Note, the network can learn

\gamma^{(k)} = \sqrt{\mathrm{Var}[x^{(k)}]}, \qquad \beta^{(k)} = \mathrm{E}[x^{(k)}]

to recover the identity mapping.
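Continuing the earlier NumPy sketch, with gamma and beta as the learnable scale and shift (the names and the ones/zeros initialization are the usual conventions, assumed here):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta             # y = gamma * x_hat + beta

x = np.random.randn(64, 100)
# setting gamma = sqrt(Var[x] + eps) and beta = E[x] undoes the normalization,
# i.e. the network can recover the identity mapping:
y = batchnorm_forward(x, np.sqrt(x.var(axis=0) + 1e-5), x.mean(axis=0))
print(np.allclose(y, x))                    # True
```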
Batch Normalization [Ioffe and Szegedy, 2015]

- Improves gradient flow through the network
- Allows higher learning rates
- Reduces the strong dependence on initialization
- Acts as a form of regularization in a funny way, and slightly reduces the need for dropout, maybe
Batch Normalization [Ioffe and Szegedy, 2015]

Note: at test time the BatchNorm layer functions differently: the mean/std are not computed based on the batch. Instead, a single fixed empirical mean of activations during training is used (e.g. it can be estimated during training with running averages).
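A sketch of the running-average idea (the momentum constant and update-rule details are assumptions; frameworks differ on the exact constant and on bias correction):

```python
import numpy as np

D = 100
running_mu, running_var = np.zeros(D), np.ones(D)
momentum = 0.9  # assumed value; Keras's BatchNormalization defaults to 0.99

def bn_train(x, gamma, beta, eps=1e-5):
    global running_mu, running_var
    mu, var = x.mean(axis=0), x.var(axis=0)            # batch statistics
    running_mu = momentum * running_mu + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def bn_test(x, gamma, beta, eps=1e-5):
    # test time: fixed statistics accumulated during training, not the batch's
    return gamma * (x - running_mu) / np.sqrt(running_var + eps) + beta
```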
Learning rate decay

https://github.com/keras-team/keras/blob/master/keras/optimizers.py#L189
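The linked line applies an inverse-time decay inside Keras's SGD optimizer; a plain-Python paraphrase (the real code operates on backend tensors and optimizer member variables, so the names here are simplified):

```python
def decayed_lr(lr0, decay, iterations):
    # lr = lr0 * 1 / (1 + decay * t), applied once per parameter update
    return lr0 * (1.0 / (1.0 + decay * iterations))

lr0, decay = 0.01, 1e-4
for t in (0, 100, 1000, 10000):
    print(t, decayed_lr(lr0, decay, t))  # learning rate shrinks as updates accumulate
```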
Learning rate decay

[Figures: plots of learning-rate decay schedules]
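The plots themselves are not recoverable; as a hedged reconstruction, the decay schedules covered in Course 2 of the referenced Deep Learning Specialization are typically the following (constants are illustrative):

```python
import numpy as np

lr0 = 0.2

def inverse_time(epoch, decay_rate=1.0):
    return lr0 / (1 + decay_rate * epoch)   # alpha = alpha0 / (1 + rate * epoch)

def exponential(epoch, base=0.95):
    return (base ** epoch) * lr0            # alpha = 0.95^epoch * alpha0

def inv_sqrt(epoch, k=1.0):
    return k / np.sqrt(epoch) * lr0         # alpha = k / sqrt(epoch) * alpha0

for epoch in range(1, 6):                   # start at 1 to avoid dividing by zero
    print(epoch, inverse_time(epoch), exponential(epoch), inv_sqrt(epoch))
```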
References

• Deep Learning Specialization, Course 2: https://www.coursera.org/specializations/deep-learning
• CS231n, Lecture 6: Training Neural Networks I: http://cs231n.stanford.edu/