CSCI 5922 Neural Networks and Deep Learning: Activation and Loss Functions
Mike Mozer
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder
Output Activation and Loss Functions
• Every neural net specifies
  § an activation rule for the output unit(s)
  § a loss defined in terms of the output activation
• First, a bit of review…
Cheat Sheet 1
• Perceptron
  § Activation function
  § Weight update assumes minimizing the number of misclassifications
• Linear associator (a.k.a. linear regression)
  § Activation function
  § Weight update assumes minimizing squared error loss
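• For reference, the standard forms these slides presumably show:
  § Perceptron: y = 1 if w^\top x + b > 0, else 0; update \Delta w = \eta\,(t - y)\,x
  § Linear associator: y = w^\top x + b; update \Delta w = \eta\,(t - y)\,x (the delta / LMS rule)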
Cheat Sheet 2
• Two-layer net (a.k.a. logistic regression)
  § Activation function
  § Weight update assumes minimizing squared error loss
• Deep(er) net
  § Activation function same as above
  § Weight update assumes minimizing squared error loss
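• For reference, the standard forms presumably intended here:
  § Logistic activation: y = \frac{1}{1 + e^{-\mathrm{net}}}, with \mathrm{net} = w^\top x + b
  § Squared-error update for the output weights: \Delta w = \eta\,(t - y)\,y(1 - y)\,x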
Squared Error Loss
• Sensible regardless of output range and output activation function
• Loss and output delta (standard forms sketched below)
  § with logistic output unit
  § with tanh output unit
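• For reference, the standard squared-error loss is presumably E = \frac{1}{2}\sum_i (y_i - t_i)^2, giving an output delta \delta = (y - t)\,f'(\mathrm{net}):
  § with a logistic output unit, f'(\mathrm{net}) = y(1 - y)
  § with a tanh output unit, f'(\mathrm{net}) = 1 - y^2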
Logistic vs. Tanh
• Logistic
  § Output = .5 when there is no input evidence (bias = 0)
  § Will trigger activation in the next layer
  § Need large biases to neutralize; biases end up on a different scale than other weights
  § Does not satisfy the weight-initialization assumption of mean activation = 0
• Tanh
  § Output = 0 when there is no input evidence (bias = 0)
  § Won’t trigger activation in the next layer
  § Don’t need large biases
  § Satisfies the weight-initialization assumption
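A quick numeric check of this contrast (a minimal sketch; the helper name logistic is mine, not from the slides):

import numpy as np

def logistic(net):
    # logistic (sigmoid) activation: 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

# With no input evidence and bias = 0, the net input is 0:
print(logistic(0.0))   # 0.5 -> nonzero signal fed to the next layer
print(np.tanh(0.0))    # 0.0 -> no signal fed to the next layer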
Cross Entropy Loss
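• For reference, the standard cross-entropy loss presumably shown here, for a logistic output y and target t \in \{0, 1\}:
  E = -\big[t \ln y + (1 - t)\ln(1 - y)\big]
• Summed over outputs and examples as appropriate; it is minimized when y = t.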
Squared Error Versus Cross Entropy
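• The contrast presumably drawn here, for a logistic output unit:
  § Squared error: \partial E / \partial \mathrm{net} = (y - t)\,y(1 - y), which vanishes as y saturates near 0 or 1, even when the output is wrong
  § Cross entropy: \partial E / \partial \mathrm{net} = y - t, so the gradient stays proportional to the error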
Maximum Likelihood Estimation
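• The standard framing presumably used here: choose weights that maximize the likelihood of the training targets,
  w^* = \arg\max_w \prod_i p(t_i \mid x_i; w) = \arg\min_w \; -\sum_i \ln p(t_i \mid x_i; w)
• With a Bernoulli output model and a logistic output unit, this negative log-likelihood is exactly the cross-entropy loss.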
Probabilistic Interpretation of Squared-Error Loss
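• The standard argument presumably given here: assume targets are the model output corrupted by Gaussian noise, t = y(x; w) + \epsilon with \epsilon \sim \mathcal{N}(0, \sigma^2); then
  -\ln p(t \mid x; w) = \frac{1}{2\sigma^2}(t - y)^2 + \mathrm{const}
  so maximizing likelihood is equivalent to minimizing squared error.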
Categorical Outputs
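• For reference, the standard softmax output presumably used for categorical targets:
  y_i = \frac{e^{\mathrm{net}_i}}{\sum_j e^{\mathrm{net}_j}}
• With one-hot targets t, the loss is the categorical cross entropy E = -\sum_i t_i \ln y_i.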
Derivatives for Categorical Outputs
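• The standard result presumably derived here: with softmax outputs and categorical cross-entropy loss,
  \frac{\partial E}{\partial \mathrm{net}_i} = y_i - t_i
  the same simple error-driven form as the logistic / cross-entropy pairing.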
Rectified Linear Unit (ReLU)
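• Standard definition: f(\mathrm{net}) = \max(0, \mathrm{net}); the derivative is 1 for \mathrm{net} > 0 and 0 otherwise.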
Leaky ReLU
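• Standard definition: f(\mathrm{net}) = \mathrm{net} for \mathrm{net} > 0, and \alpha\,\mathrm{net} otherwise, with a small slope (commonly \alpha = 0.01) so negative inputs still pass a gradient.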
Softplus
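• Standard definition: f(\mathrm{net}) = \ln(1 + e^{\mathrm{net}}), a smooth approximation to ReLU whose derivative is the logistic function.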
Exponential Linear Unit (ELU)
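• Standard definition: f(\mathrm{net}) = \mathrm{net} for \mathrm{net} > 0, and \alpha\,(e^{\mathrm{net}} - 1) otherwise (often \alpha = 1), so negative inputs saturate smoothly at -\alpha.
• A minimal NumPy sketch of the four activations above (illustrative only; function names and default parameters are my assumptions, not from the slides):

import numpy as np

def relu(net):
    # max(0, net), elementwise
    return np.maximum(0.0, net)

def leaky_relu(net, alpha=0.01):
    # small slope alpha for negative inputs keeps the gradient nonzero
    return np.where(net > 0, net, alpha * net)

def softplus(net):
    # smooth approximation to ReLU: ln(1 + e^net)
    return np.log1p(np.exp(net))

def elu(net, alpha=1.0):
    # linear for positive inputs, saturates toward -alpha for large negative inputs
    return np.where(net > 0, net, alpha * (np.exp(net) - 1.0))

net = np.linspace(-3.0, 3.0, 7)
for f in (relu, leaky_relu, softplus, elu):
    print(f.__name__, f(net))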
Radial Basis Functions
Image credits: www.dtreg.com, bio.felk.cvut.cz
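• For reference, the standard Gaussian RBF presumably illustrated here: hidden unit j responds according to the distance of the input from its center, \phi_j(x) = \exp\!\big(-\|x - c_j\|^2 / (2\sigma_j^2)\big), giving a localized rather than monotonic response.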