CSCI 5922 Neural Networks and Deep Learning: Activation and Loss Functions
Mike Mozer
Department of Computer Science and Institute of Cognitive Science
University of Colorado at Boulder
Output Activation and Loss Functions
• Every neural net specifies
  § an activation rule for the output unit(s)
  § a loss defined in terms of the output activation
• First, a bit of review…
Cheat Sheet 1
• Perceptron
  § Activation function
  § Weight update assumes minimizing the number of misclassifications
• Linear associator (a.k.a. linear regression)
  § Activation function
  § Weight update assumes minimizing squared error loss
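• For reference, the standard forms these slides presumably show:
  § Perceptron: y = 1 if w^\top x + b > 0, else 0; update \Delta w = \eta\,(t - y)\,x
  § Linear associator: y = w^\top x + b; update \Delta w = \eta\,(t - y)\,x (the delta / LMS rule)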
Cheat Sheet 2
• Two-layer net (a.k.a. logistic regression)
  § Activation function
  § Weight update assumes minimizing squared error loss
• Deep(er) net
  § Activation function same as above
  § Weight update assumes minimizing squared error loss
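• For reference, the standard forms presumably intended here:
  § Logistic activation: y = \frac{1}{1 + e^{-\mathrm{net}}}, with \mathrm{net} = w^\top x + b
  § Squared-error update for the output weights: \Delta w = \eta\,(t - y)\,y(1 - y)\,x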
Squared Error Loss
• Sensible regardless of output range and output activation function
• Loss and output delta (standard forms sketched below)
  § with logistic output unit
  § with tanh output unit
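• For reference, the standard squared-error loss is presumably E = \frac{1}{2}\sum_i (y_i - t_i)^2, giving an output delta \delta = (y - t)\,f'(\mathrm{net}):
  § with a logistic output unit, f'(\mathrm{net}) = y(1 - y)
  § with a tanh output unit, f'(\mathrm{net}) = 1 - y^2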
Logistic vs. Tanh
• Logistic
  § Output = .5 when there is no input evidence (bias = 0)
  § Will trigger activation in the next layer
  § Need large biases to neutralize; biases end up on a different scale than other weights
  § Does not satisfy the weight-initialization assumption of mean activation = 0
• Tanh
  § Output = 0 when there is no input evidence (bias = 0)
  § Won’t trigger activation in the next layer
  § Don’t need large biases
  § Satisfies the weight-initialization assumption
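A quick numeric check of this contrast (a minimal sketch; the helper name logistic is mine, not from the slides):

import numpy as np

def logistic(net):
    # logistic (sigmoid) activation: 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

# With no input evidence and bias = 0, the net input is 0:
print(logistic(0.0))   # 0.5 -> nonzero signal fed to the next layer
print(np.tanh(0.0))    # 0.0 -> no signal fed to the next layer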
Cross Entropy Loss
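• For reference, the standard cross-entropy loss presumably shown here, for a logistic output y and target t \in \{0, 1\}:
  E = -\big[t \ln y + (1 - t)\ln(1 - y)\big]
• Summed over outputs and examples as appropriate; it is minimized when y = t.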
Squared Error Versus Cross Entropy
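• The contrast presumably drawn here, for a logistic output unit:
  § Squared error: \partial E / \partial \mathrm{net} = (y - t)\,y(1 - y), which vanishes as y saturates near 0 or 1, even when the output is wrong
  § Cross entropy: \partial E / \partial \mathrm{net} = y - t, so the gradient stays proportional to the error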
Maximum Likelihood Estimation
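• The standard framing presumably used here: choose weights that maximize the likelihood of the training targets,
  w^* = \arg\max_w \prod_i p(t_i \mid x_i; w) = \arg\min_w \; -\sum_i \ln p(t_i \mid x_i; w)
• With a Bernoulli output model and a logistic output unit, this negative log-likelihood is exactly the cross-entropy loss.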
Probabilistic Interpretation of Squared-Error Loss
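• The standard argument presumably given here: assume targets are the model output corrupted by Gaussian noise, t = y(x; w) + \epsilon with \epsilon \sim \mathcal{N}(0, \sigma^2); then
  -\ln p(t \mid x; w) = \frac{1}{2\sigma^2}(t - y)^2 + \mathrm{const}
  so maximizing likelihood is equivalent to minimizing squared error.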
Categorical Outputs
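• For reference, the standard softmax output presumably used for categorical targets:
  y_i = \frac{e^{\mathrm{net}_i}}{\sum_j e^{\mathrm{net}_j}}
• With one-hot targets t, the loss is the categorical cross entropy E = -\sum_i t_i \ln y_i.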
Derivatives for Categorical Outputs
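• The standard result presumably derived here: with softmax outputs and categorical cross-entropy loss,
  \frac{\partial E}{\partial \mathrm{net}_i} = y_i - t_i
  the same simple error-driven form as the logistic / cross-entropy pairing.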
Rectified Linear Unit (ReLU)
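• Standard definition: f(\mathrm{net}) = \max(0, \mathrm{net}); the derivative is 1 for \mathrm{net} > 0 and 0 otherwise.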
Leaky ReLU
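• Standard definition: f(\mathrm{net}) = \mathrm{net} for \mathrm{net} > 0, and \alpha\,\mathrm{net} otherwise, with a small slope (commonly \alpha = 0.01) so negative inputs still pass a gradient.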
Softplus
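• Standard definition: f(\mathrm{net}) = \ln(1 + e^{\mathrm{net}}), a smooth approximation to ReLU whose derivative is the logistic function.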
Exponential Linear Unit (ELU)
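• Standard definition: f(\mathrm{net}) = \mathrm{net} for \mathrm{net} > 0, and \alpha\,(e^{\mathrm{net}} - 1) otherwise (often \alpha = 1), so negative inputs saturate smoothly at -\alpha.
• A minimal NumPy sketch of the four activations above (illustrative only; function names and default parameters are my assumptions, not from the slides):

import numpy as np

def relu(net):
    # max(0, net), elementwise
    return np.maximum(0.0, net)

def leaky_relu(net, alpha=0.01):
    # small slope alpha for negative inputs keeps the gradient nonzero
    return np.where(net > 0, net, alpha * net)

def softplus(net):
    # smooth approximation to ReLU: ln(1 + e^net)
    return np.log1p(np.exp(net))

def elu(net, alpha=1.0):
    # linear for positive inputs, saturates toward -alpha for large negative inputs
    return np.where(net > 0, net, alpha * (np.exp(net) - 1.0))

net = np.linspace(-3.0, 3.0, 7)
for f in (relu, leaky_relu, softplus, elu):
    print(f.__name__, f(net))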
Radial Basis Functions
Image credits: www.dtreg.com, bio.felk.cvut.cz
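• For reference, the standard Gaussian RBF presumably illustrated here: hidden unit j responds according to the distance of the input from its center, \phi_j(x) = \exp\!\big(-\|x - c_j\|^2 / (2\sigma_j^2)\big), giving a localized rather than monotonic response.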