Chapter 2 BY RAUF UR RAHIM Table to

Chapter 2 BY RAUF UR RAHIM

Table to discuss Problem type Last-layer activation Loss function Binary classification sigmoid binary_crossentropy Multiclass, single-label classification softmax categorical_crossentropy Multiclass, multilabel classification sigmoid binary_crossentropy Regression to arbitrary values None mse Regression to values between 0 and 1 sigmoid mse or binary_crossentropy

Binary Classification

Multiclass, single-label classification

Multiclass, multi-label classification

Regression Problem Size of House (Square Feet) # of Bedrooms Price ($1000) 523 1 115 645 1 150 708 2 210 1034 3 280 2290 4 355 2545 4 440 1123 2 250

Table to discuss Problem type Last-layer activation Loss function Binary classification sigmoid binary_crossentropy Multiclass, single-label classification softmax categorical_crossentropy Multiclass, multilabel classification sigmoid binary_crossentropy Regression to arbitrary values None mse Regression to values between 0 and 1 sigmoid mse or binary_crossentropy

Activation function • Which talk to listen?

Some Act Functions

sigmoid vs softmax • Sigmoid returns probability of binary class • Softmax returns probability of each class

Table to discuss Problem type Last-layer activation Loss function Binary classification sigmoid binary_crossentropy Multiclass, single-label classification softmax categorical_crossentropy Multiclass, multilabel classification sigmoid binary_crossentropy Regression to arbitrary values None mse Regression to values between 0 and 1 sigmoid mse or binary_crossentropy

Loss Functions

Mean Absolute Error

Mean Square Error

Cross Entropy

binary_crossentropy v/s categorical_crossentropy I know dacoit A I know dacoit B 1 I know All 3 I know dacoit C 0 1 [1, 0, 1] Report To SHO

Optimizers 1. The mechanism through which the network will update itself based on the data it sees and its loss function. 2. The optimizer specifies the exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on. 3. The optimizer determines how learning proceeds 4. The optimizer uses this loss value to update the network’s weights 5. The rmsprop optimizer is generally a good enough choice, whatever your problem. That’s one less thing for you to worry about. Y = WX + B

Gradient-Based Optimization 1. Draw a batch of training samples x and corresponding targets y. 2. Run the network on x (a step called the forward pass) to obtain predictions y_pred. 3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y. 4. Update all weights of the network in a way that slightly reduces the loss on this batch.

Gradient-Based Optimization Minimum loss required Input X Weights Layer (data transformation) Weight Update Optimizer Prediction Y’ Target Y Loss Function

Gradient-Based Optimization Input (X) Output (Y) 2 5 3 7 4 9 6 13 Y = WX + B

Gradient-Based Optimization Input (X) Output (Y) Predicted Output 2 5 2. 5 3 7 3. 5 4 9 4. 5 6 13 6. 5 Y = 1 X + 0. 5

Gradient-Based Optimization Input (X) Output (Y) Predicted Output 2 5 4. 5 3 7 6. 5 4 9 8. 5 6 13 12. 5 Y = 2 X + 0. 5

Gradient-Based Optimization Input (X) Output (Y) Predicted Output 2 5 5. 5 3 7 7. 5 4 9 9. 5 6 13 13. 5 Y = 2 X + 1. 5

Gradient-Based Optimization Input (X) Output (Y) Predicted Output 2 5 5 3 7 7 4 9 9 6 13 13 Y = 2 X + 1

Gradient-Based Optimization • Do you think this is an efficient approach? • Hit and trial • Who will decide • • The magnitude of change The direction of change • Cong, you have differential and gradient

How it works

Reality

Stochastic gradient descent

Derivative of a Tensor Operation: the gradient • Represents rate of change & Direction • For example, a car’s speed and direction w. r. t destination • How we know about destination? The zero speed

Derivative in simple words

Learning Rate

Training v/s Validation v/s Testing Accuracy • Training accuracy: During Learning • Validation accuracy: Testing during learning after each epochs • Testing accuracy: One final accuracy

Training v/s Validation v/s Testing Data

Training v/s Validation v/s Testing Accuracy

Testing Training Accuracy Testing Accuracy Difference Reason? In simple words No Perfectly Fit Learnt Good Concepts No Under Fit Not Learnt Anything High Over Fit Ratta

Model Creation Weight Update Input X Weights Layer (data transformation) Prediction Y’ Optimize r Phase 1 Manufacturing of car Target Y Loss Function