Neural Networks – Part 2
Training Perceptrons • Handling Multiclass Problems
CSE 4309 – Machine Learning
Vassilis Athitsos
Computer Science and Engineering Department
University of Texas at Arlington
Training a Neural Network
• In some cases, the training process can find the best solution using a closed-form formula.
  – Example: linear regression, for the sum-of-squares error.
• In other cases, the training process can find the best weights using an iterative method.
  – Example: sequential learning for logistic regression.
• In neural networks, we cannot find the best weights (unless we have an astronomical amount of luck).
  – Instead, we use gradient descent to find local minima of the error function.
  – In recent years this approach has produced spectacular results in real-world applications.
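To make the idea concrete (this example is not from the slides), here is a minimal gradient descent loop in Python/NumPy. The error function, learning rate, and starting point are made up for illustration; the point is only the repeated step against the gradient.

```python
import numpy as np

def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Generic gradient descent: repeatedly step against the gradient
    of the error function to move toward a (local) minimum."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy error function E(w) = ||w - 3||^2, with gradient 2 * (w - 3).
# E is convex, so here the local minimum found is also global: w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=[0.0, 10.0])
print(w_min)  # approximately [3. 3.]
```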
Notation for Training Set
Perceptron Learning
Regression or Classification?
• The perceptron produces a continuous value between 0 and 1.
• Thus, perceptrons and neural networks are regression models, since they produce continuous outputs.
• However, perceptrons and neural networks can easily be used for classification.
• A perceptron can be treated as a binary classifier:
  – One class label is 0.
  – The other class label is 1.
• Neural networks can also handle multiclass classification (more details on that later).
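A sketch of how a continuous-output perceptron doubles as a binary classifier, assuming the usual logistic (sigmoid) activation; the weights and input below are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(w, x):
    """Regression view: a continuous output in (0, 1)."""
    return sigmoid(np.dot(w, x))

def perceptron_classify(w, x):
    """Classification view: threshold the continuous output at 0.5,
    so one class is labeled 0 and the other is labeled 1."""
    return 1 if perceptron_output(w, x) >= 0.5 else 0

# Made-up weights and input; x[0] = 1 serves as the bias input.
w = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, -0.2])
print(perceptron_output(w, x))    # about 0.62
print(perceptron_classify(w, x))  # 1
```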
Perceptron Learning
Computing the Gradient
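The slide's derivation did not survive extraction. For reference, here is a hedged sketch assuming a common setup for this topic: output y = σ(w·x) and squared error E = ½(y − t)²; the chain rule then gives the gradient computed below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squared_error_gradient(w, x, t):
    """Gradient of E = 0.5 * (y - t)^2 with respect to the weights w,
    where y = sigmoid(np.dot(w, x)) and t is the target output.
    Chain rule: dE/dy = (y - t), dy/dz = y * (1 - y), dz/dw = x."""
    y = sigmoid(np.dot(w, x))
    return (y - t) * y * (1.0 - y) * x
```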
Weight Update
Perceptron Learning – Summary
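Putting the pieces together, a minimal sketch of the whole training procedure, under the same assumptions as above (sigmoid perceptron, squared error, sequential per-example updates); the OR dataset is a made-up example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perceptron(X, T, learning_rate=0.5, epochs=1000):
    """Gradient descent for one sigmoid perceptron with squared error.
    X: N x D inputs (first column all 1s, acting as the bias input).
    T: N target outputs, each 0 or 1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):              # sequential updates
            y = sigmoid(np.dot(w, x))
            w -= learning_rate * (y - t) * y * (1.0 - y) * x
    return w

# Made-up toy dataset: the OR function, with a bias column of 1s.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)
w = train_perceptron(X, T)
print([int(sigmoid(x @ w) >= 0.5) for x in X])  # [0, 1, 1, 1]
```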
Stopping Criterion
Using Perceptrons for Multiclass Problems
• "Multiclass" means that we have more than two classes.
• A perceptron outputs a number between 0 and 1. This is sufficient only for binary classification problems.
• For more than two classes, there are many different options. We will follow a general approach called one-versus-all classification (also known as OVA classification).
  – This approach is a general method that can be combined with various binary classification methods to solve multiclass problems. Here we see the method applied to perceptrons.
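A sketch of the OVA recipe in code, built on top of any binary trainer; `train_binary` here stands for a trainer such as the `train_perceptron` sketched earlier, and the class names are the ones used later in the example:

```python
import numpy as np

def train_ova(X, labels, classes, train_binary):
    """One-versus-all: train one binary classifier per class.
    For class c, examples of class c get target 1; all others get 0."""
    return {c: train_binary(X, (labels == c).astype(float))
            for c in classes}

def classify_ova(models, x, score):
    """Assign x to the class whose binary classifier responds most."""
    return max(models, key=lambda c: score(models[c], x))

# Usage sketch, reusing train_perceptron and sigmoid from earlier:
#   models = train_ova(X, labels, classes=["dog", "cat", "fox"],
#                      train_binary=train_perceptron)
#   label = classify_ova(models, x,
#                        score=lambda w, x: sigmoid(np.dot(w, x)))
```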
A Multiclass Example
Converting to One-Versus-All
Training Set for the First Perceptron
Training Set for the Second Perceptron
Training Set for the Third Perceptron
One-Versus-All Perceptrons: Recap
One-Versus-All Perceptrons
Multiclass Neural Networks
• For perceptrons, we saw that we can perform multiclass classification (i.e., classification with more than two classes) using the one-versus-all (OVA) approach:
  – We train one perceptron for each class.
• These multiple perceptrons can also be thought of as a single neural network, as the figure and sketch below illustrate.
OVA Perceptrons as a Single Network
[Figure: a two-layer network; Layer 1 is the input layer, Layer 2 is the output layer.]
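To see why the OVA perceptrons form a single network: stacking their weight vectors as the rows of one matrix W makes all class outputs a single matrix-vector product. The weights below are made-up placeholders for the three-class (dog, cat, fox) example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three trained OVA perceptrons for dog, cat, fox (made-up weights).
# Each row of W is one perceptron's weight vector (bias weight first),
# so one matrix-vector product evaluates the whole output layer.
W = np.array([[ 0.2, -1.0,  0.5,  0.3,  0.0,  1.1],   # dog
              [-0.4,  0.7, -0.2,  0.9,  0.1, -0.5],   # cat
              [ 1.0,  0.0,  0.3, -0.8,  0.6,  0.2]])  # fox
x = np.array([1.0, 0.5, -1.0, 0.2, 0.9, 0.0])  # bias input first
outputs = sigmoid(W @ x)        # one value in (0, 1) per class
print(outputs, outputs.argmax())
```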
Multiclass Neural Networks
• For perceptrons, we saw that we can perform multiclass classification (i.e., classification with more than two classes) using the one-versus-all (OVA) approach:
  – We train one perceptron for each class.
• These multiple perceptrons can also be thought of as a single neural network.
• In the simplest case, a neural network designed to recognize multiple classes looks like the previous example.
• In the general case, there are also hidden layers.
A Network for Our Example
[Figure: a four-layer network; Layer 1 is the input layer, Layers 2 and 3 are the 1st and 2nd hidden layers, Layer 4 is the output layer.]
Input Layer
• How many units does it have?
• Could we have a different number? Is the number of input units a hyperparameter?
[Figure: Layer 1 (input), Layer 2 (hidden), Layer 3 (hidden), Layer 4 (output).]
• In our example, the input layer must have five units, because each input is five-dimensional. We don't have a choice.
[Figure: Layer 1 (input), Layer 2 (hidden), Layer 3 (hidden), Layer 4 (output).]
• This network has two hidden layers, with four units per layer.
• The number of hidden layers and the number of units per layer are hyperparameters; they can take different values.
[Figure: Layer 2 (hidden), Layer 3 (hidden).]
Output Layer
• How many units does it have?
• Could we have a different number? Is the number of output units a hyperparameter?
[Figure: Layer 4 (output).]
• In our example, the output layer must have three units, because we want to recognize three different classes (dog, cat, fox). We have no choice.
[Figure: Layer 4 (output).]
Network Connectivity
• In this neural network, at layers 2, 3, and 4, every unit receives as input the output of ALL units in the previous layer.
• This connectivity is also a hyperparameter; it doesn't have to be like that.
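A sketch of a forward pass through such a fully connected network, matching the example's dimensions (5 inputs, two hidden layers of 4 units, 3 outputs); the sigmoid activations and random weights are assumptions for illustration, not something the slides specify:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Forward pass through fully connected layers: at each layer,
    every unit receives the outputs of ALL units in the previous
    layer (one weight matrix row per unit, plus a bias vector)."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Layer sizes from the example: 5 inputs, two hidden layers of 4, 3 outputs.
sizes = [5, 4, 4, 3]
rng = np.random.default_rng(0)   # random placeholder weights
layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=5)           # a made-up five-dimensional input
y = forward(x, layers)           # three outputs, one per class
print(y, ["dog", "cat", "fox"][int(y.argmax())])
```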
Next: Training
• The next set of slides will describe how to train such a network.
• Training a neural network is done using gradient descent.
• The specific method is called backpropagation, but it really is just a straightforward application of gradient descent to neural networks.