Neural Networks
CSE 4309 – Machine Learning
Vassilis Athitsos
Computer Science and Engineering Department
University of Texas at Arlington

Perceptrons

Perceptrons and Neurons
• Perceptrons are inspired by neurons.
– Neurons are the cells forming the nervous system and the brain.
– Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire".
• Since brains are "intelligent", computer scientists have been hoping that perceptron-based systems can be used to model intelligence.
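To make the "sum up the inputs and fire above a threshold" behavior concrete, here is a minimal Python sketch of a perceptron. Folding the threshold into a bias weight is a standard convention; the names and values here are illustrative, not taken from the slides.

```python
# A minimal perceptron: weighted sum of inputs plus a bias, passed
# through a step activation ("fire" when the sum reaches the threshold).
# Names and the bias convention are illustrative assumptions.

def perceptron_output(weights, bias, inputs):
    a = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if a >= 0 else 0

# Example: two inputs with equal weights.
print(perceptron_output([1.0, 1.0], -1.5, [1, 1]))  # prints 1
print(perceptron_output([1.0, 1.0], -1.5, [0, 1]))  # prints 0
```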

Activation Functions
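As a hedged sketch, here are two standard activation choices: the step function, used in the AND/OR/NOT examples that follow, and the sigmoid, which is smooth and matters later for gradient-based training. Whether these match the slides' exact definitions is an assumption.

```python
import math

def step(a):
    # Step activation: 1 once the weighted sum reaches the threshold, else 0.
    return 1 if a >= 0 else 0

def sigmoid(a):
    # Sigmoid activation: differentiable, with outputs strictly in (0, 1).
    return 1.0 / (1.0 + math.exp(-a))
```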

Example: The AND Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean AND function:
false AND false = false
false AND true = false
true AND false = false
true AND true = true
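The perceptron itself appears on the slide as a figure, so its exact weights are not reproduced here; one standard choice that realizes AND is bias −1.5 with input weights 1 and 1, as in this sketch:

```python
def and_perceptron(x1, x2):
    # Assumed weights: bias -1.5, input weights 1 and 1.
    a = -1.5 + 1.0 * x1 + 1.0 * x2
    return 1 if a >= 0 else 0  # step activation

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "AND", x2, "=", and_perceptron(x1, x2))
```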


Example: The OR Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean OR function:
false OR false = false
false OR true = true
true OR false = true
true OR true = true
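Again the weights are shown in the slide's figure; one standard choice that realizes OR is bias −0.5 with input weights 1 and 1:

```python
def or_perceptron(x1, x2):
    # Assumed weights: bias -0.5, input weights 1 and 1.
    a = -0.5 + 1.0 * x1 + 1.0 * x2
    return 1 if a >= 0 else 0  # step activation
```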


Example: The NOT Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean NOT function:
NOT(false) = true
NOT(true) = false
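One standard choice of weights that realizes NOT (the slide's own weights are in its figure) is bias 0.5 with input weight −1:

```python
def not_perceptron(x):
    # Assumed weights: bias 0.5, input weight -1.
    a = 0.5 - 1.0 * x
    return 1 if a >= 0 else 0  # step activation
```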


The XOR Function
false XOR false = false
false XOR true = true
true XOR false = true
true XOR true = false
• As before, we represent false with 0 and true with 1.
• The figure shows the four input points of the XOR function.
– Green corresponds to output value true.
– Red corresponds to output value false.
• The two classes (true and false) are not linearly separable.
• Therefore, no perceptron can compute the XOR function.

Neural Networks
• A neural network is built using perceptrons as building blocks.
• The inputs to some perceptrons are outputs of other perceptrons.
• Here is an example neural network computing the XOR function.
[Figure: the input units feed hidden units 3 and 4, whose outputs feed output unit 5.]
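The weights of units 3, 4, and 5 are shown in the figure rather than in the text; one standard assignment that computes XOR makes unit 3 an OR gate, unit 4 an AND gate, and unit 5 fire exactly when OR is true but AND is not. The sketch below uses those assumed weights:

```python
def step(a):
    return 1 if a >= 0 else 0

def xor_network(x1, x2):
    x3 = step(-0.5 + x1 + x2)    # unit 3: OR of the inputs
    x4 = step(-1.5 + x1 + x2)    # unit 4: AND of the inputs
    return step(-0.5 + x3 - x4)  # unit 5: fires iff OR is true and AND is false

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "XOR", x2, "=", xor_network(x1, x2))
```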


Neural Networks
• This neural network example consists of six units:
– Three input units (including the not-shown bias input).
– Three perceptrons.
• Yes, inputs count as units.


Neural Network Layers
• Oftentimes, neural networks are organized into layers.
• The input layer is the initial layer of input units (units 0, 1, 2 in our example).
• The output layer is at the end (unit 5 in our example).
• Zero, one, or more hidden layers can be between the input and output layers.

Neural Network Layers
• There is only one hidden layer in our example, containing units 3 and 4.
• Each hidden layer's inputs are outputs from the previous layer.
• Each hidden layer's outputs are inputs to the next layer.
– The first hidden layer's inputs come from the input layer.
– The last hidden layer's outputs are inputs to the output layer.

Feedforward Networks
• Feedforward networks are networks where there are no directed loops.
• If there are no loops, the output of a neuron cannot (directly or indirectly) influence its input.
• While there are varieties of neural networks that are not feedforward or layered, our main focus will be layered feedforward networks.

Computing the Output
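As a sketch of how a layered feedforward network computes its output: each layer's units read the previous layer's outputs, and the last layer's outputs are the network's outputs. The weight layout below (one list of (bias, input-weights) pairs per non-input layer) and the use of sigmoid units are illustrative assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, weights):
    # weights[l][j] = (bias, input_weights) for unit j of non-input layer l.
    z = list(x)                  # outputs of the input layer
    for layer in weights:        # visit layers from input to output
        z = [sigmoid(b + sum(w * zi for w, zi in zip(ws, z)))
             for (b, ws) in layer]
    return z                     # outputs of the output layer
```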

What Neural Networks Can Compute
• An individual perceptron is a linear classifier.
– The weights of the perceptron define a linear boundary between two classes.
• Layered feedforward neural networks with one hidden layer can compute any continuous function.
• Layered feedforward neural networks with two hidden layers can compute any mathematical function.
• This has been known for decades, and is one reason scientists have been optimistic about the potential of neural networks to model intelligent systems.
• Another reason is the analogy between neural networks and biological brains, which have been a standard of intelligence we are still trying to achieve.
• There is only one catch: how do we find the right weights?

Training a Neural Network
• In linear regression, for the sum-of-squares error, we could find the best weights using a closed-form formula.
• In logistic regression, for the cross-entropy error, we could find the best weights using an iterative method.
• In neural networks, we cannot find the best weights (unless we have an astronomical amount of luck).
– We only have optimization methods that find local minima of the error function.
– Still, in recent years such methods have produced spectacular results in real-world applications.

Notation for Training Set

Perceptron Learning

Computing the Gradient

Weight Update
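A hedged reconstruction of the gradient and update rule, assuming a sigmoid perceptron z_n = σ(wᵀx_n) trained to minimize the sum-of-squares error mentioned earlier; the exact notation of the original slides is an assumption:

```latex
E(w) = \frac{1}{2} \sum_{n=1}^{N} (z_n - t_n)^2,
\qquad
\frac{\partial E}{\partial w_i} = \sum_{n=1}^{N} (z_n - t_n)\, z_n (1 - z_n)\, x_{n,i},
\qquad
w_i \leftarrow w_i - \eta \, \frac{\partial E}{\partial w_i}
```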

Perceptron Learning - Summary
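Putting the pieces together, here is a sketch of perceptron learning by stochastic gradient descent, under the same assumptions (sigmoid activation, squared error); the learning rate, epoch count, and initialization range are illustrative:

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_perceptron(examples, targets, dim, eta=0.5, epochs=1000):
    # w[0] is the bias weight; the bias input is fixed to 1.
    w = [random.uniform(-0.1, 0.1) for _ in range(dim + 1)]
    for _ in range(epochs):
        for x, t in zip(examples, targets):
            z = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            g = (z - t) * z * (1 - z)   # derivative of the error w.r.t. the weighted sum
            w[0] -= eta * g
            for i, xi in enumerate(x):
                w[i + 1] -= eta * g * xi
    return w
```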

Stopping Criterion

Using Perceptrons for Multiclass Problems
• A perceptron outputs a number between 0 and 1.
• This is sufficient only for binary classification problems.
• For more than two classes, there are many different options.
• We will follow a general approach called one-versus-all classification.

One-Versus-All Perceptrons
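A sketch of one-versus-all classification, reusing the hypothetical sigmoid and train_perceptron helpers from the sketch above: one perceptron per class, each trained to output 1 for its class and 0 for everything else, with the highest-scoring perceptron deciding the class.

```python
def train_one_versus_all(examples, labels, num_classes, dim):
    models = []
    for c in range(num_classes):
        targets = [1 if y == c else 0 for y in labels]  # class c versus the rest
        models.append(train_perceptron(examples, targets, dim))
    return models

def classify(x, models):
    scores = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in models]
    return max(range(len(scores)), key=lambda c: scores[c])
```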

Neural Network Notation

Target Value Notation

Squared Error for Neural Networks
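A hedged reconstruction of the squared error for a network with K output units, assuming one-hot target vectors t_n as in the target value notation above; the slides' exact indexing is an assumption:

```latex
E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} \bigl( z_k(x_n) - t_{n,k} \bigr)^2
```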

Training Neural Networks

Computing the Gradient

Decomposing the Error Function
[Figure: the output layer, with its inputs coming from previous, unknown layers.]

Updating Weights of Output Units

• We computed this already, a few slides ago.

Formula for Hidden Units

Simplifying Notation

Final Backpropagation Formula
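A hedged reconstruction of the standard backpropagation formulas for sigmoid units, where z_j is the output of unit j and the sum ranges over the units k of the next layer that receive input from j; the slides' own index conventions are an assumption:

```latex
\delta_j =
\begin{cases}
(z_j - t_j)\, z_j (1 - z_j) & \text{if } j \text{ is an output unit,} \\[4pt]
z_j (1 - z_j) \sum_{k} w_{jk}\, \delta_k & \text{if } j \text{ is a hidden unit,}
\end{cases}
\qquad
w_{ij} \leftarrow w_{ij} - \eta\, \delta_j\, z_i
```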

Backpropagation for One Object Step 1: Initialize Input Layer

Backpropagation for One Object Step 2: Compute Outputs

Backpropagation for One Object Step 4: Update Weights
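A sketch of all the steps for a single training object, using the same assumed weight layout as the forward-pass sketch; the intermediate step of computing the δ values backward from the output layer (between "compute outputs" and "update weights") is filled in here as an assumption:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_one_object(x, t, weights, eta=0.5):
    # weights[l][j] = (bias, input_weights) for unit j of non-input layer l.
    # Step 1: initialize the input layer.
    zs = [list(x)]
    # Step 2: compute outputs, layer by layer.
    for layer in weights:
        zs.append([sigmoid(b + sum(w * zi for w, zi in zip(ws, zs[-1])))
                   for (b, ws) in layer])
    # Step 3 (assumed): compute deltas, from the output layer backward.
    all_deltas = [[(z - tk) * z * (1 - z) for z, tk in zip(zs[-1], t)]]
    for l in range(len(weights) - 1, 0, -1):
        nxt = all_deltas[0]   # deltas of the layer after layer l
        all_deltas.insert(0, [
            zs[l][i] * (1 - zs[l][i]) *
            sum(weights[l][j][1][i] * nxt[j] for j in range(len(weights[l])))
            for i in range(len(zs[l]))])
    # Step 4: update each weight using its unit's delta and its input.
    for l, layer in enumerate(weights):
        for j, (b, ws) in enumerate(layer):
            d = all_deltas[l][j]
            layer[j] = (b - eta * d,
                        [w - eta * d * zi for w, zi in zip(ws, zs[l])])
    return weights
```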

Backpropagation Summary

Classification with Neural Networks

Structure of Neural Networks
• Backpropagation describes how to learn weights.
• However, it does not describe how to learn the structure:
– How many layers?
– How many units at each layer?
• These are parameters that we have to choose somehow.
• A good way to choose such parameters is by using a validation set, containing examples and their class labels.
– The validation set should be separate (disjoint) from the training set.

Structure of Neural Networks
• To choose the best structure for a neural network using a validation set, we try many different parameters (number of layers, number of units per layer).
• For each choice of parameters:
– We train several neural networks using backpropagation.
– We measure how well each neural network classifies the validation examples.
– Why not train just one neural network? Each network is randomly initialized, so after backpropagation it can end up being different from the other networks.
• At the end, we select the neural network that did best on the validation set (sketched below).
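A sketch of this selection procedure; train_network and accuracy are hypothetical callables supplied by the caller (standing in for backpropagation training and validation-set evaluation), and the option format is illustrative:

```python
def choose_structure(train_network, accuracy, train_set, val_set,
                     layer_options, nets_per_option=5):
    # layer_options might look like [(10,), (20,), (10, 10)]:
    # one tuple of hidden-layer sizes per candidate structure.
    best_net, best_acc = None, -1.0
    for hidden_layers in layer_options:
        for _ in range(nets_per_option):   # several random initializations
            net = train_network(train_set, hidden_layers)
            acc = accuracy(net, val_set)
            if acc > best_acc:
                best_net, best_acc = net, acc
    return best_net
```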