Disadvantages of Discrete Neurons
• Only Boolean-valued functions can be computed
• A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking
• The computational capabilities of single-layer discrete-neuron perceptrons are limited
These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons
Preliminaries
• A continuous-neuron perceptron with n inputs and m outputs computes:
– a function R^n → [0, 1]^m when the sigmoid activation function is used
– a function R^n → R^m when a linear activation function is used
• The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function.
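A minimal NumPy sketch (not from the slides) of such a layer, showing how the choice of activation determines the output range:

```python
import numpy as np

def sigmoid(s):
    # Logistic sigmoid: maps R into (0, 1) componentwise
    return 1.0 / (1.0 + np.exp(-s))

def layer(W, x, activation):
    # One layer of m continuous neurons on an n-dimensional input:
    # the activation is applied componentwise to the weighted sums W @ x
    return activation(W @ x)

W = np.random.randn(3, 4)              # m = 3 outputs, n = 4 inputs
x = np.random.randn(4)
print(layer(W, x, sigmoid))            # values in [0, 1]^3
print(layer(W, x, lambda s: s))        # values in R^3
```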
Sigmoid transfer function
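The slide's figure is not reproduced here; the standard logistic sigmoid it refers to, together with the derivative identity used by the delta rule below, is

```latex
\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)
```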
Computational Capabilities
Let g: [0, 1]^n → R be a continuous function and let ε > 0. Then there exists a two-layer perceptron with:
1. a first layer built from neurons with threshold and standard sigmoid activation function
2. a second layer built from one neuron without threshold and linear activation function
such that the function G computed by this network satisfies |g(x) − G(x)| < ε for all x in [0, 1]^n.
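A small NumPy illustration of this architecture (my construction, not the slides'): a first layer of sigmoid neurons with random weights and thresholds, and a single linear output neuron without threshold whose weights are fitted here by least squares rather than by a learning rule. With enough hidden neurons the network function G approximates the hypothetical target g closely:

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.sin                               # hypothetical continuous target on [0, 1]

k = 50                                   # number of first-layer sigmoid neurons
a = rng.normal(scale=10.0, size=k)       # first-layer input weights
b = rng.uniform(-10.0, 0.0, size=k)      # first-layer thresholds

def hidden(x):
    # Outputs of the first layer for each input in the 1-D array x
    return 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b)))

# Second layer: one linear neuron without threshold
x_train = np.linspace(0.0, 1.0, 200)
w, *_ = np.linalg.lstsq(hidden(x_train), g(x_train), rcond=None)

G = lambda x: hidden(x) @ w              # function computed by the network
print(np.max(np.abs(G(x_train) - g(x_train))))   # maximum deviation from g
```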
Single-layer networks
• Compute a function from R^n to [0, 1]^m
• Sufficient to consider a single neuron
– computes a function f(w_0 + Σ_{1 ≤ j ≤ n} w_j x_j)
– assume x_0 = 1; then it computes a function f(Σ_{0 ≤ j ≤ n} w_j x_j)
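As a sketch (my code), the bias trick from the last bullet: extending the input with x_0 = 1 lets a single weight vector carry the threshold as well:

```python
import numpy as np

def neuron(w, x, f):
    # Single continuous neuron; w[0] plays the role of the
    # threshold w_0 because x is extended with x_0 = 1
    return f(w @ np.concatenate(([1.0], x)))

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
print(neuron(np.array([0.5, -1.0, 2.0]), np.array([0.3, 0.7]), sigmoid))
```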
Error function
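The slide's formula is not preserved; the sum-of-squared-errors function these learning rules minimize is conventionally

```latex
E(\mathbf{w}) = \tfrac{1}{2} \sum_{q} \bigl(d^{(q)} - y^{(q)}\bigr)^{2},
\qquad
y^{(q)} = f\Bigl(\sum_{j=0}^{n} w_j x_j^{(q)}\Bigr),
```

where d^(q) is the desired output for training pair q.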
Gradient Descent
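The generic gradient descent step on the error function, with learning parameter α > 0:

```latex
\mathbf{w} \;\leftarrow\; \mathbf{w} - \alpha \, \nabla E(\mathbf{w})
```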
Update of Weight i by Training Pair q
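With E^(q) = ½ (d^(q) − y^(q))² the error on pair q alone, the chain rule gives the standard update (the notation s^(q) for the weighted sum is mine):

```latex
\Delta w_i = -\alpha \frac{\partial E^{(q)}}{\partial w_i}
           = \alpha \,\bigl(d^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(s^{(q)}\bigr)\, x_i^{(q)},
\qquad
s^{(q)} = \sum_{j=0}^{n} w_j x_j^{(q)}
```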
Delta Rule Learning (incremental version, arbitrary transfer function)
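A minimal sketch of the incremental delta rule for a single neuron (function and parameter names are mine):

```python
import numpy as np

def delta_rule_incremental(X, d, f, df, alpha=0.1, epochs=100):
    # X: one row per training pair, first column all ones (x_0 = 1)
    # d: desired outputs; f, df: transfer function and its derivative
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_q, d_q in zip(X, d):       # one weight update per pair
            s = w @ x_q                  # weighted sum
            w += alpha * (d_q - f(s)) * df(s) * x_q
    return w
```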
Stop criteria
• The mean square error becomes small enough
• The mean square error no longer decreases, i.e., the gradient has become very small or even changes sign
• The maximum number of iterations has been exceeded
Remarks
• Delta rule learning is also called Least-Mean-Square (LMS) learning or Widrow-Hoff learning
• Note that the incremental version of the delta rule is, strictly speaking, not a gradient descent algorithm, because in each step a different error function E^(q) is used
• Convergence of the incremental version can only be guaranteed if the learning parameter α goes to 0 during learning
Perceptron Learning Rule (batch version, arbitrary transfer function)
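In the batch version the contributions of all training pairs are accumulated before the weights are updated, so each step follows the true gradient of E:

```latex
\Delta w_i = \alpha \sum_{q} \bigl(d^{(q)} - y^{(q)}\bigr)\, f'\!\bigl(s^{(q)}\bigr)\, x_i^{(q)}
```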
Perceptron Learning Rule (batch version, sigmoidal transfer function)
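For the sigmoidal transfer function the derivative identity σ'(s) = σ(s)(1 − σ(s)) gives f'(s^(q)) = y^(q) (1 − y^(q)), so the batch rule specializes to:

```latex
\Delta w_i = \alpha \sum_{q} \bigl(d^{(q)} - y^{(q)}\bigr)\, y^{(q)} \bigl(1 - y^{(q)}\bigr)\, x_i^{(q)}
```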
Perceptron Learning Rule (batch version, linear transfer function)
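For a linear transfer function f(s) = s we have f'(s) = 1, and the batch rule becomes plain least-mean-square learning; a minimal sketch (names are mine):

```python
import numpy as np

def delta_rule_batch_linear(X, d, alpha=0.01, epochs=1000):
    # Batch delta rule for a linear neuron: f(s) = s, so f'(s) = 1
    # X: one row per training pair, first column all ones (x_0 = 1)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w += alpha * X.T @ (d - X @ w)   # summed update over all pairs
    return w
```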
Convergence of the batch version
For a small enough learning parameter, the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum.
Linear Neurons and Least Squares
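For a linear neuron the error function is quadratic in w, and setting its gradient to zero yields the normal equations (the symbol h is my choice; the slides only name C):

```latex
C\,\mathbf{w} = \mathbf{h},
\qquad
C = \sum_{q} \mathbf{x}^{(q)} \bigl(\mathbf{x}^{(q)}\bigr)^{\!\top},
\qquad
\mathbf{h} = \sum_{q} d^{(q)}\, \mathbf{x}^{(q)}
```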
C is non-singular
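When the correlation matrix C is non-singular it is positive definite, so E is strictly convex and the normal equations have a unique solution, the global minimum:

```latex
\mathbf{w}^{*} = C^{-1} \mathbf{h}
```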
Linear Least Squares Convergence
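The slide's derivation is not preserved; for gradient descent on this quadratic error function, the standard convergence condition on the learning parameter is

```latex
0 < \alpha < \frac{2}{\lambda_{\max}(C)},
```

where λ_max(C) is the largest eigenvalue of C.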
Find the line: the line y = w_0 + w_1 x that best fits a given set of data points in the least-squares sense.
Solution: solve the normal equations C w = h for the weights w_0 and w_1.
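A worked sketch with hypothetical data (the slide's own data points are not preserved):

```python
import numpy as np

# Hypothetical data points
x = np.array([0.0, 1.0, 2.0, 3.0])
d = np.array([1.1, 1.9, 3.2, 3.8])

# Fit the line y = w_0 + w_1 x minimizing the sum of squared errors:
# extend each input with x_0 = 1 and solve the normal equations C w = h
X = np.column_stack([np.ones_like(x), x])
C = X.T @ X
h = X.T @ d
w = np.linalg.solve(C, h)        # C is non-singular for these points
print(f"y = {w[0]:.3f} + {w[1]:.3f} x")
```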