Neural Network Terminology

• A neural network is composed of a number of units (nodes) that are connected by links. Each link has a weight associated with it. Each unit has an activation level and a means to compute the activation level at the next step in time.
• Most neural network units are composed of a linear component, called the input function, and a non-linear component, called the activation function. Popular activation functions include the step function, the sign function, and the sigmoid function.
• The architecture of a neural network determines how units are connected and which activation functions are used for the network computations. Architectures are subdivided into feed-forward and recurrent networks. Moreover, single-layer and multi-layer neural networks (which contain hidden units) are distinguished.
• Learning in the context of neural networks mostly centers on finding "good" weights for a given architecture so that the error in performing a particular task is minimized. Most approaches learn a function from a set of training examples, and use hill-climbing and steepest-descent hill-climbing to find the best values for the weights.

Ch. Eick: More on Machine Learning & Neural Networks
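The terminology above can be sketched in a few lines of code. This is an illustrative sketch, not from the slides: the function names (`step`, `sign`, `sigmoid`, `unit_output`) are made up for the example, and the threshold of the step function is assumed to be 0.

```python
import math

def step(x):
    """Step function: 1 if x >= 0, else 0 (threshold 0 assumed)."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid: smooth, differentiable squashing to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, activations, activation=sigmoid):
    """A unit's output: the activation function applied to the
    linear input function (weighted sum of incoming activations)."""
    z = sum(w * a for w, a in zip(weights, activations))
    return activation(z)
```

The split between the linear input function (the weighted sum `z`) and the non-linear activation applied to it is exactly the decomposition the slide describes.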


Perceptron Learning Example

Learn y = x1 AND x2 for the examples (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), with learning rate a = 0.5 and initial weights w0 = 1, w1 = w2 = 0.8; step0 is used as the activation function (output 1 iff the weighted sum is >= 0).

Perceptron Learning Rule: Wj := Wj + a*Aj*(T-O)

1. First example: w0 is set to 0.5; nothing else changes.
2. Second example: w0 is set to 0; w2 is set to 0.3.
3. Third example: w0 is set to -0.5; w1 is set to 0.3.
4. No more errors occur with these weights on the four examples.

[Figure: a step0 unit with inputs 1, x1, x2, weights w0, w1, w2, and output y.]
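The trace above can be reproduced with a minimal sketch of the perceptron learning rule. The loop structure and variable names are illustrative choices; the examples, learning rate, initial weights, and update rule are taken from the slide.

```python
def step0(z):
    """Step-0 activation: output 1 iff the weighted sum is >= 0."""
    return 1 if z >= 0 else 0

# Examples (x1, x2, target) for y = x1 AND x2, as on the slide.
examples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
w = [1.0, 0.8, 0.8]          # w0 (bias weight), w1, w2
a = 0.5                      # learning rate

changed = True
while changed:               # repeat until an error-free pass
    changed = False
    for x1, x2, t in examples:
        inputs = (1, x1, x2)                 # constant input 1 feeds w0
        o = step0(sum(wj * aj for wj, aj in zip(w, inputs)))
        if o != t:
            # Perceptron learning rule: Wj := Wj + a*Aj*(T-O)
            w = [wj + a * aj * (t - o) for wj, aj in zip(w, inputs)]
            changed = True

print([round(wj, 3) for wj in w])   # -> [-0.5, 0.3, 0.3]
```

The final weights match the slide's trace: w0 = -0.5, w1 = w2 = 0.3, and all four examples are then classified correctly.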


Neural Network Learning -- Mostly Steepest-Descent Hill Climbing on a Differentiable Error Function

Important: how far you jump depends on
• the learning rate a, and
• the error |T-O|.

[Figure: the current weight vector moves to a new weight vector in the direction of steepest descent with respect to the error function.]

Remarks on a:
• too low: slow convergence
• too high: might overshoot the goal
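The two failure modes of the learning rate can be seen on a toy differentiable error function. This sketch is not from the slide: the error function E(w) = (w-2)^2, its gradient 2(w-2), and the particular rates are assumptions chosen to make the effect visible.

```python
def descend(w, rate, steps):
    """Steepest-descent steps w := w - rate * dE/dw on E(w) = (w-2)**2."""
    for _ in range(steps):
        w = w - rate * 2 * (w - 2)
    return w

slow = descend(0.0, rate=0.01, steps=20)   # creeps toward the minimum w* = 2
good = descend(0.0, rate=0.5,  steps=20)   # reaches w* = 2 quickly
wild = descend(0.0, rate=1.1,  steps=20)   # overshoots past w* and diverges
```

After 20 steps, the low rate is still far from the minimum (slow convergence), the moderate rate lands on it, and the high rate has overshot so badly that the iterates diverge.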


Back Propagation Algorithm

1. Initialize the weights in the network (often randomly).
2. repeat
     for each example e in the training set do
       a. O = neural-net-output(network, e) ; forward pass
       b. T = teacher output for e
       c. Calculate error (T - O) at the output units
       d. Compute error term Di for the output node
       e. Compute error term Di for nodes of the intermediate layer
       f. Update the weights in the network: Dwij = a*ai*Dj
   until all examples classified correctly or stopping criterion satisfied
3. return(network)
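A runnable sketch of this algorithm, specialized to the 2-2-1 network used on the following slides (inputs I1, I2; hidden units 3, 4; output unit 5). The weight names follow the slides; the constant initialization, fixed epoch count as stopping criterion, and function names are illustrative assumptions.

```python
import math

def g(x):
    """Sigmoid activation, as on the following slides."""
    return 1.0 / (1.0 + math.exp(-x))

def train(examples, rate=0.5, epochs=1000):
    w13 = w23 = w14 = w24 = w35 = w45 = 0.1      # 1. initialize weights
    for _ in range(epochs):                      # 2. repeat ... until
        for x1, x2, t in examples:
            # a./b. forward pass and teacher output t
            a3 = g(x1 * w13 + x2 * w23)
            a4 = g(x1 * w14 + x2 * w24)
            a5 = g(a3 * w35 + a4 * w45)
            # c./d. error (T - O) and error term at the output node
            d5 = (t - a5) * a5 * (1 - a5)
            # e. error terms at the intermediate layer
            d3 = d5 * w35 * a3 * (1 - a3)
            d4 = d5 * w45 * a4 * (1 - a4)
            # f. weight updates Dwij = a * ai * Dj
            w35 += rate * a3 * d5; w45 += rate * a4 * d5
            w13 += rate * x1 * d3; w23 += rate * x2 * d3
            w14 += rate * x1 * d4; w24 += rate * x2 * d4
    return (w13, w23, w14, w24, w35, w45)        # 3. return(network)
```

For instance, training on the single example (x1=1, x2=1, T=1) drives the network output steadily toward 1.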


Updating Weights in Neural Networks

wij := Old_wij + a*input_activationi*Associated_Error

Perceptron: Associated_Error := (T-O)

2-layer Network:
1. Output node i: Associated_Error := g'(zi)*(T-O)
2. Intermediate node k connected to i: Associated_Error := g'(zk)*wki*error_at_node_i

[Figures: a perceptron with inputs I1, I2, weights w13, w23, and output a3; and a multi-layer network with inputs I1, I2, hidden units a3, a4 (error terms D3, D4), weights w13, w23, w14, w24, w35, w45, and output a5 (error term D5).]


Back Propagation Formula Example

g(x) = 1/(1+e^-x) is the sigmoid activation function; its derivative is g'(x) = g(x)*(1-g(x)), so g'(zi) = ai*(1-ai). a is the learning rate.

[Figure: the 2-2-1 network with inputs I1, I2, hidden units a3, a4, output a5, and weights w13, w23, w14, w24, w35, w45.]

Forward pass:
a3 = g(z3) = g(x1*w13 + x2*w23)
a4 = g(z4) = g(x1*w14 + x2*w24)
a5 = g(z5) = g(a3*w35 + a4*w45)

Error terms:
D5 = error*g'(z5) = error*a5*(1-a5)
D4 = D5*w45*g'(z4) = D5*w45*a4*(1-a4)
D3 = D5*w35*g'(z3) = D5*w35*a3*(1-a3)

Weight updates:
w35 := w35 + a*a3*D5
w45 := w45 + a*a4*D5
w13 := w13 + a*x1*D3
w23 := w23 + a*x2*D3
w14 := w14 + a*x1*D4
w24 := w24 + a*x2*D4


Example BP (learning rate a = 0.2)

All weights are 0.1 except w45 = 1; learning rate a = 0.2. Training example: (x1 = 1, x2 = 1; T = 1). g is the sigmoid function.

[Figure: the same 2-2-1 network as on the previous slide.]

Forward pass:
a3 = g(z3) = g(x1*w13 + x2*w23) = g(0.2) = 0.550
a4 = g(z4) = g(x1*w14 + x2*w24) = g(0.2) = 0.550
a5 = g(z5) = g(a3*w35 + a4*w45) = g(0.605) = 0.647

Error terms (error = T - a5 = 0.353):
D5 = error*g'(z5) = error*a5*(1-a5) = 0.353*0.647*0.353 = 0.08
D4 = D5*w45*a4*(1-a4) = 0.02
D3 = D5*w35*a3*(1-a3) = 0.002

Weight updates:
w35 = w35 + a*a3*D5 = 0.1 + 0.2*0.55*0.08 = 0.109
w45 = w45 + a*a4*D5 = 1.009
w13 = w13 + a*x1*D3 = 0.1004
w23 = w23 + a*x2*D3 = 0.1004
w14 = w14 + a*x1*D4 = 0.104
w24 = w24 + a*x2*D4 = 0.104

New forward pass with the adjusted weights:
a3' = g(0.2008) = 0.550
a4' = g(0.208) = 0.552
a5' = g(0.617) = 0.649

With the adjusted weights, a5 is about 0.649: one backprop step has moved the output toward the target.
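The single step above can be checked at full floating-point precision. The slide rounds intermediate values, so the last digits differ slightly; the weight names and formulas are the slide's, while the code layout is an illustrative sketch.

```python
import math

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

x1 = x2 = 1.0; T = 1.0; rate = 0.2
w13 = w23 = w14 = w24 = w35 = 0.1; w45 = 1.0

# Forward pass
a3 = g(x1 * w13 + x2 * w23)                    # g(0.2)   ~ 0.550
a4 = g(x1 * w14 + x2 * w24)                    # g(0.2)   ~ 0.550
a5 = g(a3 * w35 + a4 * w45)                    # g(0.605) ~ 0.647

# Error terms
d5 = (T - a5) * a5 * (1 - a5)                  # ~ 0.0807
d4 = d5 * w45 * a4 * (1 - a4)                  # ~ 0.0200
d3 = d5 * w35 * a3 * (1 - a3)                  # ~ 0.0020

# Weight updates
w35 += rate * a3 * d5                          # ~ 0.109
w45 += rate * a4 * d5                          # ~ 1.009
w13 += rate * x1 * d3; w23 += rate * x2 * d3   # ~ 0.1004
w14 += rate * x1 * d4; w24 += rate * x2 * d4   # ~ 0.104

# New forward pass with the adjusted weights
a5_new = g(g(w13 + w23) * w35 + g(w14 + w24) * w45)
print(round(a5_new, 3))                        # -> 0.649
```

The output after one step exceeds the output before it, as the weight updates descend the error surface for this example.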


Example BP (learning rate a = 1)

All weights are 0.1 except w45 = 1; learning rate a = 1. Training example: (x1 = 1, x2 = 1; T = 1). g is the sigmoid function.

[Figure: the same 2-2-1 network as on the previous slides.]

Forward pass and error terms (identical to the a = 0.2 case):
a3 = g(0.2) = 0.550, a4 = g(0.2) = 0.550, a5 = g(0.605) = 0.647
D5 = 0.08, D4 = 0.02, D3 = 0.002

Weight updates:
w35 = w35 + a*a3*D5 = 0.1 + 1*0.55*0.08 = 0.144
w45 = w45 + a*a4*D5 = 1.044
w13 = w13 + a*x1*D3 = 0.102
w23 = w23 + a*x2*D3 = 0.102
w14 = w14 + a*x1*D4 = 0.12
w24 = w24 + a*x2*D4 = 0.12

New forward pass with the adjusted weights:
a3' = g(0.204) = 0.551
a4' = g(0.24) = 0.560
a5' = g(0.664) = 0.660

With the adjusted weights, a5 is about 0.660: the larger learning rate moves the output further toward the target T = 1 in a single step than a = 0.2 did.
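The two example slides differ only in the learning rate, so a short sketch can compare them directly. The helper name `one_step` is an illustrative choice; the network, weights, and update formulas are the slides'. Values are computed at full precision, so they differ in the last digits from the slides' rounded arithmetic.

```python
import math

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def one_step(rate):
    """One backprop step on the slides' example (x1 = x2 = 1, T = 1);
    returns the network output after the weight updates."""
    w13 = w23 = w14 = w24 = w35 = 0.1; w45 = 1.0
    a3 = g(0.2); a4 = g(0.2)
    a5 = g(a3 * w35 + a4 * w45)
    d5 = (1 - a5) * a5 * (1 - a5)
    d4 = d5 * w45 * a4 * (1 - a4)
    d3 = d5 * w35 * a3 * (1 - a3)
    w35 += rate * a3 * d5; w45 += rate * a4 * d5
    w13 += rate * d3; w23 += rate * d3      # x1 = x2 = 1
    w14 += rate * d4; w24 += rate * d4
    return g(g(w13 + w23) * w35 + g(w14 + w24) * w45)

# rate 0.2 yields ~0.649; rate 1 yields ~0.660 -- the larger rate
# makes the bigger single-step move toward the target.
```

For a single step this favors the larger rate; the earlier slide's caveat still applies, since a rate that is too high can overshoot once the weights are near a minimum.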