Function Approximation
Fariba Sharifian, Somaye Kafi
Spring 2006
Contents
- Introduction to Counterpropagation
- Full Counterpropagation
  - Architecture
  - Algorithm
  - Application example
- Forward-Only Counterpropagation
  - Architecture
  - Algorithm
  - Application example
Contents (cont.)
- Function Approximation Using Neural Networks
  - Introduction
  - Development of Neural Network Weight Equations
  - Algebraic Training Algorithms
    - Exact Matching of Function Input-Output Data
    - Approximate Matching of Gradient Data in Algebraic Training
    - Approximate Matching of Function Input-Output Data
    - Exact Matching of Function Gradient Data
Introduction to Counterpropagation
- Counterpropagation networks are multilayer networks built from a combination of input, clustering, and output layers.
- They can be used to compress data, to approximate functions, or to associate patterns.
- They approximate their training input vector pairs by adaptively constructing a lookup table.
Introduction to Counterpropagation (cont.)
- Training has two stages:
  - clustering
  - output-weight updating
- There are two types:
  - full
  - forward-only
Full Counterpropagation
- Produces an approximation x*:y* based on:
  - input of an x vector only,
  - input of a y vector only, or
  - input of an x:y pair, possibly with some distorted or missing elements in either or both vectors.
Full Counterpropagation (cont.)
- Phase 1
  - The units in the cluster layer compete. Only the winning unit is allowed to learn; the weight updates for the winning cluster unit J are
      v_iJ(new) = v_iJ(old) + alpha * [x_i - v_iJ(old)]
      w_kJ(new) = w_kJ(old) + beta * [y_k - w_kJ(old)]
Full Counterpropagation (cont.)
- Phase 2
  - The weights from the winning cluster unit J to the output units are adjusted so that y*, the vector of activations of the Y output layer, approximates the input vector y, and x* approximates the input vector x. The weight updates for the Y output and X output layers are
      u_Jk(new) = u_Jk(old) + a * [y_k - u_Jk(old)]
      t_Ji(new) = t_Ji(old) + b * [x_i - t_Ji(old)]
Architecture of Full Counterpropagation
[Figure: the X input layer (X1...Xn) and Y input layer (Y1...Ym) feed the cluster (hidden) layer (Z1...Zp) through weights v and w; weights t and u connect the cluster layer to the X* output layer (X1*...Xn*) and the Y* output layer (Y1*...Ym*).]
Full Counterpropagation Algorithm
Full Counterpropagation Algorithm (Phase 1)
- Step 1. Initialize weights, learning rates, etc.
- Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
- Step 3. For each training input pair x:y, do Steps 4-6.
  - Step 4. Set X input layer activations to vector x; set Y input layer activations to vector y.
  - Step 5. Find the winning cluster unit; call its index J.
  - Step 6. Update weights for unit ZJ:
      v_iJ(new) = v_iJ(old) + alpha * [x_i - v_iJ(old)],  i = 1, ..., n
      w_kJ(new) = w_kJ(old) + beta * [y_k - w_kJ(old)],  k = 1, ..., m
- Step 7. Reduce the learning rates alpha and beta.
- Step 8. Test the stopping condition for Phase 1 training.
Full Counterpropagation Algorithm (Phase 2)
- Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
  (Note: alpha and beta are small, constant values during Phase 2.)
- Step 10. For each training input pair x:y, do Steps 11-14.
  - Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
  - Step 12. Find the winning cluster unit; call its index J.
  - Step 13. Update weights for unit ZJ (as in Step 6, with the small constant rates alpha and beta).
Full Counterpropagation Algorithm (Phase 2) (cont.)
  - Step 14. Update weights from unit ZJ to the output layers:
      u_Jk(new) = u_Jk(old) + a * [y_k - u_Jk(old)],  k = 1, ..., m
      t_Ji(new) = t_Ji(old) + b * [x_i - t_Ji(old)],  i = 1, ..., n
- Step 15. Reduce the learning rates a and b.
- Step 16. Test the stopping condition for Phase 2 training.
Which cluster is the winner?
- Dot product: find the cluster with the largest net input.
- Euclidean distance: find the cluster with the smallest squared distance from the input.
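A minimal NumPy sketch of the two training phases above, using the Euclidean winner rule from this slide. The array shapes, epoch counts, and learning-rate decay schedule are illustrative assumptions, not part of the algorithm as stated.

```python
import numpy as np

rng = np.random.default_rng(0)

def winner(x, y, v, w):
    """Winning cluster = smallest squared Euclidean distance to the (x, y) pair."""
    d = ((v - x) ** 2).sum(axis=1) + ((w - y) ** 2).sum(axis=1)
    return int(np.argmin(d))

def train_full_cpn(X, Y, p, alpha=0.4, beta=0.4, a=0.1, b=0.1, epochs=50):
    n, m = X.shape[1], Y.shape[1]
    v = rng.random((p, n))   # X-input -> cluster weights
    w = rng.random((p, m))   # Y-input -> cluster weights
    t = np.zeros((p, n))     # cluster -> X* output weights
    u = np.zeros((p, m))     # cluster -> Y* output weights

    # Phase 1: only the winner's input weights move toward each training pair.
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = winner(x, y, v, w)
            v[J] += alpha * (x - v[J])
            w[J] += beta * (y - w[J])
        alpha *= 0.95   # Step 7: reduce learning rates
        beta *= 0.95

    # Phase 2: alpha and beta small and constant; output weights also learn.
    alpha = beta = 0.01
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = winner(x, y, v, w)
            v[J] += alpha * (x - v[J])
            w[J] += beta * (y - w[J])
            u[J] += a * (y - u[J])   # Step 14: Y* output weights
            t[J] += b * (x - t[J])   # Step 14: X* output weights
    return v, w, t, u
```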
Full Counterpropagation Application
- The application procedure for counterpropagation is as follows:
  - Step 0: Initialize weights.
  - Step 1: For each input pair x:y, do Steps 2-4.
  - Step 2: Set X input layer activations to vector x; set Y input layer activations to vector y.
Full Counterpropagation Application (cont.)
  - Step 3: Find the cluster unit ZJ that is closest to the input pair.
  - Step 4: Compute approximations to x and y:
      x*_i = t_Ji
      y*_k = u_Jk
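The application steps map directly onto a small recall routine. This sketch assumes the weights produced by the training sketch above and computes the distance from whichever of x and y is actually available, as the example on the next slides does.

```python
import numpy as np

def recall_full_cpn(v, w, t, u, x=None, y=None):
    """Steps 2-4: approximate (x*, y*) from x, y, or an x:y pair.
    The distance uses whichever of x and y is given."""
    d = np.zeros(v.shape[0])
    if x is not None:
        d += ((v - x) ** 2).sum(axis=1)
    if y is not None:
        d += ((w - y) ** 2).sum(axis=1)
    J = int(np.argmin(d))          # Step 3: closest cluster unit Z_J
    return t[J], u[J]              # Step 4: x*_i = t_Ji, y*_k = u_Jk
```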
Full Counterpropagation Example
- Function approximation of y = 1/x.
- After the training phase we have:

  Cluster unit:  z1    z2    z3    z4    z5    z6    z7    z8    z9    z10
  v:             0.11  0.14  0.20  0.30  0.60  1.60  3.30  5.00  7.00  9.00
  w:             9.0   7.0   5.0   3.3   1.6   0.6   0.3   0.2   0.14  0.11
Full Counterpropagation Example (cont.)
[Figure: the trained network for the example; each cluster unit zj stores the pair (vj, wj) from the table above, and the output weights t and u reproduce those values as x* and y*.]
Full Counterpropagation Example (cont.)
- To approximate the value of y for x = 0.12:
- Since we know nothing about y, compute D from x alone, Dj = (x - vj)^2:
  - D1 = (0.12 - 0.11)^2 = 0.0001
  - D2 = 0.0004
  - D3 = 0.0064
  - D4 = 0.0324
  - D5 = 0.23
  - D6 = 2.2
  - D7 = 10.1
  - D8 = 23.8
  - D9 = 47.3
  - D10 = 78.9
- D1 is smallest, so z1 wins and y* = w1 = 9.0 (the exact value is 1/0.12 = 8.33).
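The same computation in a few lines of NumPy. The z6 entry of w (0.6 = 1/1.6) is omitted on the table slide and is filled in here from y = 1/x.

```python
import numpy as np

v = np.array([0.11, 0.14, 0.20, 0.30, 0.60, 1.60, 3.30, 5.00, 7.00, 9.00])
w = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])

x = 0.12
D = (x - v) ** 2           # distances from x only, since y is unknown
J = int(np.argmin(D))      # J = 0, i.e. cluster z1 (D1 = 0.0001 is smallest)
y_star = w[J]              # 9.0, versus the exact value 1/0.12 = 8.33
```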
Forward-Only Counterpropagation
- A simplified version of full counterpropagation.
- Intended to approximate a function y = f(x) that is not necessarily invertible.
- May be used if the mapping from x to y is well defined, but the mapping from y to x is not.
Forward-Only Counterpropagation Architecture
[Figure: the input layer (X1...Xn) connects through weights w to the cluster layer (Z1...Zp), which connects through weights u to the output layer (Y1...Ym).]
Forward-Only Counterpropagation Algorithm (Phase 1)
- Step 1. Initialize weights, learning rates, etc.
- Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
- Step 3. For each training input x, do Steps 4-6.
  - Step 4. Set X input layer activations to vector x.
  - Step 5. Find the winning cluster unit; call its index J.
  - Step 6. Update weights for unit ZJ:
      w_iJ(new) = w_iJ(old) + alpha * [x_i - w_iJ(old)],  i = 1, ..., n
- Step 7. Reduce the learning rate alpha.
- Step 8. Test the stopping condition for Phase 1 training.
Forward-Only Counterpropagation Algorithm (Phase 2)
- Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16.
  (Note: alpha is a small, constant value during Phase 2.)
- Step 10. For each training input pair x:y, do Steps 11-14.
  - Step 11. Set X input layer activations to vector x; set Y output layer activations to vector y.
  - Step 12. Find the winning cluster unit; call its index J.
  - Step 13. Update weights for unit ZJ (alpha is small):
      w_iJ(new) = w_iJ(old) + alpha * [x_i - w_iJ(old)]
  - Step 14. Update weights from unit ZJ to the output layer:
      u_Jk(new) = u_Jk(old) + a * [y_k - u_Jk(old)],  k = 1, ..., m
- Step 15. Reduce the learning rate a.
- Step 16. Test the stopping condition for Phase 2 training.
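For contrast with the full network, a sketch under the same illustrative assumptions (epoch counts, decay schedule, Euclidean winner rule). The structural differences are that the winner is found from x alone and only the cluster-to-output weights u are learned on the output side.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_forward_only_cpn(X, Y, p, alpha=0.4, a=0.1, epochs=50):
    """Forward-only training sketch: clustering uses x alone,
    and u maps the winning cluster onto y."""
    w = rng.random((p, X.shape[1]))   # input -> cluster weights
    u = np.zeros((p, Y.shape[1]))     # cluster -> output weights

    for _ in range(epochs):           # Phase 1: cluster on x only
        for x in X:
            J = int(np.argmin(((w - x) ** 2).sum(axis=1)))
            w[J] += alpha * (x - w[J])
        alpha *= 0.95                 # Step 7: reduce learning rate

    alpha = 0.01                      # Phase 2: alpha small and constant
    for _ in range(epochs):
        for x, y in zip(X, Y):
            J = int(np.argmin(((w - x) ** 2).sum(axis=1)))
            w[J] += alpha * (x - w[J])
            u[J] += a * (y - u[J])    # Step 14
    return w, u
```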
Forward-Only Counterpropagation Application
- Step 0: Initialize weights (by training as in the previous subsection).
- Step 1: Present input vector x.
- Step 2: Find the unit J closest to vector x.
- Step 3: Set the activations of the output units: y_k = u_Jk.
Forward-Only Counterpropagation Example
- Function approximation of y = 1/x.
- After the training phase we have:

  Cluster unit:  z1    z2    z3    ...  z10
  w:             0.5   1.5   2.5   ...  9.5
  u:             5.5   0.75  0.4   ...  0.1
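Recall with this lookup table takes only a few lines. The sketch below uses just the three cluster/output pairs the slide shows explicitly, since the middle entries of the table are elided.

```python
import numpy as np

# First three columns of the table above; the slide elides the rest.
w = np.array([0.5, 1.5, 2.5])
u = np.array([5.5, 0.75, 0.4])

def recall_forward_only(x):
    J = int(np.argmin((w - x) ** 2))   # Step 2: unit closest to x
    return u[J]                        # Step 3: y_k = u_Jk

y = recall_forward_only(1.2)           # -> 0.75 (cluster z2 wins)
```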
Function Approximation Using Neural Networks
→ Introduction
- Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
Introduction
- Goal: an analytical description for a set of data.
- Also referred to as data modeling or system identification.
Standard Tools
- Splines
- Wavelets
- Neural networks
Why Use Neural Networks?
- Splines and wavelets do not generalize well to spaces of more than three dimensions.
- Neural networks:
  - are universal approximators
  - have a parallel architecture
  - can be trained to map multidimensional nonlinear functions
Why Use Neural Networks? (cont.)
- Central to the solution of differential equations; they:
  - provide differentiable, closed-analytic-form solutions
  - have very good generalization properties
  - are widely applicable
- Training translates into a set of nonlinear, transcendental weight equations.
- Cascade structure:
  - nonlinearity in the hidden nodes
  - linear operations in the input and output layers
Function Approximation Using Neural Networks
- The functions are not known analytically; we have a set of precise input-output samples.
- The functions are modeled using an algebraic approach.
- Design objectives:
  - exact matching
  - approximate matching
- Networks: feedforward neural networks.
- Data:
  - input
  - output
  - and/or gradient information
Objective
- Exact solutions require sufficient degrees of freedom.
- Retain good generalization properties.
- Synthesize a large data set with a parsimonious network.
Input-to-Node Values
- The basis of algebraic training: if all the sigmoid inputs are known, the weight equations become algebraic.
- Input-to-node values are the inputs of the sigmoidal functions.
- They determine the saturation level of each sigmoid at a given data point.
Structure of the Weight Equations
- A nonlinear neural network is analyzed and trained by means of linear algebra:
  - controlling the distribution of the input-to-node values
  - controlling the saturation level of the active nodes
Function Approximation Using Neural Networks
✓ Introduction
→ Development of Neural Network Weight Equations
- Algebraic Training Algorithms
  - Exact Matching of Function Input-Output Data
  - Approximate Matching of Gradient Data in Algebraic Training
  - Approximate Matching of Function Input-Output Data
  - Exact Matching of Function Gradient Data
Development of Neural Network Weight Equations
- Objective: approximate a smooth scalar function of q inputs using a feedforward sigmoidal network.
Derivative Information
- Can improve the network's generalization properties.
- Partial derivatives with respect to the inputs can be incorporated in the training set.
Network Output
- z: the network output, computed as a nonlinear transformation of the input
- w: input weights
- p: input vector
- b: bias
- d: output bias
- v: output weights
- sigma: sigmoidal functions
- n: input-to-node variables (the sigmoid inputs)
Scalar Output of the Network
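A reconstruction of the scalar output from the quantities defined on the previous slide (the original equation image did not survive conversion). Here b denotes the input bias and d the output bias, following the glossary; the specific sigmoid shown is a typical choice, not necessarily the one the slides used.

```latex
% Input-to-node values and scalar network output (reconstructed notation)
n_i(k) = \mathbf{w}_i^{\top}\,\mathbf{p}(k) + b_i , \qquad i = 1, \ldots, s
\qquad\qquad
z(k) = \sum_{i=1}^{s} v_i \, \sigma\!\big(n_i(k)\big) + d
```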
Exact Matching of the Function's Outputs
- Requiring the network output to equal the sample output at every training point yields the output weight equations; in matrix form (equation (9)), u = S v, where S contains the sigmoid values evaluated at the input-to-node values.
Gradient Equations
- Obtained from the derivative of the network output with respect to its inputs (see the reconstruction after the next slide).
Exact Matching of the Function's Derivatives
- Equating the network gradient to the known function gradient at every training point yields the gradient weight equations.
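Differentiating the scalar output above with respect to the j-th input gives, by the chain rule (a reconstruction consistent with the notation already introduced):

```latex
\frac{\partial z}{\partial p_j}\bigg|_{k}
  = \sum_{i=1}^{s} v_i \, \sigma'\!\big(n_i(k)\big)\, w_{ij} ,
\qquad j = 1, \ldots, q
```

Exact matching sets this expression equal to the known partial derivative of the function at every training point k.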
Input-to-Node Weight Equations
- Obtained by rewriting (12); for known input-to-node values, these equations are linear in the input weights and biases.
Four Algebraic Training Algorithms
- Exact Matching of Function Input-Output Data
- Approximate Matching of Gradient Data in Algebraic Training
- Approximate Matching of Function Input-Output Data
- Exact Matching of Function Gradient Data
Function Approximation Using Neural Networks
✓ Introduction
✓ Development of Neural Network Weight Equations
✓ Algebraic Training Algorithms
→ Exact Matching of Function Input-Output Data
- Approximate Matching of Gradient Data in Algebraic Training
- Approximate Matching of Function Input-Output Data
- Exact Matching of Function Gradient Data
A. Exact Matching of Function Input-Output Data
- S is a known p x s matrix (with s = p nodes).
- Strategy for producing a well-conditioned S:
  - input weights: random numbers drawn from N(0, 1), multiplied by a scaling factor L
  - L: a user-defined scalar, chosen so the input-to-node values do not saturate the sigmoids
Input Bias
- The input bias b is computed to center each sigmoid at one of the training pairs.
- Finally, the linear system in (9) is solved for v by inverting S: v = S^-1 u.
- If (17) produced an ill-conditioned S, the computation is repeated with new random input weights.
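A minimal NumPy sketch of this exact input-output algorithm. The tanh sigmoid, the condition-number threshold, and the omission of the output bias d are illustrative assumptions, not prescriptions from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh                           # assumed sigmoid choice

def exact_io_fit(P, u, L=1.0, max_tries=10):
    """Exactly interpolate p samples with an s = p node sigmoidal network.
    P: (p, q) training inputs; u: (p,) training outputs; L: scaling factor."""
    p, q = P.shape
    for _ in range(max_tries):
        W = L * rng.standard_normal((p, q))   # input weights ~ N(0, 1), scaled by L
        b = -np.einsum('ij,ij->i', W, P)      # center sigmoid i at training point i
        S = sigma(P @ W.T + b)                # S[k, i] = sigma(n_i(k)), shape (p, p)
        if np.linalg.cond(S) < 1e8:           # repeat if S is ill-conditioned
            v = np.linalg.solve(S, u)         # output weights from (9): S v = u
            return W, b, v
    raise RuntimeError("could not obtain a well-conditioned S")

def predict(W, b, v, P):
    return sigma(P @ W.T + b) @ v
```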
Exact Input-Output-Based Algebraic Algorithm
[Fig. 2-a. Exact input-output-based algebraic algorithm.]
Exact Input-Output-Based Algebraic Algorithm with Gradient Information
[Fig. 2-b. Exact input-output-based algebraic algorithm with added p steps for incorporating gradient information.]
Then
- Exact matching: input-output and gradient information are solved exactly and simultaneously for the neural parameters.
Function Approximation Using Neural Networks
✓ Introduction
✓ Development of Neural Network Weight Equations
✓ Algebraic Training Algorithms
✓ Exact Matching of Function Input-Output Data
→ Approximate Matching of Gradient Data in Algebraic Training
- Approximate Matching of Function Input-Output Data
- Exact Matching of Function Gradient Data
B. Approximate Matching of Gradient Data in Algebraic Training
- Estimate:
  - the output weights
  - the input-to-node values
- First solution:
  - use a randomized W
  - then refine all parameters by a p-step, node-by-node update algorithm.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- The bias and the input-to-node values can be computed solely from the randomized input weights and the training inputs.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- The k-th gradient equations are solved for the input weights associated with the i-th node.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
- At the end of each step:
  - solve for the output weights
  - terminate when a user-specified gradient tolerance is met
- Error enters through v and through the input weights; it is adjusted in later steps.
- Basic idea: the i-th node's input weights mainly contribute to the k-th partial derivatives (see the sketch below).
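One way to read this node-by-node idea (an interpretive sketch derived from the gradient equation reconstructed earlier, not the paper's exact update): solve the k-th gradient equation for the i-th node's input weights while holding the other nodes fixed,

```latex
\mathbf{w}_i \;\approx\; \frac{1}{v_i\,\sigma'\!\big(n_i(k)\big)}
  \Big[\, \mathbf{c}(k) \;-\; \sum_{j \neq i} v_j\,\sigma'\!\big(n_j(k)\big)\,\mathbf{w}_j \Big]
```

where c(k) is the known gradient at training point k. Repeating this over the p node/point pairs, then re-solving for v, gives the stepwise refinement described above.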
Function Approximation Using Neural Networks
✓ Introduction
✓ Development of Neural Network Weight Equations
✓ Algebraic Training Algorithms
✓ Exact Matching of Function Input-Output Data
✓ Approximate Matching of Gradient Data in Algebraic Training
→ Approximate Matching of Function Input-Output Data
- Exact Matching of Function Gradient Data
C. Approximate Matching of Function Input-Output Data
- The algebraic approach can also train approximate, parsimonious networks.
- An exact solution with s < p nodes exists when rank(S|u) = rank(S) = s.
- In general, with s < p the linear system in (9) is not square: (9) will be overdetermined, and u and v are related only approximately.
Approximate Matching of Function Input-Output Data (cont.)
- A superposition technique:
  - combine networks that individually map the nonlinear function over portions of its input space
  - the training set, covering the entire input space, is divided into m subsets.
Approximate Matching of Function Input-Output Data (cont.)
[Fig. 3. Superposition of m smaller neural networks into one s-node network.]
Approximate Matching of Function Input-Output Data (cont.)
- The g-th sub-network approximates its subset's output vector by the corresponding estimate.
Approximate Matching of Function Input-Output Data (cont.)
- For the full network, the matrix of input-to-node values has one element per node (i-th column) and training point (k-th row).
- Main-diagonal blocks: the input-to-node value matrices of the m sub-networks.
- Off-diagonal blocks: columnwise linearly dependent on the elements of the main-diagonal blocks.
Approximate Matching of Function Input-Output Data (cont.)
- Output weights:
  - S is constructed to be of rank s
  - the rank of (S|u) is s or s + 1
  - zero or small error arises during the superposition
  - the error does not increase with m (a least-squares sketch follows).
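With s < p the system (9) is overdetermined, so a least-squares solve for the output weights is the natural reading (an assumed but standard choice; the slides do not name the solver):

```python
import numpy as np

def approx_io_fit(S, u):
    """Least-squares output weights for the overdetermined system S v = u,
    where S is p-by-s with s < p (subsection C)."""
    v, residual, rank, _ = np.linalg.lstsq(S, u, rcond=None)
    return v, residual
```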
Approximate Matching of Function Input-Output Data (cont.)
- The key to developing algebraic training techniques:
  - construct a matrix S, through the input-to-node values N, that displays the desired characteristics
  - S must be of rank s
  - s is kept small to produce a parsimonious network.
Function Approximation Using Neural Networks
✓ Introduction
✓ Development of Neural Network Weight Equations
✓ Algebraic Training Algorithms
✓ Exact Matching of Function Input-Output Data
✓ Approximate Matching of Gradient Data in Algebraic Training
✓ Approximate Matching of Function Input-Output Data
→ Exact Matching of Function Gradient Data
D. Exact Matching of Function Gradient Data
- Gradient-based training sets: at every training point k,
  - the gradient is known for e of the neural network inputs, denoted by x
  - the remaining (q - e) inputs are denoted by a.
- Input-output information is also available.
Exact Matching of Function Gradient Data (cont.)
- The weight equations:
  - input weight equations
  - output weight equations
  - gradient weight equations
  - input-to-node weight equations
First Linear System (36)
- Obtained by reorganizing all the input-to-node values; with s = p the right-hand side is a known vector.
- The unknowns are rewritten as a (q - e + 1)s-dimensional column vector f.
- A is a ps x (q - e + 1)s matrix computed from all the a-input vectors.
Second Linear System (34)
- With the input-to-node values known, system (34) becomes linear.
- It can always be solved for v, provided s = p and S is nonsingular.
- v can then be treated as a constant.
Third Linear System (35)
- (35) becomes linear; the unknowns consist of the x-input weights.
- The gradients in the training set are known.
- X is a known ep x es matrix.
Exact Matching of Function Gradient Data (cont.)
- Algorithm goals:
  - determine an effective distribution for the input-to-node elements
  - solve the weight equations in one step.
- The strategy, which with probability 1 produces a well-conditioned S, consists of generating the input-to-node values according to a chosen random distribution.
Input-to-Node Values
- The generated input-to-node values are substituted in (38).
Input-to-Node Values (cont.)
- It is desirable for one sigmoid to be very nearly centered for a given input.
- To prevent ill-conditioning of S, the same sigmoid should be close to saturation for every other known input.
- This requires a scaling factor based on the absolute value of the largest element of the input.
Example: Neural Network Modeling of the Sine Function
- A sigmoidal neural network is trained to approximate the sine function u = sin(y) over the domain 0 <= y <= pi.
- The training set comprises the gradient and output information shown in Table 1: {y_k, u_k, c_k}, k = 1, 2, 3.
- q = e = 1.
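Table 1 did not survive conversion, but for u = sin(y) the entries necessarily have the form below, with the gradient samples c_k obtained by differentiating the target function:

```latex
u_k = \sin(y_k), \qquad
c_k = \frac{du}{dy}\bigg|_{y_k} = \cos(y_k), \qquad k = 1, 2, 3
```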
- It can be shown that the data are matched exactly by a network with two nodes.
- Suppose the input-to-node values of the two nodes are chosen appropriately at each training point.
- In this example, the input-to-node values are chosen to make the weight equations consistent and to meet the assumptions in (57) and (60)-(61). It can easily be shown that this corresponds to computing the remaining weight elements from the resulting equation.
Conclusion
- Algebraic training, compared with optimization-based techniques, offers:
  - faster execution speeds
  - better generalization properties
  - reduced computational complexity.
- It can be used to find a direct correlation between the number of network nodes needed to model a given data set and the desired accuracy of representation.