Radial Basis Function Networks
M. W. Mak, 11/22/2020

1. Introduction
2. Finding RBF Parameters
3. Decision Surface of RBF Networks
4. Comparison between RBF and BP

1. Introduction
• MLPs are highly non-linear in the parameter space, so gradient descent can get trapped in local minima.
• RBF networks address this problem by dividing the learning into two independent processes: finding the basis-function parameters, and finding the output weights w.

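• RBF networks implement the function $y(\mathbf{x}) = \sum_{i=1}^{M} w_i\, \phi_i(\mathbf{x})$, where $\phi_i$ is a basis function with center $\mathbf{c}_i$ and width $\sigma_i$, and $w_i$ is its output weight.
• $w_i$, $\sigma_i$ and $\mathbf{c}_i$ can be determined separately, which gives a fast learning algorithm.
• Basis function types: different choices of $\phi_i$ are possible, e.g. the Gaussian.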

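• For Gaussian basis functions, $\phi_i(\mathbf{x}_p) = \exp\left(-\sum_{n} \frac{(x_{pn} - c_{in})^2}{2\sigma_{in}^2}\right)$, where n indexes the input dimensions.
• Assume the variances across all dimensions are equal ($\sigma_{in} = \sigma_i$); then $\phi_i(\mathbf{x}_p) = \exp\left(-\frac{\|\mathbf{x}_p - \mathbf{c}_i\|^2}{2\sigma_i^2}\right)$.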

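• To write the network in matrix form, let $a_{pi} = \phi_i(\mathbf{x}_p)$ be the output of the i-th basis function for pattern p. The network outputs for all patterns are then $A\mathbf{w}$, where $A = [a_{pi}]$ and $\mathbf{w} = [w_1, \ldots, w_M]^T$.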
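The matrix form can be sketched in NumPy as follows. This is a minimal illustration, assuming Gaussian basis functions with one width per center; the function names `gaussian_design_matrix` and `rbf_output` are hypothetical and do not come from the slides.

```python
import numpy as np

def gaussian_design_matrix(X, centers, widths):
    """Return A with A[p, i] = exp(-||x_p - c_i||^2 / (2 * sigma_i^2))."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Squared Euclidean distance between every pattern and every center.
    diff = X[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * widths[None, :] ** 2))

def rbf_output(X, centers, widths, weights):
    """Network output for every pattern, computed as the product A w."""
    A = gaussian_design_matrix(X, centers, widths)
    return A @ np.asarray(weights, dtype=float)
```

Each row of A holds the hidden-layer response to one pattern, so the output layer reduces to a single matrix-vector product.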

2. Finding the RBF Parameters
• Use the K-means algorithm to find the centers $\mathbf{c}_i$.

K-means Algorithm
Step 1: K initial clusters are chosen randomly from the samples to form K groups.
Step 2: Each new sample is added to the group whose mean is closest to it.
Step 3: Adjust the mean of the group to take account of the new points.
Step 4: Repeat from step 2 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.

Outcome: There are K clusters, with means representing the centroid of each cluster.
Advantages:
(1) A fast and simple algorithm.
(2) Reduces the effects of noisy samples.
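The K-means steps can be sketched as below. Note that this is the common batch variant, which reassigns all samples before recomputing the means, rather than adding samples one at a time; the function name `k_means` and the parameters `tol`, `max_iter` and `seed` are assumptions made for illustration.

```python
import numpy as np

def k_means(X, k, tol=1e-6, max_iter=100, seed=0):
    """Batch K-means; returns the cluster means and the label of each sample."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial means.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to the group with the closest mean.
        dist = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        # Step 3: recompute each mean; keep the old mean if a group is empty.
        new_means = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        # Step 4: stop once no mean has moved by more than the tolerance.
        shift = np.max(np.linalg.norm(new_means - means, axis=1))
        means = new_means
        if shift < tol:
            break
    return means, labels
```

The returned means would then serve as the centers of the basis functions.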

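• Use the K-nearest-neighbor rule to find the function widths $\sigma_i$: each width is set from the distances between $\mathbf{c}_i$ and its K nearest neighboring centers, where $\mathbf{c}_k$ denotes the k-th nearest neighbor of $\mathbf{c}_i$.
• The objective is to cover the training points so that a smooth fit of the training samples can be achieved.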
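One common way to turn the K-nearest-neighbor rule into a width is the root-mean-square distance from each center to its K nearest neighboring centers. The sketch below assumes that form; the exact formula on the slide may differ, and the function name `knn_widths` is hypothetical.

```python
import numpy as np

def knn_widths(centers, k=2):
    """Width of each basis function from its k nearest neighboring centers."""
    centers = np.asarray(centers, dtype=float)
    # Pairwise distances between all centers.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    # Sort each row; column 0 is the zero distance from a center to itself.
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]
    # Root-mean-square distance to the k nearest neighbors (assumed rule).
    return np.sqrt(np.mean(nearest ** 2, axis=1))
```

Centers in crowded regions get narrow basis functions and isolated centers get wide ones, which is what lets the functions cover the training points.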

Figure: Centers and widths found by K-means and K-NN.

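• Determine the weights w using the least-squares method: choose w so that the network output $\sum_{i} w_i\, \phi_i(\mathbf{x}_p)$ is as close as possible to $d_p$ over all training patterns, where $d_p$ is the desired output for pattern p.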

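Let E be the total squared error between the actual output and the target output: $E = \|\mathbf{d} - A\mathbf{w}\|^2$, where A is the matrix of basis-function outputs $\phi_i(\mathbf{x}_p)$ and $\mathbf{d}$ is the vector of desired outputs $d_p$. Setting $\partial E / \partial \mathbf{w} = 0$ gives $A^T A \mathbf{w} = A^T \mathbf{d}$.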

Note that $\mathbf{w} = (A^T A)^{-1} A^T \mathbf{d}$.
Problems
(1) Susceptible to round-off error.
(2) No solution if $A^T A$ is singular.
(3) If $A^T A$ is close to singular, we get very large components in w.
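A minimal sketch of the direct solution, with a small made-up matrix to show one source of the round-off problem: forming $A^T A$ squares the 2-norm condition number of A. The function name and the example values are assumptions for illustration only.

```python
import numpy as np

def solve_weights_normal_equations(A, d):
    """Solve (A^T A) w = A^T d directly; fragile when A^T A is near singular."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    return np.linalg.solve(A.T @ A, A.T @ d)

# Small well-conditioned example (illustrative values only).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = A @ np.array([2.0, -1.0])
w = solve_weights_normal_equations(A, d)

# Forming A^T A squares the 2-norm condition number of A.
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)
```

For a well-conditioned A this works fine; the trouble starts when two basis functions respond almost identically to every pattern, which makes columns of A nearly dependent.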

Reasons
(1) Inaccuracy in forming $A^T A$.
(2) If A is ill-conditioned, a small change in A introduces a large change in $(A^T A)^{-1}$.
(3) If $A^T A$ is close to singular, nearly dependent columns exist in $A^T A$, e.g. two parallel straight lines in the x-y plane.

Singular matrix: two parallel lines never intersect, so the system has no solution. If the lines are nearly parallel, they intersect far from the origin, so the magnitude of the solution becomes very large and overflow can occur. The effect of the large components could be cancelled out only if the machine precision were infinite.
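As a worked illustration, consider two lines whose slopes differ by a small amount $\varepsilon$. The specific lines are an assumption chosen for simplicity and are not taken from the slides.

```latex
\begin{aligned}
y &= x, \\
y &= (1+\varepsilon)\,x - 1
\end{aligned}
\quad\Longrightarrow\quad
\begin{bmatrix} 1 & -1 \\ 1+\varepsilon & -1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1 \end{bmatrix},
\qquad
x = y = \frac{1}{\varepsilon}.
```

The determinant of the coefficient matrix is $\varepsilon$, so the matrix is singular when the lines are exactly parallel, and the solution grows without bound as $\varepsilon \to 0$.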

If the machine precision is finite, the large components no longer cancel exactly and we get a large error.
Solution: Singular Value Decomposition (SVD)
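A sketch of how the SVD can be used here, assuming the usual remedy of discarding singular values that are negligible relative to the largest one. The function name `solve_weights_svd` and the threshold `rel_tol` are hypothetical choices.

```python
import numpy as np

def solve_weights_svd(A, d, rel_tol=1e-10):
    """Least-squares weights via the SVD, ignoring tiny singular values."""
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only singular values well above round-off; zero out the rest.
    keep = s > rel_tol * s[0]
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ d))
```

With two identical columns in A, $A^T A$ is singular and the direct inverse does not exist, whereas this routine returns the minimum-norm weights that still reproduce the targets.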

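• RBF learning process
Figure: the training patterns $\mathbf{x}_p$ feed K-means, which gives the centers $\mathbf{c}_i$; K-Nearest Neighbor gives the widths $\sigma_i$ from the centers; the basis functions map $\mathbf{x}_p$ to the matrix A; linear regression on A gives the weights $w_i$.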
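The whole learning process can be sketched end to end as below. This is a compact illustration under several assumptions: batch K-means with a fixed number of passes, widths set to the root-mean-square distance to the nearest neighboring centers, and `np.linalg.lstsq` (which is SVD-based) for the regression. The names `train_rbf`, `predict_rbf` and `gaussian_basis` are hypothetical.

```python
import numpy as np

def gaussian_basis(X, centers, widths):
    """Matrix A of Gaussian basis-function outputs, one row per pattern."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * widths ** 2))

def train_rbf(X, d, k, n_neighbors=2, n_iter=50, seed=0):
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    rng = np.random.default_rng(seed)
    # Stage 1a: centers c_i from a fixed number of batch K-means passes.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    # Stage 1b: widths sigma_i from the nearest neighboring centers.
    cdist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.sort(cdist, axis=1)[:, 1:n_neighbors + 1]
    widths = np.sqrt(np.mean(nearest ** 2, axis=1))
    # Stage 2: weights w by linear least squares on the matrix A.
    A = gaussian_basis(X, centers, widths)
    weights, *_ = np.linalg.lstsq(A, d, rcond=None)
    return centers, widths, weights

def predict_rbf(X, centers, widths, weights):
    return gaussian_basis(np.asarray(X, dtype=float), centers, widths) @ weights
```

The two stages never interact: the centers and widths are fixed before the weights are computed, so the second stage is an ordinary linear problem.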

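• RBF learning by gradient descent: let $e_p = d_p - y(\mathbf{x}_p)$ be the error between the desired output and the network output for pattern p, so that we have $E = \sum_p e_p^2$. Apply gradient descent to E with respect to $w_i$, $\mathbf{c}_i$ and $\sigma_i$.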
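A sketch of the gradients involved, assuming Gaussian basis functions with a single width $\sigma_i$ per unit, an error summed over all patterns p, and a learning rate $\eta$. The symbol $\eta$ and the batch form are assumptions, not taken from the slides.

```latex
\begin{aligned}
E &= \sum_p e_p^2, \qquad
e_p = d_p - \sum_{i} w_i\,\phi_i(\mathbf{x}_p), \qquad
\phi_i(\mathbf{x}_p) = \exp\!\left(-\frac{\|\mathbf{x}_p - \mathbf{c}_i\|^2}{2\sigma_i^2}\right), \\
\frac{\partial E}{\partial w_i} &= -2 \sum_p e_p\,\phi_i(\mathbf{x}_p), \\
\frac{\partial E}{\partial \mathbf{c}_i} &= -2 \sum_p e_p\, w_i\,\phi_i(\mathbf{x}_p)\,\frac{\mathbf{x}_p - \mathbf{c}_i}{\sigma_i^2}, \\
\frac{\partial E}{\partial \sigma_i} &= -2 \sum_p e_p\, w_i\,\phi_i(\mathbf{x}_p)\,\frac{\|\mathbf{x}_p - \mathbf{c}_i\|^2}{\sigma_i^3}, \\
\theta &\leftarrow \theta - \eta\,\frac{\partial E}{\partial \theta}, \qquad \theta \in \{w_i,\ \mathbf{c}_i,\ \sigma_i\}.
\end{aligned}
```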

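Applying gradient descent gives one update equation each for the weights $w_i$, the centers $\mathbf{c}_i$ and the widths $\sigma_i$, each moving its parameter against the gradient of E.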
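One gradient-descent step can be sketched as follows, assuming Gaussian basis functions, the error $E = \sum_p (d_p - y_p)^2$ summed over all patterns, and a fixed learning rate. The function names and the parameter `lr` are hypothetical, and all arguments are expected to be NumPy arrays.

```python
import numpy as np

def rbf_gradients(X, d, centers, widths, weights):
    """Gradients of E = sum_p (d_p - y_p)^2 w.r.t. weights, centers, widths."""
    diff = X[:, None, :] - centers[None, :, :]        # shape (P, M, D)
    sq_dist = np.sum(diff ** 2, axis=2)               # shape (P, M)
    phi = np.exp(-sq_dist / (2.0 * widths ** 2))      # shape (P, M)
    err = d - phi @ weights                           # shape (P,)
    # Shared factor -2 * e_p * w_i * phi_i(x_p) of the center and width terms.
    common = -2.0 * err[:, None] * weights[None, :] * phi
    grad_w = -2.0 * phi.T @ err
    grad_c = np.sum(common[:, :, None] * diff, axis=0) / widths[:, None] ** 2
    grad_s = np.sum(common * sq_dist, axis=0) / widths ** 3
    return grad_w, grad_c, grad_s

def gradient_step(X, d, centers, widths, weights, lr=0.01):
    """Move every parameter a small step against its gradient."""
    grad_w, grad_c, grad_s = rbf_gradients(X, d, centers, widths, weights)
    return centers - lr * grad_c, widths - lr * grad_s, weights - lr * grad_w
```

Unlike the two-stage procedure, this updates all three parameter sets together, so it inherits the local-minimum behavior of gradient descent.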

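Elliptical Basis Function (EBF) networks
$\phi_j(\mathbf{x}) = \exp\left\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)\right\}$
$\boldsymbol{\mu}_j$: function centers
$\Sigma_j$: covariance matrix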
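A minimal sketch of the elliptical basis functions, assuming the usual Gaussian form with a full covariance matrix per center. The function name `ebf_activations` is hypothetical.

```python
import numpy as np

def ebf_activations(X, means, covs):
    """phi[p, j] = exp(-0.5 * (x_p - mu_j)^T Sigma_j^{-1} (x_p - mu_j))."""
    X = np.asarray(X, dtype=float)
    phi = np.empty((len(X), len(means)))
    for j, (mu, cov) in enumerate(zip(means, covs)):
        diff = X - np.asarray(mu, dtype=float)
        # Solve Sigma_j z = diff^T instead of forming the inverse explicitly.
        z = np.linalg.solve(np.asarray(cov, dtype=float), diff.T).T
        phi[:, j] = np.exp(-0.5 * np.sum(diff * z, axis=1))
    return phi
```

When every covariance matrix is $\sigma_j^2 I$, the exponent reduces to $\|\mathbf{x} - \boldsymbol{\mu}_j\|^2 / (2\sigma_j^2)$ and the EBF network becomes an ordinary RBF network.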

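• K-means and sample covariance
K-means: $\boldsymbol{\mu}_j = \frac{1}{N_j} \sum_{\mathbf{x} \in X_j} \mathbf{x}$, where $\mathbf{x} \in X_j$ if $\|\mathbf{x} - \boldsymbol{\mu}_j\| \le \|\mathbf{x} - \boldsymbol{\mu}_k\|$ for all $k \ne j$, and $N_j$ is the number of samples in $X_j$.
Sample covariance: $\Sigma_j = \frac{1}{N_j} \sum_{\mathbf{x} \in X_j} (\mathbf{x} - \boldsymbol{\mu}_j)(\mathbf{x} - \boldsymbol{\mu}_j)^T$
• The EM algorithm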
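Given cluster labels from K-means, the sample covariance of each cluster can be sketched as below. Dividing by the cluster size $N_j$ rather than $N_j - 1$ is an assumption, and the function name is hypothetical.

```python
import numpy as np

def cluster_means_and_covariances(X, labels, k):
    """Mean and sample covariance of the points assigned to each cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    means, covs = [], []
    for j in range(k):
        members = X[labels == j]
        mu = members.mean(axis=0)
        centered = members - mu
        # Divide by N_j (maximum-likelihood form), not N_j - 1.
        covs.append(centered.T @ centered / len(members))
        means.append(mu)
    return np.array(means), np.array(covs)
```

A cluster with too few points, or with points lying on a line, gives a singular covariance matrix, so in practice some regularization may be needed before inverting it.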

EBF vs. RBF networks
Figure panels: RBFN with 4 centers; EBFN with 4 centers.

Elliptical Basis Function Networks
Figure: EBF network's output.

RBFN for Pattern Classification
• MLPs separate classes with hyperplanes, whereas RBF networks use kernel functions.
• The probability density function (also called the class-conditional density function or likelihood) of the k-th class is defined as $p(\mathbf{x} \mid C_k)$.

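• According to Bayes' theorem, the posterior prob. is $P(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, P(C_k)}{p(\mathbf{x})}$, where $P(C_k)$ is the prior prob. and $p(\mathbf{x}) = \sum_{k'} p(\mathbf{x} \mid C_{k'})\, P(C_{k'})$.
• It is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities, i.e. $p(\mathbf{x} \mid C_k) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\, P(j \mid C_k)$.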
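A sketch of how the posterior can then be rewritten as the output of an RBF network, following the standard argument and assuming the basis-function priors $P(j) = \sum_{k} P(j \mid C_k)\, P(C_k)$:

```latex
\begin{aligned}
P(C_k \mid \mathbf{x})
&= \frac{\sum_{j=1}^{M} p(\mathbf{x} \mid j)\, P(j \mid C_k)\, P(C_k)}
        {\sum_{j'=1}^{M} p(\mathbf{x} \mid j')\, P(j')} \\
&= \sum_{j=1}^{M} \frac{P(j \mid C_k)\, P(C_k)}{P(j)} \cdot
   \frac{p(\mathbf{x} \mid j)\, P(j)}{\sum_{j'=1}^{M} p(\mathbf{x} \mid j')\, P(j')} \\
&= \sum_{j=1}^{M} P(C_k \mid j)\, P(j \mid \mathbf{x}).
\end{aligned}
```

The second factor depends only on the input and plays the role of a normalized hidden-node output, while the first factor depends only on the class and the basis function and plays the role of a weight.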



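The posterior can be written as a network output, $P(C_k \mid \mathbf{x}) = \sum_{j=1}^{M} w_{kj}\, \phi_j(\mathbf{x})$:
• No bias term.
• Hidden node's output $\phi_j(\mathbf{x}) = P(j \mid \mathbf{x})$: the posterior prob. of the j-th set of features in the input.
• Weight $w_{kj} = P(C_k \mid j)$: the posterior prob. of class membership, given the presence of the j-th set of features.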
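The interpretation above can be checked numerically. The sketch below assumes isotropic Gaussian densities for $p(\mathbf{x} \mid j)$ and made-up mixing probabilities; the function names and all numeric values are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Isotropic Gaussian density p(x | j) in D dimensions."""
    x, mu = np.asarray(x, dtype=float), np.asarray(mu, dtype=float)
    norm = (2.0 * np.pi * sigma ** 2) ** (x.size / 2.0)
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def class_posteriors(x, mus, sigmas, p_j_given_class, class_priors):
    """Return P(C_k | x) computed as sum_j P(C_k | j) P(j | x)."""
    p_j_given_class = np.asarray(p_j_given_class, dtype=float)  # shape (K, M)
    class_priors = np.asarray(class_priors, dtype=float)        # shape (K,)
    # Likelihood p(x | j) of each basis function.
    lik = np.array([gaussian_pdf(x, m, s) for m, s in zip(mus, sigmas)])
    # Basis-function priors P(j) = sum_k P(j | C_k) P(C_k).
    p_j = class_priors @ p_j_given_class
    # Hidden-node outputs P(j | x), normalized over the pool.
    hidden = lik * p_j / np.sum(lik * p_j)
    # Weights P(C_k | j) = P(j | C_k) P(C_k) / P(j).
    weights = p_j_given_class * class_priors[:, None] / p_j[None, :]
    return weights @ hidden
```

Because the hidden outputs are normalized and the weights are probabilities, the outputs sum to one over the classes without any bias term.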

Comparison of RBF and MLP
To learn more about NN hardware, see http://www.particle.kth.se/~lindsey/Hardware.NNWCourse/home.html