A Clustered Particle Swarm Algorithm for Retrieving all the Local Minima of a function

C. Voglis & I. E. Lagaris
Computer Science Department, University of Ioannina, GREECE

Presentation Outline
• Global Optimization Problem
• Particle Swarm Optimization
• Clustering Approach
• Modifying Particle Swarm to form clusters
• Modifying the affinity matrix
• Putting the pieces together
  • Determining the number of minima
  • Identification of the clusters
• Preliminary results, future research

Global Optimization
• The goal is to find the global minimum inside a bounded domain: [equation not preserved]
• One way to do that is to find all the local minima and choose among them the global one (or ones).
• Popular methods of that kind are Multistart, MLSL, TMLSL*, etc.
* M. Ali
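For reference, the problem on this slide is usually stated as below. The symbols f, S and n are assumed notation, since the slide's own formula did not survive extraction.

```latex
\[
  \text{find } x^{*} \in S \subset \mathbb{R}^{n}
  \quad \text{such that} \quad
  f(x^{*}) \le f(x) \quad \forall\, x \in S,
  \qquad S \text{ bounded}.
\]
```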

Particle Swarm Optimization
• Developed in 1995 by James Kennedy and Russ Eberhart.
• It was inspired by the social behavior of bird flocking or fish schooling.
• PSO applies the concept of social interaction to problem solving.
• Searches for a global optimum.

PSO-Description
• The method allows the motion of particles to explore the space of interest.
• Each particle updates its position in discrete unit time steps.
• The velocity is updated by a linear combination of two terms:
  • The first along the direction pointing to the best position discovered by the particle.
  • The second towards the overall best position.
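A minimal sketch of one such time step for a single particle, in plain Python. The function name `pso_step`, the coefficients `c1` and `c2`, the random weights, and the default values (chi = 0.729, c1 = c2 = 2.05, a commonly quoted constriction setting) are assumptions made for illustration; they are not taken from the slides.

```python
import random


def pso_step(x, v, p_best, g_best, chi=0.729, c1=2.05, c2=2.05, rng=random):
    """Advance one particle by one unit time step.

    x, v, p_best and g_best are equal-length lists of floats.
    chi, c1 and c2 are assumed, commonly used settings.
    """
    new_v = []
    for xi, vi, pi, gi in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()
        cognitive = c1 * r1 * (pi - xi)  # towards the particle's own best position
        social = c2 * r2 * (gi - xi)     # towards the overall best position
        new_v.append(chi * (vi + cognitive + social))
    # Unit time step: the new position is the old position plus the new velocity.
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

A particle that already sits on both best positions keeps only its damped velocity, which is a quick sanity check of the update.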

PSO - Relations
[Update equations not preserved]
Where the slide defines:
• the position of the i-th particle at step k
• its velocity
• the best position visited by the i-th particle
• the overall best position ever visited
• the constriction factor
[Figure labels: Swarm's best position; Particle's best position]
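The quantities listed on this slide match the widely used constriction form of PSO, which can be written as below. The symbol names are assumptions: x_i^k (position of particle i at step k), v_i^k (its velocity), p_i (best position visited by particle i), g (overall best position), chi (constriction factor), c_1 and c_2 (acceleration coefficients), r_1 and r_2 (uniform random numbers in [0, 1]). The slide's own notation may differ.

```latex
\begin{align}
  v_i^{k+1} &= \chi \left[ v_i^{k}
      + c_1 r_1 \left( p_i - x_i^{k} \right)
      + c_2 r_2 \left( g - x_i^{k} \right) \right], \\
  x_i^{k+1} &= x_i^{k} + v_i^{k+1}.
\end{align}
```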

PS+Clustering Optimization
• If the global component is weakened, the swarm is expected to form clusters around the minima.
• If a bias is added towards the steepest descent direction, this will be accelerated.
• Locating the minima may then be tackled, to a large extent, as a Clustering Problem (CP).
• However, it is not a regular CP, since it can benefit from information supplied by the objective function.

Modified PSO
• Global component is set to zero.
• A component pointing towards the steepest descent direction* is added to accelerate the process.
• So the swarm motion is described by: [equation not preserved]
* A. Ismael F. Vaz, M. G. P. Fernandes
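The slide's equation is not preserved, so the following is only one plausible reading of the two bullets: the term towards the overall best position is dropped and a term along the negative gradient takes its place. The function name `modified_pso_step`, the callable `grad`, and the coefficients are hypothetical; the authors' actual weighting may differ.

```python
import random


def modified_pso_step(x, v, p_best, grad, chi=0.729, c1=2.05, c2=2.05, rng=random):
    """One step of a PSO variant without a global component (assumed form).

    grad is a callable returning the gradient of the objective at x
    as a list of floats; chi, c1 and c2 are assumed settings.
    """
    g = grad(x)
    new_v = []
    for xi, vi, pi, gi in zip(x, v, p_best, g):
        r1, r2 = rng.random(), rng.random()
        cognitive = c1 * r1 * (pi - xi)  # towards the particle's own best position
        descent = c2 * r2 * (-gi)        # along the steepest descent direction
        new_v.append(chi * (vi + cognitive + descent))
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Because no particle is pulled towards a shared best position, each one tends to settle in the basin it started in, which is the behaviour the clustering stage relies on.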

Modified PSO movie

Clustering
• Clustering problem: “Partition a data set into M disjoint subsets containing points with one or more properties in common”
• A commonly used property refers to topographical grouping based on distances.
• Plethora of algorithms:
  • K-Means, hierarchical (single linkage), quantum clustering
  • Newtonian clustering

Global k-means
• Minimize the clustering error
• It is an incremental procedure using the k-Means algorithm repeatedly
• Independent of the initialization choice.
• Has been successfully applied to many problems.
A. Likas
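The incremental idea can be sketched as follows: the solution with k centers is built from the solution with k-1 centers by trying every data point as the starting position of the new center and keeping the run with the lowest clustering error. This is a simplified sketch in plain Python; the helper names `sq_dist`, `kmeans` and `global_kmeans` are made up for illustration, and the published method includes speed-ups that are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centers, iters=100):
    """Plain k-means iterations from the given initial centers."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    # Clustering error: sum of squared distances to the nearest center.
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def global_kmeans(points, k_max):
    """Return {k: (centers, error)} for k = 1..k_max."""
    centroid = [sum(col) / len(points) for col in zip(*points)]
    solutions = {1: kmeans(points, [centroid])}
    for k in range(2, k_max + 1):
        prev_centers = solutions[k - 1][0]
        # Try every data point as the starting position of the new center.
        candidates = (kmeans(points, prev_centers + [list(p)]) for p in points)
        solutions[k] = min(candidates, key=lambda s: s[1])
    return solutions
```

Since every data point is tried as a seed, the result does not depend on a random initialization, at the cost of many k-means runs per value of k.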

Global K-Means movie

Spectral Clustering
• Algorithms that cluster points using eigenvectors of matrices derived from the data
• Obtain data representation in the low-dimensional space that can be easily clustered
• Variety of methods that use the eigenvectors differently
• Useful information can be extracted from the eigenvalues

The Affinity Matrix
This symmetric matrix is of key importance. Each off-diagonal element is given by: [equation not preserved]
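A common choice in spectral clustering, assumed here because the slide's formula is missing, is a Gaussian affinity with a zero diagonal. The function name `affinity_matrix` and the scale parameter `sigma` are illustrative, and the authors' exact expression may differ.

```python
import math


def affinity_matrix(points, sigma=1.0):
    """A[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for i != j, 0 on the diagonal.

    The Gaussian form and sigma are assumptions, not taken from the slides.
    """
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))  # symmetric fill
    return A
```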

The Affinity Matrix
• Let [definitions not preserved].
• The matrix M is diagonalized; let its eigenvalues be sorted in descending order.
• The largest gap between consecutive eigenvalues identifies the number of clusters (k).
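One construction that fits the surviving words of this slide is the symmetric normalization commonly used in spectral clustering. It is given here as an assumption: D and the index N (number of points) are introduced for illustration, and the slide's lost definitions may not match it.

```latex
\[
  D_{ii} = \sum_{j} A_{ij}, \qquad
  D_{ij} = 0 \ \text{ for } i \neq j, \qquad
  M = D^{-1/2} A \, D^{-1/2}, \qquad
  \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N .
\]
```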

Simple example
• Subset of Cisi/Medline dataset
  • Two clusters: IR abstracts, Medical abstracts
  • 650 documents, 3366 terms after pre-processing
• Spectral embedded space constructed from the eigenvectors of the two largest eigenvalues: [figure not preserved]

How to select k?
• Eigengap: the difference between two consecutive eigenvalues.
• Most stable clustering is generally given by the value k that maximises the eigengap λ_k - λ_(k+1).
[Figure: eigenvalue plot marking λ1 and λ2; largest eigengap ⇒ choose k = 2]
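The eigengap rule can be sketched in a few lines. The function name `choose_k` is made up for illustration; the eigenvalues themselves can come from any symmetric eigensolver, for example `numpy.linalg.eigvalsh`.

```python
def choose_k(eigenvalues):
    """Return the k (1-based) that maximises lambda_k - lambda_(k+1)."""
    lam = sorted(eigenvalues, reverse=True)  # descending order
    if len(lam) < 2:
        raise ValueError("need at least two eigenvalues")
    gaps = [lam[i] - lam[i + 1] for i in range(len(lam) - 1)]
    # Index of the largest gap, shifted to a 1-based cluster count.
    return max(range(len(gaps)), key=gaps.__getitem__) + 1
```

With eigenvalues 1.0, 0.98, 0.3, 0.25, 0.1 the largest gap lies between the second and third value, so the function returns 2.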

Putting the pieces together
1. Apply modified particle swarm to form clusters around the minima
2. Construct the affinity matrix A and compute the eigenvalues of M.
   A. Use only distance information
   B. Add gradient information
3. Find the largest eigengap and identify k.
4. Perform global k-means using the determined k
   A. Use pairwise distances and centroids
   B. Use affinity matrix and medoids (with gradient ...)

Adding information to Affinity matrix
• Use the gradient vectors to zero out pairwise affinities.
• New formula: [equation not preserved]
• Do not associate particles that would become more distant if they followed the negative gradient.
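Since the new formula is missing, the sketch below shows only one way to read the stated rule. To first order in the step size, two particles that follow their negative gradients move apart when (x_i - x_j) · (g_i - g_j) < 0, so those pairs get zero affinity. The function name `prune_affinities` and this first-order test are assumptions, not the authors' formula.

```python
def prune_affinities(A, points, grads):
    """Return a copy of A with A[i][j] = 0 for pairs that would drift apart.

    points[i] and grads[i] are the position and gradient of particle i.
    The first-order separation test used here is an assumed reading of the rule.
    """
    n = len(points)
    B = [row[:] for row in A]  # leave the input matrix untouched
    for i in range(n):
        for j in range(i + 1, n):
            dx = [a - b for a, b in zip(points[i], points[j])]
            dg = [a - b for a, b in zip(grads[i], grads[j])]
            # Negative dot product: the distance grows under a small descent step.
            if sum(x * g for x, g in zip(dx, dg)) < 0.0:
                B[i][j] = B[j][i] = 0.0
    return B
```

For f(x) = (x^2 - 1)^2 in one dimension, particles at 0.9 and 1.1 keep their affinity, while particles at 0.9 and -0.9, which head to different minima, lose it.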

Adding information to Affinity matrix
[Figure legend]
• Black arrow: gradient of particle i
• Green arrows: gradient of j with non-zero affinity to i
• Red arrows: gradient of j with zero affinity to i

From global k-means to global k-medoids
• Original global k-means
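How the authors adapted global k-means to medoids is not preserved on this slide. The sketch below shows only the generic alternating step of a k-medoids iteration that works from an affinity matrix instead of coordinates; it is not the incremental "global" variant, and the function name `k_medoids` is illustrative.

```python
def k_medoids(A, medoids, iters=100):
    """Alternate assignment and medoid update using affinities only.

    A is a symmetric affinity matrix (larger means more similar);
    medoids is a list of initial medoid indices.
    Returns the medoid indices and the labels of the last assignment step.
    """
    n = len(A)
    medoids = list(medoids)
    labels = []
    for _ in range(iters):
        # A medoid stays in its own cluster; every other point joins
        # the medoid it has the highest affinity to.
        labels = []
        for i in range(n):
            if i in medoids:
                labels.append(medoids.index(i))
            else:
                labels.append(max(range(len(medoids)), key=lambda c: A[i][medoids[c]]))
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the largest total affinity to its cluster.
            new_medoids.append(max(members, key=lambda m: sum(A[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels
```

Working with medoids means the cluster representative is always an actual particle, so an affinity matrix that has been modified with gradient information can be used directly.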

Rastrigin function (49 minima)
• [Figure: after modified particle swarm; gradient information]
• [Figure: estimation of k using distance; estimation of k using gradient info]
• [Figure: global k-means]
• [Figure: global k-medoids]

Shubert function (100 minima)
• [Figure: after modified particle swarm; gradient information]
• [Figure: estimation of k using distance; estimation of k using gradient info]
• [Figure: global k-means]
• [Figure: global k-medoids]

Ackley function (25 minima)
• [Figure: after modified particle swarm; gradient information]

Shubert function
• [Figure: estimation of k using distance; estimation of k using gradient info]
• [Figure: global k-means]
• [Figure: global k-medoids]