Instar and Outstar Learning Laws Adapted from lecture

  • Slides: 27
Download presentation
Instar and Outstar Learning Laws Adapted from lecture notes of the course CN 510:

Instar and Outstar Learning Laws Adapted from lecture notes of the course CN 510: Cognitive and Neural Modeling offered in the Department of Cognitive and Neural Systems at Boston University (Instructor: Dr. Anatoli Gorchetchnikov; anatoli@cns. bu. edu) CN 510 Lecture 11

Competitive Learning A standard competitive learning neural network is schematized at right Specific examples

Competitive Learning A standard competitive learning neural network is schematized at right Specific examples include: COMPETITIVE LAYER _ _ • Von der Malsburg (1973) model of visual cortex + COMPETITION ALLOWS ONLY ONE OR A RELATIVELY SMALL NUMBER OF CELLS TO BECOME ACTIVE _ _ ONLY WEIGHTS PROJECTING TO ACTIVE CELLS ARE ALLOWED TO INCREASE • Grossberg (1976) adaptive resonance theory (ART) • Kohonen (1983) selforganizing feature maps (SOFM, SOM) INPUT LAYER Competitive learning networks typically learn in an unsupervised fashion CN 510 Lecture 11

Grossberg (1976) Describes an unsupervised learning scheme that can be used to build “feature

Grossberg (1976) Describes an unsupervised learning scheme that can be used to build “feature detectors” or perform categorization This paper presents many of theoretical bases of adaptive resonance theory (ART) CN 510 Lecture 11

Neural Feature Detectors A neural feature detector is a (typically sensory) neuron that fires

Neural Feature Detectors A neural feature detector is a (typically sensory) neuron that fires preferentially to inputs of a particular type. For example, many cells in visual cortex fire preferentially to lines of particular orientations: Such a cell could be called a line orientation detector – The detector in the figure above is “tuned” to vertically oriented lines The pattern of firing as a function of some stimulus dimension (e. g. , line orientation angle in the above figure) is called the “tuning curve” CN 510 Lecture 11

Tuning vs. Categorizing Although a feature detector cell is “tuned” to a particular orientation,

Tuning vs. Categorizing Although a feature detector cell is “tuned” to a particular orientation, it may or may not fire for other nearby orientations Categorization, on the other hand, implies that a cell fires only for inputs in the category coded by the cell This is closely related to the difference between “choice” and “contrast enhancement” in an RCF: x FASTERTHAN- f(x) LINEAR x SIGMOID f(x) In the competitive learning network of Grossberg (1976), choice leads to a categorization network, and contrast enhancement leads to feature detectors with broader tuning curves CN 510 Lecture 11

The Network Of Grossberg (1976) Network is a standard competitive learning network in which

The Network Of Grossberg (1976) Network is a standard competitive learning network in which the F 2 layer is a shunting recurrent competitive field COMPETITION ALLOWS (RCF): ONLY ONE OR A RELATIVELY COMPETITIVE LAYER _ _ + SMALL NUMBER OF CELLS TO BECOME ACTIVE _ _ ONLY WEIGHTS PROJECTING TO ACTIVE CELLS ARE ALLOWED TO INCREASE INPUT LAYER In this tutorial we will consider the case where F 2 is a “choice” RCF network (i. e. , faster-than-linear signal function) CN 510 Lecture 11

Consider what happens when an input pattern is presented to the F 1 stage

Consider what happens when an input pattern is presented to the F 1 stage of the network _ F 2 Because the RCF at F 2 is winner-take-all, the F 2 cell with the largest input will win the competition All other cells will have their activities quenched F 1 CN 510 Lecture 11 _ _ _

Which F 2 cell will have the largest input? Need to consider the relationship

Which F 2 cell will have the largest input? Need to consider the relationship between the input pattern (vector) and the cell’s weight vector The input to F 2 cell will be a product of input pattern and the weight vector _ F 2 F 1 CN 510 Lecture 11 _ _ _

In vector notation the input Ij to the F 2 cell indexed by j

In vector notation the input Ij to the F 2 cell indexed by j is: Where is the vector of F 1 cell activities, is the vector of weights projecting to F 2 cell j, and is the angle between these two vectors Assume for now that the total activity at F 1 is normalized, as are the weight vectors projecting to the F 2 cells In this case, the F 2 cell with the most input will be the one whose afferent weight vector is closest to parallel to the activity pattern at F 1 and thus has the largest term in the above equation CN 510 Lecture 11

For a 2 -D input space (i. e. , 2 F 1 cells), the

For a 2 -D input space (i. e. , 2 F 1 cells), the situation can be schematized as follows: CN 510 Lecture 11

The Instar Learning Law Grossberg (1976) studied the effects of using an “instar” learning

The Instar Learning Law Grossberg (1976) studied the effects of using an “instar” learning law with Hebbian growth and post-synaptically gated decay in the F 1 to F 2 weights If F 2 is winner-take-all, learning will only occur in the weight vector projecting to the F 2 cell that won the competition; This is Competitive learning CN 510 Lecture 11

What happens to the afferent weight vector for the F 2 cell that won

What happens to the afferent weight vector for the F 2 cell that won the competition? The weight vector becomes more like the current input vector. CN 510 Lecture 11

That is, the afferent weight vector becomes more parallel to the current F 1

That is, the afferent weight vector becomes more parallel to the current F 1 STM pattern This means that the winning F 2 cell will become even more responsive to the current input pattern CN 510 Lecture 11

Weight “Normalization” in the Grossberg (1976) Network If the F 1 layer is a

Weight “Normalization” in the Grossberg (1976) Network If the F 1 layer is a shunting competitive network with normalized activity, then the weight vector projecting to each F 2 cell will be approximately normalized as well after learning has taken place This can be seen by looking at the instar learning law: This can be interpreted as “zij tracks x 1 i at a rate determined by x 2 j”. Thus, each component of the F 2 cell’s weight vector tracks the corresponding component of the input vector at F 1, and since the input vector is normalized, the weight vector becomes approximately normalized as well CN 510 Lecture 11

“Coding” of the Input Space The afferent weight vectors for the F 2 cells

“Coding” of the Input Space The afferent weight vectors for the F 2 cells span the input space When an input arrives at F 1, the F 2 cell whose afferent weight vector is closest to parallel to the input vector will have the most input and win the competition at F 2 Thus, each F 2 cell is tuned to (“codes”) a region of the input space roughly centered around its afferent weight vector: CN 510 Lecture 11

We know that, for a single input pattern, the weight vector afferent to an

We know that, for a single input pattern, the weight vector afferent to an F 2 cell moves toward the input pattern vector More general question: what happens to the network as a whole when presented with many stimulus patterns? In particular, do the “features” (i. e. , regions of input space) coded by the cells constantly change, or do they stabilize? Grossberg (1976) noted that we cannot in general guarantee that: – an initial partition of input space will be maintained through time, or – a “stable” partition will ever form CN 510 Lecture 11

Strange learning sequences might cause the constant recoding of F 2 cell firing preferences:

Strange learning sequences might cause the constant recoding of F 2 cell firing preferences: CN 510 Lecture 11

Sparse Patterns Theorem However, we can guarantee that a partitioning into “sparse” classes (defined

Sparse Patterns Theorem However, we can guarantee that a partitioning into “sparse” classes (defined below) will persist: Sparse patterns theorem Assume that you start out with a “sparse partition”; i. e. , a partition such that the classifying weight vector for a given subset of input vectors is closer to all vectors in the subset than to any other vector from any other subset CN 510 Lecture 11

Then, with instar learning and a choice network at F 2, you are guaranteed

Then, with instar learning and a choice network at F 2, you are guaranteed that the initial partitions will not be recoded, and the weight vectors will eventually enter the convex hull of the subset of input patterns that it codes. CN 510 Lecture 11

Proof Of Convergence to Convex Hull Grossberg (1976) provides an analytical proof We will

Proof Of Convergence to Convex Hull Grossberg (1976) provides an analytical proof We will look at a simpler and more general “geometric” proof here (Guenther, 1992) Key concept: For an acute angle ABC, as x is moved from B toward C along line segment BC, the distance between A and x initially decreases This concept is applied twice in the following proof CN 510 Lecture 11

To prove that the weight vector z will enter the convex hull of training

To prove that the weight vector z will enter the convex hull of training inputs, we can show that the closest point on the convex hull, cmin, keeps getting closer to z during learning Consider the hyperplane H passing through cmin and perpendicular to line segment zcmin CN 510 Lecture 11

First, note that all points in the convex hull must lie on H or

First, note that all points in the convex hull must lie on H or on the side of H opposite z To see that this is the case, note that any point c in the convex hull that might lie on the same side of H as z would form an acute angle zcminc, which means that some points between cmin and c would be closer to z, thus violating the definition of cmin Thus all training inputs lie on the side of H opposite z or on H itself CN 510 Lecture 11

This means that the angle cminzc must be acute Thus incrementing the weight vector

This means that the angle cminzc must be acute Thus incrementing the weight vector z toward any training input c from the convex hull will decrease the distance between z and cmin I. e. , the weight vector constantly moves closer to the convex hull, and once in the hull it can never leave it More generally, constantly moving toward arbitrary points in a convex hull eventually leads to the convex hull CN 510 Lecture 11

Beyond Sparse Patterns In the general case, we cannot rely on inputs falling into

Beyond Sparse Patterns In the general case, we cannot rely on inputs falling into sparse classes How, then, do we guarantee that cells in F 2 won’t constantly be recoded? This issue became the motivation for top-down feedback in ART, with top-down weights adjusted according to the outstar learning law CN 510 Lecture 11

Comparison of Instar and Outstar Learning In the instar, the weights afferent to the

Comparison of Instar and Outstar Learning In the instar, the weights afferent to the postsynaptic cell learn the input pattern at the presynaptic sites when the post-synaptic cell is active In the outstar, the weights projecting away from the pre-synaptic cell learn the pattern at the post-synaptic sites when the pre-synaptic cell is active CN 510 Lecture 11

Similar to the instar case, when an outstar is presented with an equivalence class

Similar to the instar case, when an outstar is presented with an equivalence class of border patterns that form a convex hull in some input space, the weight vector is guaranteed to enter the convex hull if enough training is applied The proof is simply the preceding geometric proof The weights projecting from an F 2 cell in an ART network can thus be thought of as a top-down template, or learned expectation, corresponding to typical input patterns that activate the F 2 cell If the F 2 cell represents a category, this template can be thought of as something akin to a category prototype or average category member CN 510 Lecture 11

Note that this top-down feedback represents a learned expectation that can be compared to

Note that this top-down feedback represents a learned expectation that can be compared to the current input, and if the current input differs “too much” from the learned expectation, the F 2 node that won the competition is reset and a new competition takes place among the remaining nodes in an attempt to find a better category match In ART, learning can only take place after an F 2 node is found whose top-down template matches the input pattern sufficiently well This prevents incorrect “recoding” of a category node’s afferent weights when an input isn’t really a member of the category CN 510 Lecture 11