III. Recurrent Neural Networks

A. The Hopfield Network

Typical Artificial Neuron (figure): connection weights, inputs, output, threshold

Typical Artificial Neuron (figure): linear combination, activation function, net input (local field)

Equations: net input; new neural state (standard forms reconstructed below)
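The equation images on this slide do not survive in the text; in the notation used on the surrounding slides (weights wij, states sj, threshold θ) they presumably take the standard form:

```latex
h_i = \sum_{j} w_{ij}\, s_j
\qquad\qquad
s_i' = \sigma\!\left(h_i - \theta\right)
```

where σ is the activation function; the next slide specializes to θ = 0 and a bipolar sign activation.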

Hopfield Network
• Symmetric weights: wij = wji
• No self-action: wii = 0
• Zero threshold: θ = 0
• Bipolar states: si ∈ {–1, +1}
• Discontinuous bipolar activation function: σ(h) = sgn(h)

What to do about h = 0?
• There are several options:
  – σ(0) = +1
  – σ(0) = –1 or +1 with equal probability
  – hi = 0 ⇒ no state change (s′i = si)
• Not much difference, but be consistent
• The last option is slightly preferable, since it is symmetric (see the update sketch below)
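A minimal sketch (not from the slides) of a single asynchronous update step under these conventions, assuming NumPy; the helper name hopfield_update is illustrative:

```python
import numpy as np

def hopfield_update(W, s, i):
    """Asynchronously update neuron i of the bipolar state vector s.

    W is the symmetric weight matrix with zero diagonal; the threshold
    is zero.  Follows the convention preferred on the slide: if the
    local field h_i is exactly zero, the neuron keeps its current state.
    """
    h = W[i] @ s            # net input (local field): h_i = sum_j w_ij s_j
    if h > 0:
        s[i] = +1
    elif h < 0:
        s[i] = -1
    # h == 0: no state change
    return s
```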

Positive Coupling (figure): positive sense (sign), large strength

Negative Coupling (figure): negative sense (sign), large strength

Weak Coupling (figure): either sense (sign), little strength

State = –1 & Local Field < 0 (h < 0)

State = –1 & Local Field > 0 (h > 0)

State Reverses (h > 0)

State = +1 & Local Field > 0 (h > 0)

State = +1 & Local Field < 0 (h < 0)

State Reverses (h < 0)

NetLogo Demonstration of Hopfield State Updating: run Hopfield-update.nlogo

Hopfield Net as Soft Constraint Satisfaction System
• States of neurons as yes/no decisions
• Weights represent soft constraints between decisions
  – hard constraints must be respected
  – soft constraints have degrees of importance
• Decisions change to better respect constraints
• Is there an optimal set of decisions that best respects all constraints?

Demonstration of Hopfield Net Dynamics I: run Hopfield-dynamics.nlogo

Convergence
• Does such a system converge to a stable state?
• Under what conditions does it converge?
• There is a sense in which each step relaxes the “tension” in the system
• But could a relaxation of one neuron lead to greater tension in other places?

Quantifying “Tension”
• If wij > 0, then si and sj want to have the same sign (si sj = +1)
• If wij < 0, then si and sj want to have opposite signs (si sj = –1)
• If wij = 0, their signs are independent
• Strength of interaction varies with |wij|
• Define disharmony (“tension”) Dij between neurons i and j: Dij = –si wij sj
  – Dij < 0 ⇒ they are happy
  – Dij > 0 ⇒ they are unhappy

Total Energy of System: the “energy” of the system is the total “tension” (disharmony) in it:
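The energy formula itself is an image on the slide; with the disharmony Dij defined above and the symmetric, zero-diagonal weights, the standard form is:

```latex
E\{\mathbf{s}\} \;=\; \sum_{i<j} D_{ij}
\;=\; -\sum_{i<j} s_i w_{ij} s_j
\;=\; -\tfrac{1}{2} \sum_{i} \sum_{j \ne i} s_i w_{ij} s_j
\;=\; -\tfrac{1}{2}\, \mathbf{s}^{\mathsf T} W \mathbf{s}
```

(using wij = wji and wii = 0).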

Review of Some Vector Notation: column vectors, inner product, outer product, quadratic form
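The notation reviewed here (shown as images on the slide) amounts to the following, for column vectors u, v, and s of length n:

```latex
\mathbf{u}^{\mathsf T}\mathbf{v} = \sum_i u_i v_i \ \ \text{(inner product, a scalar)}
\qquad
\mathbf{u}\mathbf{v}^{\mathsf T} = \bigl[\, u_i v_j \,\bigr]_{n\times n} \ \ \text{(outer product, a matrix)}
\qquad
\mathbf{s}^{\mathsf T} W \mathbf{s} = \sum_i \sum_j s_i w_{ij} s_j \ \ \text{(quadratic form, a scalar)}
```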

Another View of Energy: the energy measures the number of neurons whose states are in disharmony with their local fields (i.e., of opposite sign):
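In terms of the local fields, the same energy can be written (a standard identity, reconstructed here since the slide's equation is an image):

```latex
E\{\mathbf{s}\} \;=\; -\tfrac{1}{2} \sum_i s_i h_i,
\qquad h_i = \sum_{j} w_{ij} s_j
```

Each term is negative when si agrees in sign with hi and positive when it disagrees, which is the sense in which E measures the disharmony between states and local fields.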

Do State Changes Decrease Energy?
• Suppose that neuron k changes state
• Change of energy:
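The change-of-energy equation is an image on the slide; reconstructing it from the definitions above (a standard derivation, using wij = wji and wkk = 0):

```latex
\Delta E \;=\; E\{\mathbf{s}'\} - E\{\mathbf{s}\}
\;=\; -(s_k' - s_k) \sum_{j \ne k} w_{kj} s_j
\;=\; -(s_k' - s_k)\, h_k
```

Because the update sets s′k = sgn(hk) (or leaves sk unchanged when hk = 0), the factor (s′k – sk) never has the opposite sign to hk, so ΔE ≤ 0.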

Energy Does Not Increase
• In each step in which a neuron is considered for update: E{s(t + 1)} – E{s(t)} ≤ 0
• Energy cannot increase
• Energy decreases if any neuron changes
• Must it stop?

Proof of Convergence in Finite Time
• There is a minimum possible energy:
  – The number of possible states s ∈ {–1, +1}^n is finite
  – Hence Emin = min {E(s) | s ∈ {–1, +1}^n} exists
• Must show it is reached in a finite number of steps

Steps are of a Certain Minimum Size:
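The argument on this slide appears only as equations in the original; reconstructing it: whenever a neuron actually flips, |s′k – sk| = 2 and hk ≠ 0, so

```latex
|\Delta E| \;=\; |s_k' - s_k|\,|h_k| \;=\; 2\,|h_k|
\;\ge\; 2\,\min\bigl\{\,|h_k| : h_k \ne 0,\ \mathbf{s}\in\{-1,+1\}^n \bigr\} \;>\; 0
```

The minimum exists because there are only finitely many states, so the energy can decrease only a bounded number of times before a stable state is reached.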

Conclusion
• If we do asynchronous updating, the Hopfield net must reach a stable, minimum-energy state in a finite number of updates
• This does not imply that it is a global minimum

Lyapunov Functions
• A way of showing the convergence of discrete- or continuous-time dynamical systems
• For a discrete-time system:
  – need a Lyapunov function E (“energy” of the state)
  – E is bounded below (E{s} > Emin)
  – ΔE ≤ (ΔE)max < 0 (energy decreases by at least a certain minimum amount each step)
  – then the system will converge in finite time
• Problem: finding a suitable Lyapunov function

Example Limit Cycle with Synchronous Updating (figure; couplings w > 0)
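A minimal sketch (not from the slides) of a two-neuron instance of this phenomenon, assuming NumPy: two mutually excitatory neurons (w > 0) updated synchronously from opposite initial states oscillate forever instead of settling.

```python
import numpy as np

w = 1.0
W = np.array([[0.0, w],
              [w,   0.0]])        # symmetric coupling, zero diagonal
s = np.array([+1, -1])            # start the two neurons in opposite states

for t in range(6):
    print(t, s)
    h = W @ s                                        # both fields use the OLD state
    s = np.where(h > 0, 1, np.where(h < 0, -1, s))   # synchronous sgn update
# The state alternates [+1 -1] -> [-1 +1] -> [+1 -1] -> ...: a period-2 limit cycle.
```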

The Hopfield Energy Function is Even
• A function f is odd if f(–x) = –f(x), for all x
• A function f is even if f(–x) = f(x), for all x
• Observe:
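The observation left as an image on the slide is presumably the following:

```latex
E\{-\mathbf{s}\}
\;=\; -\tfrac{1}{2}\,(-\mathbf{s})^{\mathsf T} W (-\mathbf{s})
\;=\; -\tfrac{1}{2}\,\mathbf{s}^{\mathsf T} W \mathbf{s}
\;=\; E\{\mathbf{s}\}
```

So every state and its bitwise complement have the same energy; in particular, –x is a stable state whenever x is (as noted on a later slide).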

Conceptual Picture of Descent on Energy Surface (fig. from Solé & Goodwin)

Energy Surface (fig. from Haykin, Neural Networks)

Energy Surface + Flow Lines (fig. from Haykin, Neural Networks)

Flow Lines: Basins of Attraction (fig. from Haykin, Neural Networks)

Bipolar State Space

Basins in Bipolar State Space (figure: energy-decreasing paths)

Demonstration of Hopfield Net Dynamics II: run initialized Hopfield.nlogo

Storing Memories as Attractors (fig. from Solé & Goodwin)

Example of Pattern Restoration (sequence of five figures from Arbib 1995)

Example of Pattern Completion (sequence of five figures from Arbib 1995)

Example of Association (sequence of five figures from Arbib 1995)

Applications of Hopfield Memory
• Pattern restoration
• Pattern completion
• Pattern generalization
• Pattern association

Hopfield Net for Optimization and for Associative Memory
• For optimization:
  – we know the weights (couplings)
  – we want to know the minima (solutions)
• For associative memory:
  – we know the minima (retrieval states)
  – we want to know the weights

Hebb’s Rule
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
—Donald Hebb (The Organization of Behavior, 1949, p. 62)

Example of Hebbian Learning: Pattern Imprinted (figure)

Example of Hebbian Learning: Partial Pattern Reconstruction (figure)

Mathematical Model of Hebbian Learning for One Pattern: for simplicity, we will include self-coupling:
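The model itself appears only as an image; for a single bipolar pattern x ∈ {–1, +1}^n, the prescription consistent with the next slide (self-coupling included) is:

```latex
W = \mathbf{x}\mathbf{x}^{\mathsf T},
\qquad \text{i.e.}\quad w_{ij} = x_i x_j
\quad (\text{so } w_{ii} = x_i^2 = 1)
```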

A Single Imprinted Pattern is a Stable State
• Suppose W = xxᵀ
• Then h = Wx = xxᵀx = nx, since xᵀx = n
• Hence, if the initial state is s = x, then the new state is s′ = sgn(nx) = x
• There may be other stable states (e.g., –x)

Questions
• How big is the basin of attraction of the imprinted pattern?
• How many patterns can be imprinted?
• Are there unneeded spurious stable states?
• These issues will be addressed in the context of multiple imprinted patterns

Imprinting Multiple Patterns
• Let x1, x2, …, xp be patterns to be imprinted
• Define the sum-of-outer-products matrix:
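The sum-of-outer-products matrix is presumably W = Σk xk(xk)ᵀ (some treatments also divide by n and/or zero the diagonal). Below is a minimal NumPy sketch (not from the slides) of imprinting a few random patterns this way and recalling one from a corrupted probe; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                              # n neurons, p random bipolar patterns
X = rng.choice([-1, 1], size=(p, n))

# Hebbian sum-of-outer-products weights; zeroing the diagonal enforces
# the no-self-action convention (w_ii = 0) from the earlier slides.
W = sum(np.outer(x, x) for x in X)
np.fill_diagonal(W, 0)

def recall(W, s, sweeps=20):
    """Asynchronous sgn updates until no neuron changes (or sweeps run out)."""
    s = s.copy()
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            h = W[i] @ s
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

# Corrupt 10 bits of the first imprinted pattern and let the net restore it.
probe = X[0].copy()
probe[rng.choice(n, size=10, replace=False)] *= -1
print(np.array_equal(recall(W, probe), X[0]))   # usually True at this low load (alpha = 0.05)
```

At this low load the corrupted probe almost always settles back to the imprinted pattern; the capacity slides later quantify when this breaks down.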

Definition of Covariance: consider samples (x1, y1), (x2, y2), …, (xN, yN)
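The definition itself is an equation image on the slide; the standard sample covariance is:

```latex
C_{xy} \;=\; \frac{1}{N} \sum_{k=1}^{N} (x_k - \bar{x})(y_k - \bar{y})
\;=\; \overline{xy} - \bar{x}\,\bar{y}
```

(some treatments divide by N – 1 instead of N).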

Weights & the Covariance Matrix: sample pattern vectors x1, x2, …, xp; covariance of the ith and jth components:
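Treating the p patterns as samples, the covariance of the ith and jth components is presumably:

```latex
C_{ij} \;=\; \frac{1}{p} \sum_{k=1}^{p} \bigl(x_i^k - \bar{x}_i\bigr)\bigl(x_j^k - \bar{x}_j\bigr)
\;\approx\; \frac{1}{p} \sum_{k=1}^{p} x_i^k x_j^k
\quad \text{for (roughly) zero-mean bipolar components}
```

so the Hebbian weight wij = Σk xik xjk is, up to a constant factor and the diagonal, the sample covariance of components i and j across the imprinted patterns.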

Characteristics of Hopfield Memory
• Distributed (“holographic”)
  – every pattern is stored in every location (weight)
• Robust
  – correct retrieval in spite of noise or error in patterns
  – correct operation in spite of considerable weight damage or noise

Demonstration of Hopfield Net: run the Malasri Hopfield Demo

Stability of Imprinted Memories
• Suppose the state is one of the imprinted patterns xm
• Then:
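The calculation on this slide is an image; with the sum-of-outer-products weights it presumably runs:

```latex
\mathbf{h} \;=\; W \mathbf{x}^m
\;=\; \sum_{k=1}^{p} \mathbf{x}^k (\mathbf{x}^k)^{\mathsf T} \mathbf{x}^m
\;=\; \underbrace{n\,\mathbf{x}^m}_{\text{signal}}
\;+\; \underbrace{\sum_{k \ne m} \mathbf{x}^k \,(\mathbf{x}^k \!\cdot\! \mathbf{x}^m)}_{\text{crosstalk}}
```

If the crosstalk term is small enough that it never changes the sign of a component of h, then sgn(h) = xm and the imprinted pattern is a fixed point.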

Interpretation of Inner Products
• xk · xm = n if they are identical
  – highly correlated
• xk · xm = –n if they are complementary
  – highly correlated (reversed)
• xk · xm = 0 if they are orthogonal
  – largely uncorrelated
• xk · xm measures the crosstalk between patterns k and m

Cosines and Inner Products (figure: vectors u and v)
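The relation illustrated by the figure is the standard one:

```latex
\mathbf{u}^{\mathsf T}\mathbf{v} \;=\; \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta,
\qquad
\mathbf{x}^k \!\cdot\! \mathbf{x}^m \;=\; n \cos\theta_{km}
\quad \text{since } \|\mathbf{x}\| = \sqrt{n} \text{ for bipolar patterns}
```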

Conditions for Stability

Sufficient Conditions for Instability (Case 1)

Sufficient Conditions for Instability (Case 2)

Sufficient Conditions for Stability: the crosstalk with the sought pattern must be sufficiently small

Capacity of Hopfield Memory
• Depends on the patterns imprinted
• If orthogonal, pmax = n
  – but every state is stable ⇒ trivial basins
• So pmax < n
• Let the load parameter α = p / n

Single Bit Stability Analysis
• For simplicity, suppose the xk are random
• Then the inner products xk · xm are sums of n random ±1 values:
  – binomial distribution ≈ Gaussian
  – in range –n, …, +n
  – with mean μ = 0 and variance σ² = n
• Probability the sum exceeds t: [see “Review of Gaussian (Normal) Distributions” on the course website]

Approximation of Probability

Probability of Bit Instability (fig. from Hertz et al., Introduction to the Theory of Neural Computation)

Tabulated Probability of Single-Bit Instability (table from Hertz et al., Introduction to the Theory of Neural Computation)
  Perror = 0.1%   at α = 0.105
  Perror = 0.36%  at α = 0.138
  Perror = 1%     at α = 0.185
  Perror = 5%     at α = 0.37
  Perror = 10%    at α = 0.61
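Under the Gaussian approximation of the preceding slides (crosstalk roughly normal with mean 0 and variance np, against a signal of size n), the single-bit error probability works out to P ≈ Φ(–1/√α) = ½ erfc(1/√(2α)). A short sketch (not from the slides) that reproduces the tabulated pairs:

```python
import math

def p_error(alpha):
    """Gaussian-tail approximation of the single-bit instability probability."""
    return 0.5 * math.erfc(1.0 / math.sqrt(2.0 * alpha))

for alpha in (0.105, 0.138, 0.185, 0.37, 0.61):
    print(f"alpha = {alpha:5.3f}   P_error ~ {100 * p_error(alpha):5.2f}%")
# alpha = 0.105 -> ~0.1%, 0.138 -> ~0.36%, 0.185 -> ~1%, 0.37 -> ~5%, 0.61 -> ~10%
```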

Spurious Attractors
• Mixture states:
  – sums or differences of odd numbers of retrieval states (see the example below)
  – their number increases combinatorially with p
  – shallower, smaller basins
  – basins of mixtures swamp basins of retrieval states ⇒ overload
  – useful as combinatorial generalizations?
  – self-coupling generates spurious attractors
• Spin-glass states:
  – not correlated with any finite number of imprinted patterns
  – occur beyond overload because the weights are effectively random
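A typical mixture state (a standard example from the literature, not spelled out on the slide) is the sign of a sum of an odd number of retrieval states, e.g.:

```latex
\mathbf{s}^{\text{mix}} \;=\; \operatorname{sgn}\!\bigl(\pm\,\mathbf{x}^1 \pm \mathbf{x}^2 \pm \mathbf{x}^3\bigr)
```

An odd number of terms guarantees that no component of the sum is zero.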

Basins of Mixture States (figure)

Fraction of Unstable Imprints (n = 100) (fig. from Bar-Yam)

Number of Stable Imprints (n = 100) (fig. from Bar-Yam)

Number of Imprints with Basins of Indicated Size (n = 100) (fig. from Bar-Yam)

Summary of Capacity Results
• Absolute limit: pmax < αc n = 0.138 n
• If a small number of errors in each pattern is permitted: pmax ∝ n
• If all or most patterns must be recalled perfectly: pmax ∝ n / log n
• Recall: all this analysis is based on random patterns
• Unrealistic, but sometimes this can be arranged