Analyzing iterated learning – Tom Griffiths (Brown University), Mike Kalish (University of Louisiana)
Cultural transmission • Most knowledge is based on secondhand data • Some things can only be learned from others – cultural objects transmitted across generations • Studying the cognitive aspects of cultural transmission provides unique insights…
Iterated learning (Kirby, 2001) • Each learner sees data, forms a hypothesis, produces the data given to the next learner • cf. the playground game “telephone”
Objects of iterated learning • It’s not just about languages… • In the wild: – religious concepts – social norms – myths and legends – causal theories • In the lab: – functions and categories
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Discrete generations of single learners • PL(h|d): probability of inferring hypothesis h from data d • PP(d|h): probability of generating data d from hypothesis h
Markov chains • Transition matrix T = P(x(t+1) | x(t)) • Variables x(t+1) independent of history given x(t) • Converges to a stationary distribution under easily checked conditions for ergodicity
Stationary distributions • Stationary distribution: π(x) = Σx′ P(x | x′) π(x′) • In matrix form: π = Tπ • π is the first eigenvector of the matrix T (eigenvalue 1) • Second eigenvalue sets rate of convergence
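As a quick numerical illustration (a made-up 3-state chain, not one from the talk), the stationary distribution can be read off the leading eigenvector of T, and the second eigenvalue governs how fast the chain forgets its starting state:

```python
import numpy as np

# Hypothetical 3-state transition matrix; column j holds P(x_next | x_current = j),
# so each column sums to 1 and the stationary distribution satisfies pi = T pi.
T = np.array([[0.8, 0.1, 0.2],
              [0.1, 0.7, 0.3],
              [0.1, 0.2, 0.5]])

eigvals, eigvecs = np.linalg.eig(T)
order = np.argsort(-eigvals.real)        # largest eigenvalue (1) first
pi = eigvecs[:, order[0]].real
pi = pi / pi.sum()                       # normalize the leading eigenvector

print("stationary distribution:", pi)
print("second eigenvalue (convergence rate):", eigvals.real[order[1]])
```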
Analyzing iterated learning • The chain d0 → h1 → d1 → h2 → d2 → h3 → … alternates learning, PL(h|d), and production, PP(d|h) • It can be viewed as a Markov chain on hypotheses (h1, h2, h3, …), a Markov chain on data (d0, d1, d2, …), or a Markov chain on hypothesis-data pairs ((h1, d1), (h2, d2), (h3, d3), …)
A Markov chain on hypotheses • Transition probabilities sum out data: Q(hn+1|hn) = Σd PL(hn+1|d) PP(d|hn) • Stationary distribution and convergence rate from eigenvectors and eigenvalues of Q – can be computed numerically for matrices of reasonable size, and analytically in some cases
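A toy sketch of this construction, assuming small hypothetical matrices PL and PP for two hypotheses and two datasets; raising Q to a power shows its columns converging to the stationary distribution:

```python
import numpy as np

# Hypothetical two-hypothesis, two-dataset example.
# P_P[d, h] = P_P(d | h): probability of generating data d from hypothesis h
# P_L[h, d] = P_L(h | d): probability of inferring hypothesis h from data d
P_P = np.array([[0.7, 0.2],
                [0.3, 0.8]])
P_L = np.array([[0.9, 0.3],
                [0.1, 0.7]])

# One generation of iterated learning: Q[h_new, h_old] = sum_d P_L(h_new|d) P_P(d|h_old)
Q = P_L @ P_P

# Iterating the chain: the columns of Q^n converge to the stationary distribution
for n in (1, 5, 20):
    print(f"Q^{n} =\n{np.linalg.matrix_power(Q, n)}")
```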
Infinite populations in continuous time • “Language dynamical equation” (Nowak, Komarova, & Niyogi, 2001): dxj/dt = Σi xi fi(x) Qij − φ(x) xj • “Neutral model” (fj(x) constant) (Komarova & Nowak, 2003) • Stable equilibrium at first eigenvector of Q
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Bayesian inference • Rational procedure for updating beliefs • Foundation of many learning algorithms (e.g., MacKay, 2003) • Widely used for language learning (e.g., Charniak, 1993) (portrait of the Reverend Thomas Bayes)
Bayes’ theorem • P(h|d) = P(d|h) P(h) / Σh′ P(d|h′) P(h′) • h: hypothesis, d: data • P(h|d): posterior probability • P(d|h): likelihood • P(h): prior probability • denominator: sum over the space of hypotheses
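A toy numerical illustration of the update (the coin hypotheses, prior, and data here are invented for the example):

```python
# Two hypotheses about a coin: fair vs. biased towards heads
hypotheses = {"fair": 0.5, "biased": 0.9}     # P(heads | h)
prior = {"fair": 0.8, "biased": 0.2}          # P(h)

data = ["H", "H", "T", "H"]                   # observed flips

# Likelihood P(d | h) for the whole sequence
def likelihood(h, flips):
    p = hypotheses[h]
    out = 1.0
    for f in flips:
        out *= p if f == "H" else (1 - p)
    return out

# Posterior P(h | d) = P(d | h) P(h) / sum_h' P(d | h') P(h')
unnorm = {h: likelihood(h, data) * prior[h] for h in prior}
evidence = sum(unnorm.values())
posterior = {h: v / evidence for h, v in unnorm.items()}
print(posterior)
```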
Iterated Bayesian learning • Learners are Bayesian agents: PL(h|d) is the posterior obtained by applying Bayes’ theorem with a shared prior P(h)
Markov chains on h and d • The Markov chain on h has the prior P(h) as its stationary distribution • The Markov chain on d has the prior predictive distribution, P(d) = Σh PP(d|h) P(h), as its stationary distribution
Markov chain Monte Carlo • A strategy for sampling from complex probability distributions • Key idea: construct a Markov chain which converges to a particular distribution – e.g. the Metropolis algorithm – e.g. Gibbs sampling
Gibbs sampling • For variables x = x1, x2, …, xn • Draw xi(t+1) from P(xi | x-i), where x-i = x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t) • Converges to P(x1, x2, …, xn) (Geman & Geman, 1984) • a.k.a. the heat bath algorithm in statistical physics
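A minimal sketch for a standard bivariate Gaussian target (a textbook example, not one from the talk), alternately redrawing each coordinate from its conditional distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of the target bivariate standard Gaussian
x1, x2 = 0.0, 0.0
samples = []

for t in range(5000):
    # Conditionals of a standard bivariate Gaussian:
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples[500:])              # discard burn-in
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])
```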
Gibbs sampling (illustration from MacKay, 2003)
Iterated learning is a Gibbs sampler • Iterated Bayesian learning is a Gibbs sampler for the joint distribution p(h, d) = PP(d|h) p(h) • Implies: – (h, d) converges to this distribution – convergence rates are known (Liu, Wong, & Kong, 1995)
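A sketch of this equivalence with a made-up two-hypothesis model: alternating the production step PP(d|h) and the Bayesian learning step PL(h|d) is exactly a two-block Gibbs sampler, so the hypothesis chain should end up sampling from the prior:

```python
import numpy as np

rng = np.random.default_rng(1)
prior = np.array([0.7, 0.3])                 # P(h) over two hypotheses
lik = np.array([[0.9, 0.1],                  # lik[h, d] = P(d | h)
                [0.4, 0.6]])

h, counts = 0, np.zeros(2)
for t in range(20000):
    d = rng.choice(2, p=lik[h])              # production step: d ~ P_P(d | h)
    post = lik[:, d] * prior                 # learning step: h ~ P_L(h | d),
    h = rng.choice(2, p=post / post.sum())   #   the Bayesian posterior
    counts[h] += 1

print("empirical P(h):", counts / counts.sum())   # should approach the prior
print("prior:         ", prior)
```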
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
An example: Gaussians • If we assume… – data, d, is a single real number, x – hypotheses, h, are means of a Gaussian, μ – prior, p(μ), is Gaussian(μ0, σ0²) • …then p(xn+1|xn) is Gaussian(μn, σx² + σn²), with μn and σn² the posterior mean and variance given xn • p(xn|x0) is Gaussian(μ0 + c^n x0, (σx² + σ0²)(1 − c^(2n))), i.e. geometric convergence to the prior
Gaussian example: μ0 = 0, σ0² = 1, x0 = 20 • Iterated learning results in rapid convergence to the prior
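A simulation sketch of this chain using the slide's parameters; the data noise variance σx² is not stated here, so it is assumed to be 1:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, var0 = 0.0, 1.0        # prior on the mean: Gaussian(mu0, var0)
var_x = 1.0                 # assumed data noise variance (not given on the slide)
x = 20.0                    # initial data x_0

for n in range(1, 11):
    # Learner: posterior over mu given the single observation x
    post_mean = (var0 * x + var_x * mu0) / (var0 + var_x)
    post_var = var0 * var_x / (var0 + var_x)
    mu = rng.normal(post_mean, np.sqrt(post_var))
    # Producer: new datum generated from the inferred mean
    x = rng.normal(mu, np.sqrt(var_x))
    print(f"iteration {n}: x = {x:.2f}")
# x drifts from 20 back towards the prior mean 0 within a few iterations
```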
An example: linear regression • Assume – data, d, are pairs of real numbers (x, y) – hypotheses, h, are functions • An example: linear regression – hypotheses have slope θ and pass through the origin (y = θx) – p(θ) is Gaussian(θ0, σ0²)
Linear regression example: θ0 = 1, σ0² = 0.1, y0 = −1 (initial datum at x = 1)
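A parallel sketch for the regression chain, where each learner infers a slope from one (x, y) pair at x = 1 and produces the next pair (the observation noise variance is assumed to be 1, since it is not given on the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, var0 = 1.0, 0.1     # prior on the slope: Gaussian(theta0, var0)
var_y = 1.0                 # assumed observation noise variance
y = -1.0                    # initial datum y_0, observed at x = 1

for n in range(1, 11):
    # Learner: posterior over the slope theta given the pair (x = 1, y)
    post_mean = (var0 * y + var_y * theta0) / (var0 + var_y)
    post_var = var0 * var_y / (var0 + var_y)
    theta = rng.normal(post_mean, np.sqrt(post_var))
    # Producer: next learner's datum at x = 1
    y = rng.normal(theta, np.sqrt(var_y))
    print(f"iteration {n}: theta = {theta:.2f}, y = {y:.2f}")
# the slope chain drifts from the initial data towards the prior mean theta0 = 1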
An example: compositionality • A language is a function from events x to utterances y • In a compositional language, parts of events (“agents”, “actions”) map to parts of utterances (“nouns”, “verbs”)
An example: compositionality • Hypotheses: languages (compositional or holistic), produced with error, with prior P(h) • Data: m event-utterance pairs
Analysis technique 1. Compute transition matrix on languages 2. Sample Markov chains 3. Compare language frequencies with prior (can also compute eigenvalues etc. )
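A sketch of these three steps for a small hypothetical set of languages (the prior and production probabilities below are invented; in the talk, the data would be m event-utterance pairs produced with error):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setup: four languages, with a prior favouring the first.
# lang_lik[h, d] = P_P(d | h): probability that language h produces observable data d
prior = np.array([0.5, 0.2, 0.2, 0.1])
lang_lik = np.array([[0.70, 0.10, 0.10, 0.10],
                     [0.10, 0.70, 0.10, 0.10],
                     [0.10, 0.10, 0.70, 0.10],
                     [0.10, 0.10, 0.10, 0.70]])

# 1. Compute the transition matrix on languages: Q[h', h] = sum_d P_L(h'|d) P_P(d|h)
post = lang_lik * prior[:, None]              # unnormalized P_L(h | d), shape (h, d)
P_L = post / post.sum(axis=0, keepdims=True)  # Bayesian learner
Q = P_L @ lang_lik.T                          # note P_P(d | h) = lang_lik[h, d]

# 2. Sample a Markov chain of learners
h, counts = 0, np.zeros(len(prior))
for t in range(50000):
    h = rng.choice(len(prior), p=Q[:, h])
    counts[h] += 1

# 3. Compare language frequencies with the prior
print("chain frequencies:", counts / counts.sum())
print("prior:            ", prior)
```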
Convergence to priors • Effect of the prior: chains run with prior parameter 0.50 vs. 0.01 (error parameter 0.05, m = 3) • Figure: language frequencies for the chain and the prior, plotted against iteration; the chains converge to the prior
The information bottleneck • No effect of bottleneck: chains with (prior, error, m) = (0.50, 0.05, 1), (0.01, 0.05, 3), and (0.50, 0.05, 10) • Figure: language frequencies for the chain and the prior, plotted against iteration
The information bottleneck Bottleneck affects relative stability of languages favored by prior
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
A method for discovering priors • Iterated learning converges to the prior… • …so learners’ priors can be evaluated by having people perform iterated learning
Iterated function learning • Data: sets of (x, y) pairs; hypotheses: functions • Each learner sees a set of (x, y) pairs • Makes predictions of y for new x values • Predictions are the data for the next learner
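A sketch of the transmission loop itself, with a simple simulated learner standing in for the human participant (the kernel-smoothing learner is just an illustrative stand-in, not the model used in the talk):

```python
import numpy as np

rng = np.random.default_rng(5)
train_x = np.linspace(0, 1, 20)               # x values shown to each learner

def simulated_learner(xs, ys, new_x, noise=0.05):
    """Stand-in for a human: noisy kernel-smoothed predictions of y for new x."""
    preds = []
    for x in new_x:
        w = np.exp(-((xs - x) ** 2) / 0.02)   # Gaussian kernel weights
        preds.append(np.sum(w * ys) / np.sum(w) + rng.normal(0, noise))
    return np.array(preds)

# Initial data: e.g. a decreasing linear function
ys = 1.0 - train_x

for generation in range(1, 9):
    # This generation's predictions become the next generation's training data
    ys = simulated_learner(train_x, ys, train_x)
    print(f"generation {generation}: first/last prediction = "
          f"{ys[0]:.2f}, {ys[-1]:.2f}")
```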
Function learning in the lab • Each trial: stimulus, response (slider), feedback • Examine iterated learning with different initial data
Figure: chains of responses for different initial data across iterations 1–9 (Kalish, 2004)
Outline 1. Analyzing iterated learning 2. Iterated Bayesian learning 3. Examples 4. Iterated learning with humans 5. Conclusions and open questions
Conclusions and open questions • Iterated Bayesian learning converges to prior – properties of languages are properties of learners – information bottleneck doesn’t affect equilibrium • What about other learning algorithms? • What determines rates of convergence? – amount and structure of input data • What happens with people?