Learning, Volatility and the ACC
Tim Behrens
FMRIB + Psychology, University of Oxford; FIL, UCL
[Figure: reward history weight (β) against trials into the past (i-1 to i-8), CON group. Kennerley et al., Nature Neuroscience, 2006]
[Figure: the same plot with the ACCs group added alongside CON. Kennerley et al., Nature Neuroscience, 2006]
Monkeys will sacrifice food opportunities to look at other monkeys. [Figure label: ACCg] Rudebeck et al., Science, 2006
Interest in other individuals is reduced after ACC gyrus lesion. [Figure label: ACCg] Rudebeck et al., Science, 2006
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the sulcus are mainly with motor regions:
  • Primary motor cortex
  • Premotor cortex
  • Parietal motor areas
  • Spinal cord
ACCs has information about our own actions.
Anatomy - Differences in connections between ACCs and ACCg.
• Connections unique to the gyrus are mainly with regions that process emotional and biological stimuli:
  • Periaqueductal grey
  • Hypothalamus
  • STS/STG
• Insula/temporal pole connections are stronger to the gyrus.
ACCg has access to information about other agents.
Anatomy - Shared connections between ACCs and ACCg.
• Some shared connections:
  • Orbitofrontal cortex
  • Amygdala
  • Ventral striatum
• ACCg and ACCs are strongly interconnected.
Both regions have access to, and influence over, reward and value processing.
ACC Sulcus and learning about your actions.
[Figure: reward history weight (β) against trials into the past (i-1 to i-8), CON and ACCs groups. Kennerley et al., Nature Neuroscience, 2006]
What determines the integration length?
[Figure: reward history weight (β) against trials into the past (i-1 to i-8), CON group. Kennerley et al., Nat Neurosci 2006; Sugrue et al., Science 2005]
VOLATILE: reward probabilities change approximately every 25 trials.
STABLE: reward probabilities change only after hundreds of trials.
[Figure: reward history weight (β) against trials into the past (i-1 to i-8), CON group. Kennerley et al., Nat Neurosci 2006; Sugrue et al., Science 2005]
Reinforcement learning
• We need to continually re-appraise the value of an action based on each new experience.
[Diagram: the prediction (V_t) and the outcome give the prediction error δ; the new prediction (V_{t+1}) is shifted by α × δ.]
Updating beliefs on the basis of new information
V_{t+1} = V_t + (α × δ)
• The prediction error (δ) is the information available from this event.
• The learning rate (α) is the weight given to the current information.
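The update rule on this slide can be written as a one-line function. The sketch below is only an illustration, not the analysis code behind the talk; the name `update_value` and the example numbers are assumptions chosen for the example.

```python
def update_value(v_t, outcome, alpha):
    """Return V_{t+1} = V_t + alpha * delta for one observed outcome.

    delta (the prediction error) is outcome - v_t; alpha is the learning rate.
    """
    delta = outcome - v_t          # prediction error
    return v_t + alpha * delta     # new prediction


# Hypothetical example: prediction 0.5, a rewarded trial (outcome 1.0), alpha 0.4.
v_next = update_value(0.5, 1.0, 0.4)
print(v_next)
```

With α = 0 the prediction never changes; with α = 1 it jumps to the latest outcome.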
The learning rate and the value of information.
V_{t+1} = V_t + (α × δ)
The learning rate should represent the value of the current information for guiding future beliefs.
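To make the link between learning rate and the value of information concrete, the following sketch simulates a delta-rule learner in a stable and a volatile environment and measures how well it tracks the true reward probability. Everything here is an assumption for illustration: the function name `tracking_error`, the reward probabilities (0.75 for stable, 0.8/0.2 for volatile), the two learning rates (0.01 and 0.4), and the trial count. Only the 25-trial switching period comes from the slides.

```python
import random


def tracking_error(probabilities, alpha, seed=0, v0=0.5):
    """Mean squared error between the delta-rule estimate and the true
    reward probability over one simulated sequence of binary outcomes."""
    rng = random.Random(seed)
    v = v0
    squared_errors = []
    for p in probabilities:
        # Score the prediction held before this trial's outcome is seen.
        squared_errors.append((v - p) ** 2)
        outcome = 1.0 if rng.random() < p else 0.0
        v = v + alpha * (outcome - v)   # V_{t+1} = V_t + alpha * delta
    return sum(squared_errors) / len(squared_errors)


n_trials = 2000
# Assumed schedules: a fixed probability vs. one that flips every 25 trials.
stable = [0.75] * n_trials
volatile = [0.8 if (t // 25) % 2 == 0 else 0.2 for t in range(n_trials)]

for name, schedule in [("stable", stable), ("volatile", volatile)]:
    for alpha in (0.01, 0.4):
        print(name, alpha, round(tracking_error(schedule, alpha), 4))
```

Under these assumptions the low learning rate tracks better in the stable schedule and the high learning rate tracks better in the volatile one, which is the sense in which each new outcome is worth more when the environment changes quickly.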
Relationship with integration length
[Figure panels: α = 0.01 and α = 0.4]
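The relationship can be made explicit by unrolling the delta rule: V_{t+1} = (1 - α) V_t + α × outcome, so the outcome k trials back carries weight α (1 - α)^(k-1). The sketch below computes these weights for the two learning rates on the slide; the function name `history_weights` and the choice of eight trials back (matching the axis of the reward-history plots) are assumptions for illustration.

```python
def history_weights(alpha, n_back=8):
    """Weight the delta rule places on the outcome k trials back,
    for k = 1..n_back: alpha * (1 - alpha) ** (k - 1)."""
    return [alpha * (1 - alpha) ** (k - 1) for k in range(1, n_back + 1)]


for alpha in (0.01, 0.4):
    weights = history_weights(alpha)
    print(alpha, [round(w, 3) for w in weights])
```

With α = 0.4 the weight eight trials back is under 3% of the weight on the most recent trial, whereas with α = 0.01 it is still above 90%: a low learning rate corresponds to a long integration length.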
[Figure: task display showing the values 37 and 63, labelled "stable".] Behrens et al., Nature Neuroscience, 2007
V_{t+1} = V_t + (α × δ)
Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
Changes in reward estimates occur throughout the task, as do changes in volatility estimates.
Behrens, Woolrich, Walton, Rushworth, Nature Neuroscience, 2007
[Figure panels: Decide; Monitor; Monitor × Volatility.] Behrens et al., Nature Neuroscience, 2007
ACC effect size predicts learning rate across subjects.
Behrens, Woolrich, Walton & Rushworth, Nat Neurosci 2007
ACC Gyrus and learning about your social partners.
Interest in other individuals is reduced after ACC gyrus lesion. [Figure label: ACCg] Rudebeck et al., Science, 2006
Rudebeck et al., Science, 2006
Learning about other agents
[Figure: task display showing the values 37, 63 and 25.] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
Sources of information
• Probability that the correct colour is blue
• Probability that the confederate's advice is good
• Value of action information
• Value of social information
Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
Social information is integrated over time - behaviour
V_{t+1} = V_t + (α × δ)
Reward prediction error: reward - expectation.
[Figure: effect size over time, relative to outcome.] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
V_{t+1} = V_t + (α × δ)
Prediction error on a social partner: lie event - lie prediction.
[Figure: effect size over time, relative to outcome.] Behrens, Hunt, Woolrich, Rushworth, Nature, 2008
The value of information and the ACC
V_{t+1} = V_t + (α × δ)
[Figure panels: value of reward information; value of social information.]
Combining information to drive behaviour
V_{t+1} = V_t + (α × δ)
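The slides do not state how the two sources are combined. As a hedged sketch only, one simple possibility is to treat the reward-history estimate and the advice-based estimate as independent and multiply them, then renormalise. The function name `combine_sources`, its parameters, and the independence assumption are all illustrative and not taken from the talk.

```python
def combine_sources(p_blue_from_rewards, p_advice_good, advice_is_blue):
    """Combine two estimates of the probability that blue is correct.

    Assumption (not from the slides): the two sources are independent and
    are combined by multiplying their probabilities and renormalising.
    """
    # Probability that blue is correct according to the social source alone.
    p_blue_from_advice = p_advice_good if advice_is_blue else 1.0 - p_advice_good
    blue = p_blue_from_rewards * p_blue_from_advice
    red = (1.0 - p_blue_from_rewards) * (1.0 - p_blue_from_advice)
    if blue + red == 0.0:
        # Both sources are certain and contradict each other.
        raise ValueError("sources are certain and contradictory")
    return blue / (blue + red)


# Hypothetical example: reward history slightly favours blue (0.6), and a
# confederate believed to be reliable (0.8) also advises blue.
print(combine_sources(0.6, 0.8, advice_is_blue=True))
```

When the advice is uninformative (probability 0.5 of being good), this rule returns the reward-history estimate unchanged, so each source influences the choice only to the extent that it is believed to carry information.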
Conclusions
• ACC codes a learning signal when information is observed.
• This signal predicts the speed of learning.
• Learning from our own actions and learning from others' actions are processed in parallel, in ACCs and ACCg respectively.
• The outputs of these parallel learning
Acknowledgments
• Matthew Rushworth
• Mark Woolrich
• Laurence Hunt
• Mark Walton