Intro Electrophysiology Modelling Discussion slide # 1 / 59 Actor-Critic models: from ventral striatal reward-related activity to robotics simulations. Dr. Mehdi Khamassi (1, 2). (1) LPPA, UMR CNRS 7152, Collège de France, Paris. (2) AnimatLab-LIP6 / SIMA-ISIR, Université Pierre et Marie Curie, Paris 6.

slide # 2 / 59 OBJECTIVE: Help to understand how mammals adapt their behavior to maximize the reward obtained from the environment. Help to understand the brain mechanisms underlying these cognitive processes.

slide # 3 / 59 OBJECTIVE: Challenging goal: different levels of decision, different learning processes, different types of representation. Pluridisciplinary approach: Behavioral Neurophysiology, Computational Modelling, Autonomous Robotics.

slide # 4 / 59 ACTOR-CRITIC MODEL: the CRITIC learns to predict reward; the ACTOR learns to select actions. • Developed in the AI community (RL) • Explains some reward-seeking behaviors • Resembles some parts of the brain (dopaminergic neurons & striatum)

slide # 5 / 59 Outline: 1. Introduction: How does an Actor-Critic model work? 2. Electrophysiology: Reward predictions in the rat ventral striatum. 3. Computational modelling: An Actor-Critic model in a simulated robot. 4. Discussion.

slide # 6 / 59 The Actor-Critic model: Learning from reward. [Figure: actions 1 to 5; reward]

slide # 7 / 59 The Actor-Critic model: Learning from reward. [Figure: actions 1 to 5; reward; reinforcement signal]

slide # 8 / 59 The Actor-Critic model: Learning from reward prediction P_{t-1} (Rescorla and Wagner, 1972). [Figure: actions 1 to 5; reward; reinforcement signal]
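
A minimal sketch of the prediction-error rule that this slide attributes to Rescorla and Wagner (1972): the reinforcement signal is the obtained reward minus the previous prediction P_{t-1}, and the prediction moves a fraction of the way toward the reward. The function name, the learning rate alpha and the 50-trial schedule are illustrative assumptions, not values from the talk.

```python
def rescorla_wagner(rewards, alpha=0.1, p0=0.0):
    """Return the reward prediction P after each trial."""
    p = p0
    history = []
    for r in rewards:
        error = r - p        # reinforcement: obtained reward minus predicted reward
        p += alpha * error   # move the prediction toward the obtained reward
        history.append(p)
    return history


# Hypothetical experiment: the same reward (1.0) is delivered on 50 trials.
predictions = rescorla_wagner([1.0] * 50)
print(round(predictions[0], 3), round(predictions[-1], 3))
```

The prediction grows on every rewarded trial and the error shrinks accordingly, so learning slows down as the reward becomes expected.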

slide # 9 / 59 The Actor-Critic model: Temporal-Difference (TD) learning (Sutton and Barto, 1998). Reward predictions P_{t-1} and P_t; reinforcement signal ȓ. [Figure: actions 1 to 5; reward]
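
The slide's own equation did not survive extraction, so the following is only a sketch of the standard TD(0) rule from Sutton and Barto (1998): the reinforcement signal is r + gamma * P_t - P_{t-1}. In the code the indices are shifted by one (P[t + 1] and P[t]). The five-step chain mirrors the five numbered actions in the figure; the learning rate, discount factor and number of episodes are assumptions.

```python
def td_chain(n_steps=5, episodes=300, alpha=0.1, gamma=0.9):
    """Learn reward predictions P for a chain of steps ending in one reward."""
    P = [0.0] * (n_steps + 1)          # last entry: terminal state, stays at 0
    for _ in range(episodes):
        for t in range(n_steps):
            r = 1.0 if t == n_steps - 1 else 0.0   # reward only after the last step
            delta = r + gamma * P[t + 1] - P[t]    # TD error (reinforcement signal)
            P[t] += alpha * delta
    return P[:n_steps]


P = td_chain()
print([round(p, 3) for p in P])
```

With these settings the predictions approach gamma raised to the number of steps remaining before the reward, so they rise along the chain: the prediction propagates backward from the reward to earlier steps.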

slide # 10 / 59 The Actor-Critic model: Analogy with dopaminergic neurons (Romo & Schultz, 1990; Houk et al., 1995; Schultz et al., 1997). [Figure: S, R; reinforcement signal: +1]

slide # 11 / 59 The Actor-Critic model: Analogy with dopaminergic neurons (Romo & Schultz, 1990; Houk et al., 1995; Schultz et al., 1997). [Figure: S, R; reinforcement signal: +1]

slide # 12 / 59 The Actor-Critic model: Analogy with dopaminergic neurons (Romo & Schultz, 1990; Houk et al., 1995; Schultz et al., 1997). [Figure: S, R; reinforcement signal: 0]

slide # 13 / 59 The Actor-Critic model: Analogy with dopaminergic neurons (Romo & Schultz, 1990; Houk et al., 1995; Schultz et al., 1997). [Figure: S, R; reinforcement signal: -1]
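
The +1, 0 and -1 values on the preceding slides can be read as TD errors. The sketch below assumes that S stands for a reward-predicting stimulus and R for the reward, and it uses hypothetical prediction values of 0 (no reward expected) and 1 (reward expected); the mapping of each case to a particular slide is an interpretation, not something the transcript states.

```python
def td_error(r, p_now, p_before, gamma=1.0):
    """TD error: reward plus discounted current prediction minus previous prediction."""
    return r + gamma * p_now - p_before


# Hypothetical prediction values (0 = no reward expected, 1 = reward expected).
cases = {
    "unpredicted reward": td_error(r=1.0, p_now=0.0, p_before=0.0),
    "stimulus that predicts reward": td_error(r=0.0, p_now=1.0, p_before=0.0),
    "fully predicted reward": td_error(r=1.0, p_now=0.0, p_before=1.0),
    "omitted predicted reward": td_error(r=0.0, p_now=0.0, p_before=1.0),
}
for name, delta in cases.items():
    print(f"{name}: {delta:+.0f}")
```

Under these assumptions the error is positive for an unpredicted reward or a reward-predicting stimulus, zero for a fully predicted reward, and negative when a predicted reward is omitted.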

slide # 14 / 59 The Actor-Critic models: Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review. [Figure label: Dopaminergic neuron]

slide # 15 / 59 The Actor-Critic models. [Figure: Dopaminergic neuron; labels L and E; P=0, P=0; r=0, r=1]

slide # 16 / 59 The Actor-Critic models. [Figure: Dopaminergic neuron; labels L and E; P=0, P=0, P=1; r=0, r=1]

slide # 17 / 59 The Actor-Critic models. [Figure: Dopaminergic neuron; labels L and E; P=1, P=0, P=1; r=0, r=1]
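
The figures on these slides are not recoverable, so the following is a generic sketch of the Actor-Critic principle rather than the diagram shown in the talk: a single TD error both updates the Critic's reward prediction P and reinforces the Actor's preference for the action just taken. The task (one state, two actions, only one of them rewarded), the softmax action selection, the learning rates and the seed are all assumptions.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_two_actions(trials=2000, alpha_critic=0.1, alpha_actor=0.1, seed=0):
    rng = random.Random(seed)
    P = 0.0                  # Critic: reward prediction for the single state
    prefs = [0.0, 0.0]       # Actor: preference for each action
    rewards = [0.0, 1.0]     # hypothetical task: only action 1 is rewarded
    for _ in range(trials):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        delta = rewards[a] - P              # TD error; the trial ends after one step
        P += alpha_critic * delta           # Critic update
        prefs[a] += alpha_actor * delta     # Actor update driven by the same signal
    return P, softmax(prefs)


P, probs = actor_critic_two_actions()
print(round(P, 2), [round(p, 3) for p in probs])
```

Choosing the rewarded action yields a non-negative error and raises its preference, while choosing the other yields a non-positive error and lowers its preference, so the Actor drifts toward the rewarded action.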

slide # 18 / 59 The rat brain. [Figure adapted from Tierney (2006)]

slide # 19 / 59 The striatum. [Figure adapted from Voorn et al. (2004)]

slide # 20 / 59 The striatum. CRITIC: Ventral Striatum. ACTOR: Dorsal Striatum (actions). Dopaminergic neurons (VTA / SNc). (Barto, 1995; Houk et al., 1995; Montague et al., 1996; Schultz et al., 1997; Doya et al., 2002; O’Doherty et al., 2004)

slide # 21 / 59 The striatum. Learning based on reward prediction in VS... In the monkey: (Hikosaka et al., 1989; Hollerman et al., 1998; Kawagoe et al., 1998; Hassani et al., 2001; Cromwell and Schultz, 2003). In the rat: (Carelli et al., 2000; Daw et al., 2002; Setlow et al., 2003; Nicola et al., 2004; Wilson and Bowman, 2005). ... on dopamine reinforcements (Schultz et al., 1992; Satoh et al., 2003; Nakahara et al., 2004) ... modelled by Temporal Difference (TD)-learning (Barto, 1995; Houk et al., 1995; Schultz et al., 1997; Doya et al., 2002).

slide # 22 / 59 The striatum ... using precise timing of reward prediction in TD-learning (Montague et al., 1996; Suri and Schultz, 2001; Perez-Uribe, 2001; Alexander and Sporns, 2002). [Figure: simulation of a TD-learning model; activity recorded from the monkey striatum. Adapted from Suri and Schultz (2001)]

slide # 23 / 59 Electrophysiology: Methods. Recording in the rat VS. Simple electrodes.

slide # 24 / 59 Electrophysiology: Behavioral methods. The plus-maze task.

slide # 25 / 59 Electrophysiology: Behavioral methods. The plus-maze task. [Figure: time axis with center departure and box arrival; running, immobile]

slide # 26 / 59 Electrophysiology: Results. 170 neurons; 91 neurons with behavioral correlates. [Figure: Departure, Center, Arrival; time axis]

slide # 27 / 59 Electrophysiology: Results: Reward anticipation. Ventral striatal neuron. Activity anticipating each reward droplet. Independent from locomotor behavior. Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 29 / 59 Electrophysiology: Results: Reward anticipation. Ventral striatal neuron. Activity anticipating each reward droplet. Independent from locomotor behavior. Anticipation of an extra reward. Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 30 / 59 Modelling with TD-learning: Results. TD-learning variants: temporal representation of stimuli (Montague et al., 1996); incomplete temporal representation; ambiguous visual input; no spatial information. [Figure: 7, 5, 3 and 1 droplets]

slide # 31 / 59 Modelling with TD-learning: Results. TD-learning variants: temporal representation of stimuli (Montague et al., 1996); incomplete temporal representation; same context after the last drop as during droplet delivery; no spatial information. [Figure: 7, 5, 3 and 1 droplets]

slide # 34 / 59 TD-learning could reproduce neural anticipatory activity (Khamassi, Mulder et al., in revision, J Neurophysiol). Can it reproduce the rat's locomotor behavior in the same task?
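
To illustrate how TD-learning with a temporal representation of stimuli can produce activity that anticipates each droplet, here is a small sketch in which every time step since arrival has its own prediction (a tapped delay line, in the spirit of Montague et al., 1996). This is not the model from the talk: the trial length, the three droplet times, the discount factor and the learning rate are hypothetical.

```python
def td_droplets(n_steps=12, droplet_steps=(3, 7, 11), episodes=500,
                alpha=0.2, gamma=0.6):
    """Learn one prediction per time step since arrival (tapped delay line)."""
    V = [0.0] * (n_steps + 1)          # V[n_steps] marks the end of the trial (0)
    for _ in range(episodes):
        for t in range(n_steps):
            r = 1.0 if t in droplet_steps else 0.0   # a droplet arrives at step t
            delta = r + gamma * V[t + 1] - V[t]      # TD error
            V[t] += alpha * delta
    return V[:n_steps]


V = td_droplets()
print([round(v, 2) for v in V])
```

In this sketch the learned prediction ramps up before each droplet and drops right after it, which is the qualitative pattern described as reward anticipation on the electrophysiology slides.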

slide # 35 / 59 Autonomous robotics: Methods. Virtual plus-maze. [Figure: visual perceptions; actions; reward]

slide # 36 / 59 Autonomous robotics: Methods. Virtual plus-maze. [Figure: visual perceptions; actions 1 to 5; reward]

slide # 37 / 59 Autonomous robotics: Methods, Results. [Figure: expected reward; actions 1 to 5]

slide # 38 / 59 Autonomous robotics: Methods. Actor-Critic models: Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review. Simplistic Actor. Most often: discrete environments. [Figure label: Dopaminergic neuron]

slide # 39 / 59 Autonomous robotics: Methods. Actor-Critic models: Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review. Simplistic Actor. Most often: discrete environments. Continuous environments: coordination of modules. Gating network: Baldassarre (2002); Doya et al. (2002). Hand-tuned (independent from modules' performances): Suri and Schultz (2001). [Figure label: Dopaminergic neuron]

slide # 40 / 59 Autonomous robotics: Methods. Actor-Critic models: Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002); see Joel et al. (2002) for a review. Simplistic Actor. Most often: discrete environments. Continuous environments: coordination of modules. Gating network: Baldassarre (2002); Doya et al. (2002). Hand-tuned (independent from modules' performances): Suri and Schultz (2001). Test principles within a common framework. [Figure label: Dopaminergic neuron]

slide # 41 / 59 Autonomous robotics: Methods. Implemented framework.

slide # 42 / 59 Autonomous robotics: Methods. [Figure: Gurney, Prescott & Redgrave (2001), adapted by Girard et al. (2002; 2003)]

slide # 43 / 59 Autonomous robotics: Methods. Module coordination.

slide # 44 / 59 Autonomous robotics: Methods. 1. gating network (tests modules' capacity for state prediction)

slide # 45 / 59 Autonomous robotics: Methods. 2. hand-tuned (independent from modules' performance). [Figure: visual perceptions; categorization; reward]

slide # 46 / 59 Autonomous robotics: Methods. 3. unsupervised categorization (Self-Organizing Maps)

slide # 47 / 59 Autonomous robotics: Methods. 4. random robot
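
The third coordination method above, unsupervised categorization with Self-Organizing Maps, can be illustrated with a minimal one-dimensional map. This is a generic sketch, not the robot's implementation: the two-dimensional inputs stand in for visual perceptions, and the number of units, the initialization along the diagonal, and the decay schedules are all assumptions. The index of the winning unit is the category that could then be used to decide which module is in charge.

```python
import math


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_units=4, epochs=50, lr=0.5, sigma=1.0):
    """Train a 1-D self-organizing map; units start evenly spaced on the diagonal."""
    dim = len(data[0])
    weights = [[i / (n_units - 1)] * dim for i in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        width = sigma * frac + 1e-3            # shrinking neighbourhood width
        for x in data:
            bmu = best_unit(weights, x)        # best-matching unit for this input
            for i, w in enumerate(weights):
                h = math.exp(-((i - bmu) ** 2) / (2 * width ** 2))
                for d in range(dim):
                    w[d] += lr * frac * h * (x[d] - w[d])
    return weights


# Hypothetical 2-D "perceptions" forming two clusters.
data = [[0.1, 0.1], [0.12, 0.08], [0.9, 0.9], [0.88, 0.92]]
som = train_som(data)
category_a = best_unit(som, [0.1, 0.1])
category_b = best_unit(som, [0.9, 0.9])
print(category_a, category_b)
```

Inputs from the two clusters end up assigned to different units without any reward signal or hand-tuning, which is the property that makes such a map usable as an autonomous categorization stage.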

slide # 48 / 59 Autonomous robotics: Results. [Figure label: average]

slide # 49 / 59 Autonomous robotics: Results. Nb of iterations required (average performance during the second half of the experiment):
  1. gating network: 3,500
  2. hand-tuned: 94
  3. unsupervised categorization (SOM): 404
  4. random robot: 30,000

slide # 51 / 59 Discussion: Contributions. Critic-like reward anticipation in the ventral striatum. Coordinating multiple modules with SOM.

slide # 52 / 59 Discussion: Contributions. Critic-like reward anticipation in the ventral striatum. Coordinating multiple modules with SOM. Prediction: dopamine signal for missing final drop.

slide # 53 / 59 Discussion: Contributions. Critic-like reward anticipation in the ventral striatum. Coordinating multiple modules with SOM. Prediction: dopamine signal for missing final drop. Perspectives: Vary intervals between droplet rewards. Integrate action values (Samejima et al., 2005). Improve the model based on other multi-module reinforcement learning methods from robotics (Uchibe et al., 2004; Brunskill et al., 2006).

slide # 54 / 59 The Actor-Critic models. [Figure: Dopaminergic neuron; labels L and E; P=1, P=0, P=1; r=0, r=1]

slide # 55 / 59 Model-based reinforcement learning. [Figure: P=1, P=0, P=1; r=0, r=1]
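
As a contrast with the TD-based Actor-Critic, here is a minimal sketch of model-based reinforcement learning in which values are computed by planning over a known model (value iteration) instead of being learned from experienced TD errors. The maze, the state and action names, and the discount factor are hypothetical and are not taken from the slide's figure.

```python
def value_iteration(model, gamma=0.9, tol=1e-9):
    """Plan over a known model: model[state][action] = (next_state, reward)."""
    V = {s: 0.0 for s in model}
    while True:
        change = 0.0
        for s, actions in model.items():
            if not actions:            # terminal state: value stays at 0
                continue
            best = max(r + gamma * V[nxt] for nxt, r in actions.values())
            change = max(change, abs(best - V[s]))
            V[s] = best
        if change < tol:
            return V


# Hypothetical maze: from the junction, one arm is empty and one is baited.
model = {
    "junction": {"go_left": ("empty_arm", 0.0), "go_right": ("baited_arm", 0.0)},
    "empty_arm": {"consume": ("end", 0.0)},
    "baited_arm": {"consume": ("end", 1.0)},
    "end": {},
}
values = value_iteration(model)

# Moving the reward only requires editing the model and re-planning.
model["empty_arm"]["consume"] = ("end", 1.0)
model["baited_arm"]["consume"] = ("end", 0.0)
new_values = value_iteration(model)
print(values, new_values)
```

Because the values are recomputed from the model, a change in reward location is reflected immediately after re-planning, whereas a model-free learner would need new trials to revise its predictions.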

slide # 56 / 59 General discussion. Action selection process: Model-free: inflexible, slow to acquire (Stimulus-Response associations). Model-based: flexible, rapidly learned (cognitive map; Action-outcome contingencies). Strategy dimension: place recognition-triggered response; place strategy; cue-guided strategy; visual cue-guided strategy. References: Trullier et al. (1997); Dickinson and Balleine (1998); Daw et al. (2005).

slide # 57 / 59 General discussion. Reinterpret inconsistent behavioral results: spatial more rapidly acquired than cue-guided (Packard and McGaugh, 1996); cue-guided more rapidly acquired than spatial (Pych et al., 2005). Evidence for involvement of the prefronto-striatal system in model-based strategies: in mPFC, A-O contingencies (Mulder et al., 2003) and spatial goals (Hok et al., 2005); lesions of the striatum impair model-based strategies (Kelley et al., 1997; Corbit et al., 2001; Yin et al., 2005).

slide # 58 / 59 Perspective: EC Project ICEA (Integrating Cognition, Emotion and Autonomy). Neurophysiological experiments, LPPA: Klusters software (c) L. Hazan in Buzsáki’s lab. Autonomous robotics, LIP6/ISIR: Webots software (c) Wany Robotics. Bioinspired interfaces for assessing new hypotheses.

slide # 59 / 59 Collaborators. Thesis advisors: Agnès Guillot, Sidney I. Wiener. LPPA Collège de France: Alain Berthoz, Benoît Girard, Adrien Peyrache, Karim Benchenane. IDIAP Research Institute: Ricardo Chavarriaga. ISIR, Université Paris 6: Jean-Arcady Meyer, Laurent Dollé, Louis-Emmanuel Martinet, Olivier Sigaud. Universiteit van Amsterdam: Francesco P. Battaglia, Antonius B. Mulder. Toyama Faculty of Food nutrition: Eichi Tabuchi.