
Variational filtering and DEM

EPSRC Symposium Workshop on Computational Neuroscience, Monday 8 – Thursday 11 December 2008

Abstract

This presentation reviews variational treatments of dynamic models that furnish two kinds of density: time-dependent conditional densities on the path or trajectory of a system's states, and time-independent densities on its parameters. These are obtained by maximizing a variational action with respect to the conditional densities. The action, or path-integral of free-energy, is a lower bound on the model's log-evidence or marginal likelihood, which is required for model selection and averaging. The approach rests on formulating the optimization in generalized coordinates of motion. The resulting scheme can be used for online Bayesian inversion of nonlinear hierarchical dynamic causal models, and it is shown to outperform existing approaches such as Kalman and particle filtering. Furthermore, it provides for multiple inference on a model's states, parameters and hyperparameters using exactly the same principles. Free-form (variational filtering) and fixed-form (Dynamic Expectation Maximization) variants of the scheme will be demonstrated using simulated data (bird-song) and real data (from hemodynamic systems studied in neuroimaging).

Overview: hierarchical dynamic models; generalised coordinates (dynamical priors); hierarchical forms (structural priors); variational filtering and action (free-form); Laplace approximation and DEM (fixed-form); comparative evaluations; examples (hemodynamics and bird songs)

Generalised coordinates: likelihood and (dynamical) prior
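Generalised coordinates represent a trajectory by its value and its successive temporal derivatives, $\tilde x = [x, x', x'', \dots]$. A derivative (shift) operator $\mathcal{D}$ maps each order onto the next. Below is a minimal sketch of both ideas in Python (standard library only). The sinusoid, the truncation order and the helper names `embed_sinusoid` and `shift` are hypothetical choices made for this illustration.

```python
import math


def embed_sinusoid(t, order):
    """Generalised coordinates [x, x', x'', ...] of x(t) = sin(t), truncated at `order` terms."""
    # The k-th time derivative of sin(t) is sin(t + k * pi / 2).
    return [math.sin(t + k * math.pi / 2) for k in range(order)]


def shift(x_tilde):
    """Apply the derivative operator D to a truncated embedding.

    Each order is replaced by the next one up; the top order has no successor
    in a truncated embedding, so it maps to zero.
    """
    return x_tilde[1:] + [0.0]


t, order = 0.3, 4
dx_tilde = shift(embed_sinusoid(t, order))
# Embedding of the true derivative x'(t) = cos(t) = sin(t + pi / 2).
target = [math.sin(t + math.pi / 2 + k * math.pi / 2) for k in range(order)]
# D x~ agrees with it at every order except the truncated top one.
max_err = max(abs(a - b) for a, b in zip(dx_tilde[:-1], target[:-1]))
print(max_err)
```

In a truncated embedding the top order has no higher order to receive, so the comparison excludes it.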

Energies and generalised precisions: instantaneous energy, in general and Gaussian forms; precision matrices in generalised coordinates and time
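Fluctuations at different orders of generalised motion are correlated, so their precision depends on how smooth the noise is. The sketch below assumes a Gaussian autocorrelation $\rho(\tau) = \exp(-\gamma \tau^2 / 4)$ with roughness $\gamma$, which is a common choice in this literature but an assumption here. Under it, the covariance between orders $i$ and $j$ is $(-1)^i \rho^{(i+j)}(0)$, which is zero when $i + j$ is odd. The precision over orders of motion is the inverse of this matrix. In this formulation, the full generalised precision is the Kronecker product of that inverse with the precision of the instantaneous noise. The function names are illustrative.

```python
import math


def temporal_covariance(n, gamma):
    """Covariance S between n orders of generalised motion of a smooth fluctuation.

    Assumes a Gaussian autocorrelation rho(tau) = exp(-gamma * tau**2 / 4), for which
    rho^(2k)(0) = (2k)! / k! * (-gamma / 4)**k and all odd derivatives vanish at zero.
    Entry (i, j) is (-1)**i * rho^(i + j)(0).
    """
    def even_derivative(m):
        k = m // 2
        return math.factorial(2 * k) / math.factorial(k) * (-gamma / 4) ** k

    return [[0.0 if (i + j) % 2 else (-1) ** i * even_derivative(i + j)
             for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


S = temporal_covariance(3, gamma=4.0)  # roughness gamma = 4 is an arbitrary choice
P = invert(S)                          # precision over orders of motion
for row in S:
    print(row)
```

With $\gamma = 4$ the diagonal grows with the order of motion (1, 2, 12). Higher orders therefore have larger variance and carry less precision; this is one way to see why only a few orders of motion are informative.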

Hierarchical dynamic models

Hierarchical forms and empirical priors: a simple energy function of prediction error; dynamical priors (empirical); structural priors (empirical); priors (full)
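In a hierarchical model, the energy can be written in terms of the prediction errors at each level. Below is a minimal sketch that assumes a two-level static hierarchy with Gaussian errors and no dynamics. Level 1 predicts the data, level 2 predicts level 1 (an empirical prior), and the top level has a full prior. The energy is then a sum of precision-weighted squared errors plus log-precision terms. The mappings (`g1`, `math.tanh`), the precisions and the data values are hypothetical.

```python
import math


def gaussian_energy(errors, precisions):
    """Log-density of independent Gaussian prediction errors (higher energy = better fit)."""
    return sum(0.5 * math.log(p / (2 * math.pi)) - 0.5 * p * e * e
               for e, p in zip(errors, precisions))


def hierarchical_errors(y, v1, v2, eta, g1, g2):
    """Prediction errors in a two-level static hierarchy.

    Level 1 predicts the data from v1; level 2 predicts v1 from v2, acting as an
    empirical prior on v1; the top level has a full prior with mean eta.
    """
    return [y - g1(v1), v1 - g2(v2), v2 - eta]


def g1(v):
    """Hypothetical first-level mapping."""
    return 2.0 * v


precisions = [4.0, 1.0, 0.5]  # hypothetical precisions, one per level
errors = hierarchical_errors(y=1.3, v1=0.6, v2=0.4, eta=0.0, g1=g1, g2=math.tanh)
energy = gaussian_energy(errors, precisions)
print(errors, round(energy, 4))
```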

Variational learning. Aim: to optimise the path-integral (action) of a free-energy bound on model evidence with respect to a recognition density $q$. Free-energy: $F = G + H$; expected energy: $G = \langle U \rangle_q$; entropy: $H = -\langle \ln q \rangle_q$. Here the energy $U$ is the log of the joint density of the data and the unknown quantities. When optimised, the recognition density approximates the true conditional density, and the action becomes a lower bound on (and approximation to) the integrated log-evidence. These are then used for inference on parameters and on model space, respectively.
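A worked example shows why the free-energy is a bound. It uses a conjugate Gaussian toy model, assumed here: $y \sim \mathcal{N}(\theta, \sigma^2)$ and $\theta \sim \mathcal{N}(0, \tau^2)$. For a Gaussian recognition density $q(\theta) = \mathcal{N}(m, s^2)$, the expected energy and the entropy both have closed forms. The sketch checks that $F$ equals the log-evidence when $q$ is the exact posterior and falls below it otherwise. It is a static illustration, not the path-integral over time used in the talk; the function names and numbers are hypothetical.

```python
import math


def free_energy(y, m, s2, sigma2, tau2):
    """F(q) = <ln p(y, theta)>_q + H[q] for q(theta) = N(m, s2).

    Assumed toy model: y ~ N(theta, sigma2), theta ~ N(0, tau2).
    """
    expected_lik = -0.5 * math.log(2 * math.pi * sigma2) - ((y - m) ** 2 + s2) / (2 * sigma2)
    expected_prior = -0.5 * math.log(2 * math.pi * tau2) - (m ** 2 + s2) / (2 * tau2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return expected_lik + expected_prior + entropy


def log_evidence(y, sigma2, tau2):
    """ln p(y) = ln N(y; 0, sigma2 + tau2) for the same toy model."""
    v = sigma2 + tau2
    return -0.5 * math.log(2 * math.pi * v) - y ** 2 / (2 * v)


y, sigma2, tau2 = 1.5, 0.5, 2.0
post_prec = 1 / sigma2 + 1 / tau2
post_mean = (y / sigma2) / post_prec

print(log_evidence(y, sigma2, tau2))
print(free_energy(y, post_mean, 1 / post_prec, sigma2, tau2))  # equal to ln p(y)
print(free_energy(y, 0.0, 1.0, sigma2, tau2))                  # a worse q gives a lower value
```

The gap between $\ln p(y)$ and $F$ is the Kullback-Leibler divergence from $q$ to the true posterior, so maximising $F$ both tightens the bound and improves the approximation.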

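Mean-field approximation. Lemma: the free energy is maximised with respect to a marginal $q(\vartheta^i)$ when $q(\vartheta^i) \propto \exp(V(\vartheta^i))$, where the variational energy $V(\vartheta^i)$ is the energy expected under the remaining marginals. The variational energies and actions define the recognition density. They depend on the prior energies and on the instantaneous energy, both of which are specified by a generative model. We now seek recognition densities that maximise the action.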
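A textbook case of the mean-field lemma, assumed here for illustration, is a bivariate Gaussian target with mean $\mu$ and precision $\Lambda$. Setting each factor to the exponential of its variational energy gives a Gaussian whose mean depends on the other factor's mean, with variance $1/\Lambda_{ii}$. Iterating the two updates converges to the true mean. The function name and the numerical values are hypothetical.

```python
def mean_field_gaussian(mu, lam, iters=50):
    """Coordinate ascent for a mean-field q(t1) q(t2) approximating N(mu, inv(lam)).

    Each factor is set to exp(variational energy), i.e. the log-joint averaged over
    the other factor; for a Gaussian target this gives the closed-form mean updates
    below and factor variances 1 / lam[i][i].
    """
    m1, m2 = 0.0, 0.0
    for _ in range(iters):
        m1 = mu[0] - lam[0][1] / lam[0][0] * (m2 - mu[1])
        m2 = mu[1] - lam[1][0] / lam[1][1] * (m1 - mu[0])
    return (m1, m2), (1 / lam[0][0], 1 / lam[1][1])


mu = (1.0, -1.0)
lam = [[2.0, 0.9], [0.9, 1.0]]  # hypothetical precision matrix (positive definite)
means, variances = mean_field_gaussian(mu, lam)
print(means, variances)
```

When the variables are correlated, the factor variances $1/\Lambda_{ii}$ are smaller than the true marginal variances. This is a known property of this factorisation.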

Ensemble learning. Lemma: the recognition density is the stationary solution, in a moving frame of reference, for an ensemble of particles with specified equations of motion and ensemble dynamics. Simulating these particles is variational filtering. Proof: substituting the recognition density into the ensemble dynamics yields a density that is stationary under a moving frame of reference, whose velocity is seen using a coordinate transform.
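The ensemble idea can be illustrated in its simplest, static form. Particles that follow the gradient of an energy $V$ plus white noise (Langevin dynamics) settle into a density proportional to $\exp(V)$. This minimal sketch omits the motion term that creates the moving frame of reference, and it uses no generalised coordinates. It therefore shows only the stationary-density part of the lemma. The Gaussian energy, step size and particle counts are assumptions.

```python
import random


def langevin_ensemble(grad_v, n_particles=1000, n_steps=1000, dt=0.01, seed=0):
    """Euler-Maruyama simulation of dx = grad V(x) dt + sqrt(2) dW for an ensemble.

    The stationary density of this ensemble is proportional to exp(V(x)).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_sd = (2 * dt) ** 0.5
    for _ in range(n_steps):
        xs = [x + grad_v(x) * dt + noise_sd * rng.gauss(0.0, 1.0) for x in xs]
    return xs


# Hypothetical energy V(x) = -(x - mu)**2 / (2 * s2), so exp(V) is proportional to N(mu, s2).
mu, s2 = 2.0, 0.25
particles = langevin_ensemble(lambda x: -(x - mu) / s2)
mean = sum(particles) / len(particles)
var = sum((x - mean) ** 2 for x in particles) / len(particles)
print(round(mean, 2), round(var, 2))
```

The sample mean and variance of the ensemble approach $\mu$ and $s^2$. The Euler step slightly inflates the variance, by a factor of about $1/(1 - \mathrm{d}t/(2 s^2))$.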

A toy example [figure not reproduced]

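Optimizing free-energy under the Laplace approximation. Mean-field approximation: $q(\vartheta) = \prod_i q(\vartheta^i)$. Laplace approximation: $q(\vartheta^i) = \mathcal{N}(\mu^i, \Sigma^i)$. The Laplace approximation enables us to specify the sufficient statistics of the recognition density. These are the conditional modes $\mu^i$ and the conditional precisions $(\Sigma^i)^{-1}$, the latter given by the negative curvature of the (variational) energy at the mode. Under these approximations, all we need to do is optimise the conditional modes.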
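Here is a one-dimensional sketch of the Laplace approximation under an assumed model: a standard-normal prior on $\theta$ and a Poisson count $y$ with rate $\exp(\theta)$. Newton steps locate the conditional mode, and the conditional precision is the negative second derivative of the energy at that mode. The model and the function name are hypothetical.

```python
import math


def laplace_poisson(y, iters=50):
    """Laplace approximation for theta ~ N(0, 1), y ~ Poisson(exp(theta)).

    Energy (log-joint up to a constant): U(theta) = y*theta - exp(theta) - theta**2 / 2.
    Newton ascent finds the conditional mode; the conditional precision is -U''(mode).
    """
    theta = 0.0
    for _ in range(iters):
        grad = y - math.exp(theta) - theta  # U'(theta)
        curv = -math.exp(theta) - 1.0       # U''(theta), always negative
        theta -= grad / curv
    return theta, math.exp(theta) + 1.0     # (mode, precision)


mode, precision = laplace_poisson(3)
print(mode, precision)
```

Because the energy here is strictly concave, Newton's method converges from any starting point. The resulting Gaussian $\mathcal{N}(\text{mode}, 1/\text{precision})$ is the recognition density under this approximation.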

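Approximating the mode … a gradient ascent in moving coordinates. Taking the expectation of the ensemble dynamics gives $\dot{\tilde\mu} = \mathcal{D}\tilde\mu + \partial V / \partial \tilde u$, evaluated at the mode. Here $\mathcal{D}$ is the derivative operator that shifts each order of generalised motion onto the next. The term $\dot{\tilde\mu} - \mathcal{D}\tilde\mu$ can be regarded as a gradient ascent in a frame of reference that moves along the trajectory encoded in generalised coordinates. The stationary solution in this moving frame of reference maximises variational action, by the fundamental lemma of the calculus of variations (cf. Hamilton's principle of stationary action).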
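The role of the motion term can be checked numerically. The sketch assumes a hypothetical quadratic energy $V = -\tfrac{k}{2}\lVert\tilde\mu - \tilde y\rVert^2$ that pulls a three-order generalised estimate towards the embedding of $y(t) = \sin t$. It integrates $\dot{\tilde\mu} = \mathcal{D}\tilde\mu + \partial V/\partial\tilde\mu$ with Euler steps, once with and once without the $\mathcal{D}\tilde\mu$ term. Here $\mathcal{D}$ shifts each order of motion onto the next. With the term, the estimate is carried along the trajectory and the gradient only corrects the residual error. The gain, step size and embedding order are illustrative assumptions.

```python
import math


def track(k=4.0, dt=1e-3, t_end=20.0, use_motion=True):
    """Integrate d(mu)/dt = D mu + k (y~ - mu) for y(t) = sin(t) in 3 generalised orders.

    k (y~ - mu) is the gradient of the hypothetical energy V = -k/2 |mu - y~|^2.
    With use_motion=False the D mu term is dropped (plain gradient ascent).
    Returns the largest error in the zeroth order over the second half of the run.
    """
    mu = [0.0, 0.0, 0.0]
    t, worst = 0.0, 0.0
    while t < t_end:
        y = [math.sin(t), math.cos(t), -math.sin(t)]  # embedding of sin(t)
        motion = mu[1:] + [0.0] if use_motion else [0.0, 0.0, 0.0]
        mu = [m + dt * (d + k * (yi - m)) for m, d, yi in zip(mu, motion, y)]
        t += dt
        if t > t_end / 2:
            worst = max(worst, abs(mu[0] - math.sin(t)))
    return worst


err_moving = track(use_motion=True)
err_static = track(use_motion=False)
print(round(err_moving, 3), round(err_static, 3))
```

Without the motion term, the estimate lags the signal by an amount set only by the gain. With it, the tracking error in this toy is confined mostly to the truncated top order, and the zeroth-order error is roughly an order of magnitude smaller.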

Dynamic expectation maximization. D-step: inference, by local linearisation (Ozaki 1992); E-step: learning; M-step: uncertainty. A dynamic recognition system that minimises prediction error.
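The local linearisation scheme (Ozaki 1992) updates a state by $\Delta x = (\exp(J\,\Delta t) - I)\,J^{-1} f(x)$, where $J$ is the Jacobian of the flow $f$. Below is a minimal scalar sketch. In DEM the same idea would be applied to the full recognition dynamics with a matrix exponential; that generalisation is not shown, and the function name is hypothetical.

```python
import math


def local_linearisation_step(x, f, jac, dt):
    """One local-linearisation (Ozaki 1992) update for a scalar flow dx/dt = f(x).

    Delta x = (exp(J dt) - 1) / J * f(x), with J = f'(x) evaluated at x. The update is
    exact for linear flows and reduces to an Euler step as J tends to zero.
    """
    fx, j = f(x), jac(x)
    if abs(j) < 1e-12:
        return x + fx * dt
    return x + math.expm1(j * dt) / j * fx


# Linear flow dx/dt = a x: one large step reproduces the exact solution x0 * exp(a dt).
a = -2.0
x1 = local_linearisation_step(1.0, lambda x: a * x, lambda x: a, dt=0.5)
print(x1, math.exp(a * 0.5))
```

Unlike an Euler step, a large step on a stable linear flow cannot overshoot. This robustness is the usual motivation for the scheme.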

A linear convolution model: prediction error, generation and inversion
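The generation side of a linear convolution model can be sketched as a linear state-space system: $\dot x = A x + B v(t)$ and $y = C x$, driven by a smooth cause $v(t)$. This sketch is noise-free, and the matrices, the Gaussian-bump cause and the function name are hypothetical rather than the values used in the talk.

```python
import math


def simulate_linear_convolution(cause, A, B, C, substeps=100):
    """Noise-free generation from dx/dt = A x + B v(t), y = C x, one sample per time bin.

    Each unit-length bin is integrated with `substeps` Euler steps, holding the cause fixed.
    """
    n = len(A)
    x = [0.0] * n
    h = 1.0 / substeps
    ys = []
    for v in cause:
        for _ in range(substeps):
            dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * v for i in range(n)]
            x = [xi + h * dxi for xi, dxi in zip(x, dx)]
        ys.append([sum(c * xi for c, xi in zip(row, x)) for row in C])
    return ys


# Hypothetical parameters: two damped, coupled hidden states and four outputs.
A = [[-0.25, 1.0], [-0.5, -0.25]]
B = [1.0, 0.0]
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]
cause = [math.exp(-((t - 12) ** 2) / 4) for t in range(32)]  # Gaussian bump at bin 12
y = simulate_linear_convolution(cause, A, B, C)
```

Inversion runs this mapping in reverse: given noisy $y$, it estimates the hidden states and the cause. The prediction error is the difference between $y$ and the output the model predicts from the current estimates.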

Variational filtering on states and causes [figure: hidden states and cause plotted against time (bins)]

Linear deconvolution with variational filtering (SDE): free form
Linear deconvolution with Dynamic expectation maximisation (ODE): fixed form

The order of generalised motion [figure: precision in generalised coordinates; accuracy and embedding (n), showing the sum of squared error (causal states) against the embedding order n; estimates plotted against time]

DEM and extended Kalman filtering [figure: hidden states (true, DEM(0), DEM(4) and EKF) plotted against time; sum of squared error (hidden states) for EKF, DEM(0) and DEM(4)]. With convergence when [condition not reproduced].

A nonlinear convolution model. This system has a slow sinusoidal input, or cause, that drives increases in a single hidden state. The response is a quadratic function of the hidden states (cf. Arulampalam et al. 2002).
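Below is a noise-free sketch of a system with this structure. Every functional form and constant is a hypothetical choice, not the talk's model: the slow sinusoidal cause is $v(t) = 1 + \sin(\pi t/16)$, the hidden state follows $\dot x = v - x/2$, and the response is $y = x^2/5$.

```python
import math


def observe(x):
    """Hypothetical quadratic response: identical for x and -x."""
    return x ** 2 / 5


def simulate_nonlinear_convolution(n_bins=64, dt=0.01):
    """Noise-free sketch of a nonlinear convolution model (all forms hypothetical).

    Cause:  v(t) = 1 + sin(pi * t / 16), a slow sinusoid.
    State:  dx/dt = v - x / 2, so the cause drives increases in one hidden state.
    Output: y = observe(x), a quadratic function of the hidden state.
    """
    x, ys = 0.0, []
    substeps = int(round(1 / dt))
    for t in range(n_bins):
        for s in range(substeps):
            v = 1 + math.sin(math.pi * (t + s * dt) / 16)
            x += dt * (v - x / 2)
        ys.append(observe(x))
    return ys


y = simulate_nonlinear_convolution()
print([round(v, 2) for v in y[:8]])
```

Because the response is quadratic, a given output is consistent with two states of opposite sign. This many-to-one mapping is what makes such a model a demanding test for recursive filters.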

DEM and particle filtering [figure: sum of squared error; comparative performance]
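For reference, a particle filter of the kind used as a comparator can be sketched as a bootstrap (sequential importance resampling) filter. This is a generic textbook version, not necessarily the implementation used in the comparison. To keep the check simple, it is demonstrated on a hypothetical random-walk model with Gaussian observation noise rather than on the nonlinear convolution model.

```python
import math
import random


def bootstrap_particle_filter(ys, transition, log_lik, init, n=500, seed=0):
    """Bootstrap (sequential importance resampling) particle filter for a scalar state.

    At each step: propagate particles through the transition density, weight them
    by the likelihood of the new observation, record the weighted mean, resample.
    """
    rng = random.Random(seed)
    particles = [init(rng) for _ in range(n)]
    means = []
    for y in ys:
        particles = [transition(p, rng) for p in particles]
        log_w = [log_lik(y, p) for p in particles]
        top = max(log_w)  # subtract the maximum for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * p for wi, p in zip(w, particles)))
        particles = rng.choices(particles, weights=w, k=n)
    return means


# Hypothetical test model: a random walk (sd 0.1) observed in unit-variance noise.
data_rng = random.Random(1)
xs, x = [], 0.0
for _ in range(100):
    x += data_rng.gauss(0.0, 0.1)
    xs.append(x)
ys = [xi + data_rng.gauss(0.0, 1.0) for xi in xs]

est = bootstrap_particle_filter(
    ys,
    transition=lambda p, r: p + r.gauss(0.0, 0.1),
    log_lik=lambda y, p: -0.5 * (y - p) ** 2,
    init=lambda r: r.gauss(0.0, 1.0),
)
rmse_pf = math.sqrt(sum((e - xi) ** 2 for e, xi in zip(est, xs)) / len(xs))
rmse_raw = math.sqrt(sum((y - xi) ** 2 for y, xi in zip(ys, xs)) / len(xs))
print(round(rmse_pf, 3), round(rmse_raw, 3))
```

The sum of squared error on the hidden state is a natural yardstick for such comparisons; here it is reported as a root-mean-square error against the raw observations.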

Triple estimation (DEM): inference on states and learning parameters

An fMRI study of attention
Stimuli: 250 radially moving dots at 4.7 degrees/s.
Pre-scanning: 5 × 30 s trials with 5 speed changes (reducing to 1%). Task: detect a change in radial velocity.
Scanning (no speed changes): 4 × 100-scan sessions, each comprising 10 scans of 4 different conditions (F A F N S ...).
Conditions: A = dots, motion and attention (detect changes); N = dots and motion; S = dots; F = fixation.
V5 (motion-sensitive area)
Büchel et al. 1999

A hemodynamic model. Inputs: visual input, motion and attention. Convolution kernel: state equations. Output: a mixture of intra- and extravascular signal, given by the output equation.
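The state and output equations can be sketched with a standard balloon-type hemodynamic model, assumed here because the slide's equations are not reproduced. It has four states: a vasodilatory signal $s$, inflow $f$, volume $v$ and deoxyhemoglobin $q$. The output mixes intra- and extravascular signal changes. The parameter values are commonly quoted defaults and should be treated as assumptions, as should the single lumped input (in place of separate visual, motion and attention inputs).

```python
def hemodynamic_response(u, dt=0.01, kappa=0.65, gamma=0.41, tau=0.98,
                         alpha=0.32, rho=0.34, v0=0.02):
    """Euler integration of a balloon-type hemodynamic model for an input series u[k].

    States: s (vasodilatory signal), f (inflow), v (volume), q (deoxyhemoglobin).
    Parameter values are commonly quoted defaults, used here as assumptions.
    """
    k1, k2, k3 = 7 * rho, 2.0, 2 * rho - 0.2
    s, f, v, q = 0.0, 1.0, 1.0, 1.0  # resting state
    ys = []
    for uk in u:
        extraction = 1 - (1 - rho) ** (1 / f)  # oxygen extraction fraction E(f, rho)
        ds = uk - kappa * s - gamma * (f - 1)
        df = s
        dv = (f - v ** (1 / alpha)) / tau
        dq = (f * extraction / rho - v ** (1 / alpha) * q / v) / tau
        s, f, v, q = s + dt * ds, f + dt * df, v + dt * dv, q + dt * dq
        # Output equation: weighted mix of intra- and extravascular signal changes.
        ys.append(v0 * (k1 * (1 - q) + k2 * (1 - q / v) + k3 * (1 - v)))
    return ys


dt = 0.01
u = [1.0 if k * dt < 1.0 else 0.0 for k in range(int(30 / dt))]  # 1 s input, 30 s run
bold = hemodynamic_response(u, dt=dt)
peak_time = bold.index(max(bold)) * dt
print(round(max(bold), 4), round(peak_time, 1))
```

At rest (zero input) every derivative vanishes and the output is zero. A brief input produces a delayed, positive response, as expected of a hemodynamic convolution kernel.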

Hemodynamic deconvolution (V5): inference on states and learning parameters

… and a closer look at the states

Synthetic songbirds: syrinx; hierarchy of Lorenz attractors
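A hierarchy of Lorenz attractors can be sketched by letting a slow, high-level attractor set a control parameter of a fast, low-level one. This is a minimal sketch under explicit assumptions. The coupling (the slow attractor's z-state used as the fast attractor's Rayleigh parameter), the time-scale ratio and the initial conditions are all hypothetical. The fast state only stands in for a signal that would drive a syrinx model.

```python
def lorenz_flow(state, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - y) - z, x * y - beta * z]


def simulate_hierarchy(n_steps=20000, dt=0.001, slow_rate=1 / 16):
    """Euler integration of a two-level hierarchy of Lorenz attractors.

    The high level runs slow_rate times slower; its z-state is used as the Rayleigh
    parameter rho of the fast low level (a hypothetical coupling). Returns a list of
    (rho_low, x_low) pairs, where x_low stands in for a signal sent to the syrinx.
    """
    high = [1.0, 1.0, 24.0]
    low = [1.0, 1.0, 24.0]
    trace = []
    for _ in range(n_steps):
        d_high = lorenz_flow(high, 10.0, 28.0, 8 / 3)
        high = [h + dt * slow_rate * d for h, d in zip(high, d_high)]
        rho_low = high[2]
        d_low = lorenz_flow(low, 10.0, rho_low, 8 / 3)
        low = [s + dt * d for s, d in zip(low, d_low)]
        trace.append((rho_low, low[0]))
    return trace


trace = simulate_hierarchy()
print(min(r for r, _ in trace), max(r for r, _ in trace))
```

Separating time scales in this way lets the slow level act like a changing context for the fast level. Recognising the song then amounts to inferring the slow states from the fast output.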

Song recognition with DEM

… and broken birds

Summary: hierarchical dynamic models; generalised coordinates (dynamical priors); hierarchical forms (structural priors); variational filtering and action (free-form); Laplace approximation and DEM (fixed-form); comparative evaluations; hemodynamics and bird songs