Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by S. Mahadevan & M. Maggioni Discussion led by Qi An ECE, Duke University

Outline • Introduction • Approximate policy iteration • Value function approximation • Laplacian eigenfunctions approximation • Diffusion wavelets approximation • Experimental results • Conclusions

Introduction • In MDP models, it is desirable or even necessary to approximate the value function when the state space is large or in a reinforcement learning setting. • This paper explores two novel approaches to value function approximation on state space graphs.

Approximate policy iteration • In an RL setting modeled as an MDP, value function approximation is one step of the approximate policy iteration process, which is used to solve the RL problem iteratively.

Approximate policy iteration Sample (s, a, r, s’)
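
As a rough illustration of the loop above, the following sketch alternates a least-squares policy evaluation step (in the style of LSTDQ) with greedy policy improvement, using only samples (s, a, r, s'). Everything concrete here is an assumption made for the example and not taken from the slides: the 5-state chain, the one-hot features `phi`, the reward on reaching the last state, the small ridge term, and the names `step`, `lstdq` and `approximate_policy_iteration`.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
K = N_STATES * N_ACTIONS          # assumption: one basis function per (s, a) pair

def phi(s, a):
    """Hypothetical one-hot feature vector for the pair (s, a)."""
    f = np.zeros(K)
    f[s * N_ACTIONS + a] = 1.0
    return f

def step(s, a):
    """Toy chain: action 0 moves left, action 1 moves right.
    Reward 1 whenever the next state is the last state."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

def lstdq(samples, policy):
    """Least-squares fit of w so that phi(s, a) @ w approximates Q^policy."""
    a_mat = 1e-6 * np.eye(K)      # small ridge term keeps the system solvable
    b = np.zeros(K)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy[s_next])
        a_mat += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(a_mat, b)

def approximate_policy_iteration(samples, max_iter=20):
    policy = np.zeros(N_STATES, dtype=int)   # start with "always left"
    for _ in range(max_iter):
        w = lstdq(samples, policy)           # policy evaluation
        q = np.array([[phi(s, a) @ w for a in range(N_ACTIONS)]
                      for s in range(N_STATES)])
        greedy = q.argmax(axis=1)            # policy improvement
        if np.array_equal(greedy, policy):
            break
        policy = greedy
    return policy, w

# One sample (s, a, r, s') per state-action pair of the toy chain
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

policy, w = approximate_policy_iteration(samples)
print(policy)   # greedy action for each state
```

With one-hot features the evaluation step is essentially exact, so this only shows the control flow; the point of the talk is what to use in place of `phi` when a table is too large.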

Value function approximation • A variety of linear and non-linear architectures have been widely studied, as they offer many advantages in the context of value function approximation. • However, many of them are hand-coded by a human designer in an ad hoc, trial-and-error process.

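Value function approximation • A finite MDP can be defined as a tuple $M = (S, A, P^a_{ss'}, R^a_{ss'})$: a finite set of states $S$, a finite set of actions $A$, a transition model $P^a_{ss'}$, and a reward model $R^a_{ss'}$ • Any policy $\pi$ defines a unique value function $Q^\pi$, which satisfies the Bellman equation $Q^\pi(s, a) = \sum_{s'} P^a_{ss'} \left( R^a_{ss'} + \gamma \, Q^\pi(s', \pi(s')) \right)$, with discount factor $\gamma$ • We want to project the value function into another, lower-dimensional space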

Value function approximation • In the approximation $Q^\pi \approx \Phi w^\pi$, $\Phi$ is a $|S||A| \times k$ matrix, each column of which is a basis function evaluated at the (s, a) points, k is the number of basis functions selected, and $w^\pi$ is a weight vector. • The problem is how to construct those basis functions efficiently and effectively
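
To make the shapes concrete, here is a small sketch of a hand-coded basis of the kind the previous slide criticizes: a |S||A| x k matrix built from per-action polynomials, with the weight vector obtained by least squares. The polynomial basis, the target values `q`, and the function name `build_phi` are illustrative assumptions, not the bases proposed in the paper.

```python
import numpy as np

def build_phi(n_states, n_actions, degree):
    """Return a (|S||A|, k) matrix with k = degree * n_actions.
    Row s * n_actions + a holds the features of the pair (s, a):
    the monomials 1, x, x^2, ... of the rescaled state index, placed
    in the block of columns that belongs to action a."""
    k = degree * n_actions
    phi = np.zeros((n_states * n_actions, k))
    x = np.linspace(0.0, 1.0, n_states)
    for s in range(n_states):
        for a in range(n_actions):
            row = s * n_actions + a
            phi[row, a * degree:(a + 1) * degree] = x[s] ** np.arange(degree)
    return phi

n_states, n_actions, degree = 10, 2, 3
phi = build_phi(n_states, n_actions, degree)

# Hypothetical target values, one per (s, a) pair, in the same row order
s_idx = np.repeat(np.arange(n_states), n_actions)
a_idx = np.tile(np.arange(n_actions), n_states)
q = 0.9 ** (n_states - 1 - s_idx) + 0.1 * a_idx

# Least-squares weights: the projection of q onto the column span of phi
w, *_ = np.linalg.lstsq(phi, q, rcond=None)
residual = q - phi @ w
print(phi.shape, w.shape, np.linalg.norm(residual))
```

The residual is orthogonal to every column of the basis matrix, which is what "projecting into a lower-dimensional space" means here; how small it is depends entirely on how well the chosen basis matches the value function.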

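Laplacian eigenfunctions • We model the state space as a finite undirected weighted graph (G, E, W) • The combinatorial Laplacian L is defined as $L = D - W$, where $D$ is the diagonal matrix of vertex degrees $d_x = \sum_y w(x, y)$ • The normalized Laplacian is $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$ • We use the eigenfunctions of the Laplacian as the orthonormal basis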
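
A minimal sketch of this construction, assuming the state space is a simple chain with unit edge weights and no isolated vertices (the graph and the names `chain_weights` and `laplacian_basis` are choices made for the example):

```python
import numpy as np

def chain_weights(n):
    """Symmetric weight matrix W of an n-state chain with unit edge weights."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

def laplacian_basis(w, k):
    """Return the k smoothest eigenpairs of the normalized Laplacian."""
    d = w.sum(axis=1)                          # vertex degrees (assumed > 0)
    lap = np.diag(d) - w                       # combinatorial Laplacian D - W
    inv_sqrt_d = 1.0 / np.sqrt(d)
    norm_lap = inv_sqrt_d[:, None] * lap * inv_sqrt_d[None, :]
    eigvals, eigvecs = np.linalg.eigh(norm_lap)   # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

n, k = 20, 5
vals, basis = laplacian_basis(chain_weights(n), k)
print(vals)
```

The columns of `basis` are orthonormal and ordered from smoothest to most oscillatory, so keeping the first k of them gives a low-dimensional basis for functions on the states.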

Diffusion wavelets • Diffusion wavelets generalize wavelet analysis and associated signal processing techniques to functions on manifolds and graphs. • They allow fast and accurate computation of high powers of a Markov chain P on the graph, including direct computation of the Green’s function of the Markov chain, $(I - P)^{-1}$, for solving Bellman’s equation.
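
The link between high powers and the Green's function can be sketched with dense matrices. For an operator T with spectral radius below 1, the Neumann series gives (I - T)^{-1} = (I + T)(I + T^2)(I + T^4)..., so only the dyadic powers T^(2^j) are needed. The example below assumes a discounted operator T = gamma * P so that the inverse exists, uses a random walk on a chain, and works with full matrices; it does not reproduce the compressed representation that diffusion wavelets provide. The names `random_walk_on_chain` and `green_dyadic` are hypothetical.

```python
import numpy as np

def random_walk_on_chain(n):
    """Row-stochastic random walk P = D^{-1} W on an n-state chain."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w / w.sum(axis=1, keepdims=True)

def green_dyadic(t_op, levels):
    """Approximate (I - T)^{-1} by the product of (I + T^(2^j)), j < levels.
    After `levels` factors the product equals the Neumann series truncated
    at 2**levels terms."""
    n = t_op.shape[0]
    green = np.eye(n)
    power = t_op.copy()                 # holds T^(2^j)
    for _ in range(levels):
        green = green @ (np.eye(n) + power)
        power = power @ power           # square to reach the next dyadic power
    return green

n, gamma = 10, 0.9
p = random_walk_on_chain(n)
reward = np.zeros(n)
reward[-1] = 1.0                        # hypothetical reward in the last state

v_dyadic = green_dyadic(gamma * p, levels=10) @ reward
v_direct = np.linalg.solve(np.eye(n) - gamma * p, reward)
print(np.max(np.abs(v_dyadic - v_direct)))
```

Ten squarings already cover 1024 terms of the series, which is why fast access to dyadic powers makes a direct solve of the Bellman equation feasible.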

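Diffusion wavelets • Markov random walk: $P = D^{-1} W$, where $D$ is the diagonal degree matrix • We symmetrize P and take powers: $T = D^{1/2} P D^{-1/2}$, $T^t = \sum_i (1 - \lambda_i)^t \, \xi_i \xi_i^T$, where $\lambda_i$ and $\xi_i$ are the eigenvalues and eigenfunctions of the normalized Laplacian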
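
The relation between powers of the symmetrized walk and the Laplacian spectrum can be checked numerically. This sketch assumes a chain graph with unit weights; `chain_weights`, `symmetrized_walk` and `spectral_power` are names made up for the example.

```python
import numpy as np

def chain_weights(n):
    """Symmetric weight matrix W of an n-state chain with unit edge weights."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

def symmetrized_walk(w):
    """Return the random walk P = D^{-1} W and T = D^{1/2} P D^{-1/2}."""
    d = w.sum(axis=1)
    sqrt_d = np.sqrt(d)
    p = w / d[:, None]
    t_op = sqrt_d[:, None] * p / sqrt_d[None, :]
    return p, t_op

def spectral_power(w, steps):
    """Compute T^steps from the eigenpairs of the normalized Laplacian."""
    d = w.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    norm_lap = np.eye(len(d)) - inv_sqrt_d[:, None] * w * inv_sqrt_d[None, :]
    lam, xi = np.linalg.eigh(norm_lap)
    # Scale each eigenvector column by (1 - lambda_i)^steps, then recombine
    return (xi * (1.0 - lam) ** steps) @ xi.T

w = chain_weights(8)
p, t_op = symmetrized_walk(w)
direct = np.linalg.matrix_power(t_op, 6)
spectral = spectral_power(w, 6)
print(np.max(np.abs(direct - spectral)))
```

Because T = I minus the normalized Laplacian, both share eigenvectors, and taking powers only shrinks the eigenvalues; this decay of the spectrum is what makes high powers compressible.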

Diffusion wavelets • A diffusion wavelet tree consists of orthogonal diffusion scaling functions $\Phi_j$ and orthogonal wavelets $\Psi_j$. • The scaling functions span a subspace $V_j$ with the property $V_{j+1} \subseteq V_j$, and the span of the wavelets, $W_j$, is the orthogonal complement of $V_{j+1}$ in $V_j$.
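
Written out, with $V_j$ denoting the span of the scaling functions at level $j$ and $W_j$ the span of the wavelets at level $j$ (this notation is assumed here), the nesting gives the usual multiresolution decomposition. The expansion of a function $f$ over $J$ levels is a standard consequence of the orthogonality and is not quoted from the slides.

```latex
\begin{align*}
V_j &= V_{j+1} \oplus W_j, \qquad W_j \perp V_{j+1}, \\
V_0 &= V_J \oplus W_{J-1} \oplus W_{J-2} \oplus \cdots \oplus W_0, \\
f &= P_{V_J} f + \sum_{j=0}^{J-1} P_{W_j} f \qquad \text{for } f \in V_0,
\end{align*}
```

where $P_{V_J}$ and $P_{W_j}$ are the orthogonal projections onto the corresponding subspaces.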

Diffusion wavelets

• The detail subspaces • Downsampling, orthogonalization, and operator compression • Notation: A: diffusion operator; G: Gram-Schmidt orthonormalization; F: diffusion maps; X: the data set; M - A G
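
The three steps named above can be sketched in a strongly simplified form. At each level the current operator is orthogonalized and truncated to precision `eps` (downsampling), the discarded directions form the detail subspace, and the squared operator is rewritten on the retained basis (compression). Note the substitutions: this sketch uses an SVD instead of the Gram-Schmidt orthonormalization named on the slide, so its basis functions are global, not localized, and it only illustrates the subspace structure. The lazy walk on a chain, the threshold, and the names `lazy_symmetric_walk`, `one_level` and `wavelet_tree` are all assumptions.

```python
import numpy as np

def lazy_symmetric_walk(n):
    """T = (I + D^{-1/2} W D^{-1/2}) / 2 on an n-state chain (spectrum in [0, 1])."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    inv_sqrt_d = 1.0 / np.sqrt(w.sum(axis=1))
    return 0.5 * (np.eye(n) + inv_sqrt_d[:, None] * w * inv_sqrt_d[None, :])

def one_level(t_op, eps):
    """One level of the simplified construction.
    Returns scaling functions and wavelets as columns in the coordinates of
    the current basis, plus the squared operator on the new scaling basis."""
    u, s, _ = np.linalg.svd(t_op)
    keep = s > eps
    scaling = u[:, keep]        # orthonormal basis of the eps-range of t_op
    wavelets = u[:, ~keep]      # orthonormal basis of the discarded directions
    compressed = scaling.T @ t_op @ t_op @ scaling
    return scaling, wavelets, compressed

def wavelet_tree(t_op, levels, eps=1e-3):
    tree, current = [], t_op
    for _ in range(levels):
        scaling, wavelets, current = one_level(current, eps)
        tree.append((scaling, wavelets))
        if scaling.shape[1] == 0:   # nothing left above the precision
            break
    return tree

tree = wavelet_tree(lazy_symmetric_walk(32), levels=3)
dims = [scaling.shape[1] for scaling, _ in tree]
print(dims)   # number of scaling functions retained at each level
```

Each pass squares the operator, so its spectrum decays faster and fewer scaling functions survive the threshold; the compressed matrices shrink accordingly from level to level.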

Diffusion wavelets

Experimental results

Conclusions • Two novel value function approximation methods are explored • The underlying representation and the policies are learned simultaneously • Diffusion wavelets are a powerful tool for signal processing of functions on manifolds and graphs