Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions
by S. Mahadevan & M. Maggioni
Discussion led by Qi An, ECE, Duke University

Outline
• Introduction
• Approximate policy iteration
• Value function approximation
• Laplacian eigenfunctions approximation
• Diffusion wavelets approximation
• Experimental results
• Conclusions

Introduction
• In MDP models, it is desirable, and often necessary, to approximate the value function when the state space is large or in a reinforcement learning setting.
• Two novel approaches, Laplacian eigenfunctions and diffusion wavelets, are explored in this paper for value function approximation on state space graphs.

Approximate policy iteration
• In an RL MDP model, value function approximation is one step of the approximate policy iteration process, which solves the RL problem iteratively.
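The loop described above can be sketched as follows. This is a minimal illustration, assuming a least-squares policy iteration (LSPI) style loop with an LSTD-Q evaluation step. The five-state chain, the helper names (`step`, `phi`, `greedy`, `lstdq`) and the one-hot features are hypothetical choices made for the example, not taken from the slides.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9   # hypothetical toy chain

def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next

def phi(s, a):
    """One-hot (tabular) features over (s, a); a stand-in for learned bases."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def greedy(w):
    """Policy that is greedy with respect to Q(s, a) = phi(s, a) . w."""
    return lambda s: int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))

def lstdq(samples, policy):
    """Policy evaluation: solve A w = b built from (s, a, r, s') samples."""
    k = N_STATES * N_ACTIONS
    A, b = np.zeros((k, k)), np.zeros(k)
    for s, a, r, s_next in samples:
        f, f_next = phi(s, a), phi(s_next, policy(s_next))
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

# Collect one sample (s, a, r, s') for every state-action pair.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

# Alternate policy evaluation (LSTD-Q) and policy improvement (greedy).
w = np.zeros(N_STATES * N_ACTIONS)
for _ in range(10):
    w = lstdq(samples, greedy(w))

learned_policy = [greedy(w)(s) for s in range(N_STATES)]
```

With tabular features the evaluation step is exact, so the loop reduces to ordinary policy iteration; the approximation only appears once `phi` returns fewer features than there are state-action pairs.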

Approximate policy iteration
[Diagram: samples (s, a, r, s’)]

Value function approximation
• A variety of linear and non-linear architectures have been widely studied, as they offer many advantages for value function approximation.
• However, many of them are hand-coded by a human designer through an ad hoc trial-and-error process.

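Value function approximation
• A finite MDP can be defined as $M = (S, A, P^a_{ss'}, R^a_{ss'})$, with states $S$, actions $A$, transition probabilities $P^a_{ss'}$ and rewards $R^a_{ss'}$.
• Any policy $\pi$ defines a unique value function $V^\pi$, which satisfies the Bellman equation $V^\pi(s) = \sum_{s'} P^{\pi(s)}_{ss'} \left( R^{\pi(s)}_{ss'} + \gamma V^\pi(s') \right)$, where $\gamma$ is the discount factor.
• We want to project the value function into a lower dimensional space.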
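For a fixed policy the Bellman equation is linear in the value function, so on a small MDP it can be solved exactly. The sketch below assumes a hypothetical 3-state chain and folds the rewards into an expected one-step reward per state (`R_pi`); all numbers are illustrative.

```python
import numpy as np

# Hypothetical 3-state chain under a fixed policy (illustrative numbers).
P_pi = np.array([[0.50, 0.50, 0.00],
                 [0.25, 0.50, 0.25],
                 [0.00, 0.50, 0.50]])   # P_pi[s, s'] = transition probability
R_pi = np.array([0.0, 0.0, 1.0])        # expected one-step reward per state
gamma = 0.9                             # discount factor

# Matrix form of the Bellman equation: V = R + gamma * P V,
# i.e. (I - gamma * P) V = R.
V_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, R_pi)

# Check that V_pi is a fixed point of the Bellman backup.
bellman_residual = np.max(np.abs(V_pi - (R_pi + gamma * P_pi @ V_pi)))
```

The exact solve costs on the order of |S|^3 operations, which is the practical reason for projecting the value function onto a small set of basis functions when the state space is large.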

Value function approximation
• In the approximation $Q^\pi \approx \Phi w$, $\Phi$ is a $|S||A| \times k$ matrix, each column of which is a basis function evaluated at the $(s, a)$ points, $k$ is the number of basis functions selected, and $w$ is a weight vector.
• The problem is how to construct those basis functions efficiently and effectively.
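To make the shapes concrete, here is a small sketch of a linear architecture with a hand-coded basis. The target table `Q` and the three polynomial features are hypothetical, and the weights are fitted by plain least squares rather than by any method the slides prescribe.

```python
import numpy as np

n_states, n_actions, k = 10, 2, 3

# Enumerate all (s, a) points; each row of Phi corresponds to one of them.
s = np.repeat(np.arange(n_states), n_actions)
a = np.tile(np.arange(n_actions), n_states)

# Hypothetical action-value table, flattened to length |S||A|.
Q = np.exp(-0.3 * s) + 0.1 * a

# Hand-coded basis: constant, state index, action index (k = 3 columns).
Phi = np.column_stack([np.ones_like(s, dtype=float),
                       s.astype(float),
                       a.astype(float)])

# Least-squares weights: w minimizes ||Phi w - Q||_2.
w, *_ = np.linalg.lstsq(Phi, Q, rcond=None)
Q_hat = Phi @ w
```

The residual `Q - Q_hat` is orthogonal to every column of `Phi`, which is what projecting onto the span of the basis functions means. A poor hand-coded basis leaves a large residual, which motivates constructing the basis from the state space graph instead.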

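Laplacian eigenfunctions
• We model the state space as a finite undirected weighted graph $G = (V, E, W)$.
• The combinatorial Laplacian $L$ is defined as $L = D - W$, where $D$ is the diagonal matrix of vertex degrees, $D_{ii} = \sum_j W_{ij}$.
• The normalized Laplacian is $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$.
• We use the eigenfunctions of the Laplacian as the orthonormal basis.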
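As an illustration, the Laplacian basis can be computed directly for a small graph. The sketch below assumes a hypothetical 20-state chain with unit edge weights and keeps the `k` eigenvectors of the normalized Laplacian with the smallest eigenvalues; the choice of graph and of `k` is for the example only.

```python
import numpy as np

n = 20                                   # hypothetical chain of 20 states
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0                    # unit weights between neighbours
W[idx + 1, idx] = 1.0                    # undirected graph: W is symmetric

d = W.sum(axis=1)                        # vertex degrees
L = np.diag(d) - W                       # combinatorial Laplacian D - W
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt     # normalized Laplacian

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L_norm)

k = 5
basis = eigvecs[:, :k]                   # k smoothest eigenfunctions
```

Each column of `basis` is a function on the states. The columns with small eigenvalues vary slowly along the chain, which is why they suit smooth value functions.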

Diffusion wavelets
• Diffusion wavelets generalize wavelet analysis and associated signal processing techniques to functions on manifolds and graphs.
• They allow fast and accurate computation of high powers of a Markov chain P on the graph, including direct computation of the Green’s function of the Markov chain, $(I - P)^{-1}$, for solving Bellman’s equation.
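The link between high powers and the Green's function can be seen in the identity $(I - T)^{-1} = \prod_{j \ge 0} (I + T^{2^j})$, which holds when the spectral radius of $T$ is below 1. The sketch below applies it with dense matrices to a discounted chain $T = \gamma P$. The discounting is an assumption made here so that the inverse exists, and the function name and the 3-state chain are hypothetical. Diffusion wavelets would instead store each dyadic power in compressed form.

```python
import numpy as np

def green_by_dyadic_powers(T, n_levels):
    """Approximate (I - T)^{-1} by the product of (I + T^(2^j)), j < n_levels.

    The product equals the truncated Neumann series sum_{k < 2^n_levels} T^k.
    """
    n = T.shape[0]
    result = np.eye(n)
    power = T.copy()                 # holds T^(2^j) at level j
    for _ in range(n_levels):
        result = result @ (np.eye(n) + power)
        power = power @ power        # square to reach the next dyadic power
    return result

# Hypothetical 3-state Markov chain and discount factor.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
gamma = 0.9

G_fast = green_by_dyadic_powers(gamma * P, n_levels=10)
G_exact = np.linalg.inv(np.eye(3) - gamma * P)
```

Ten squarings already cover the first 1024 terms of the series, so only a logarithmic number of levels is needed for a given accuracy.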

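Diffusion wavelets
• Markov random walk: $P = D^{-1} W$, where $D$ is the diagonal degree matrix of the weight matrix $W$.
• We symmetrize $P$ and take powers: $T = D^{1/2} P D^{-1/2}$, $T^t = \sum_i (1 - \lambda_i)^t \xi_i \xi_i^T$, where $\lambda_i$ and $\xi_i$ are the eigenvalues and eigenfunctions of the normalized Laplacian.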
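The spectral form of the powers can be checked numerically. The sketch below assumes a hypothetical 4-node weighted graph, builds the random walk and its symmetrized version, and compares the power computed from the Laplacian spectrum with direct matrix multiplication.

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes (symmetric weight matrix).
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
d = W.sum(axis=1)

P = W / d[:, None]                                     # random walk D^{-1} W
T = np.diag(np.sqrt(d)) @ P @ np.diag(1 / np.sqrt(d))  # symmetrized walk

# The normalized Laplacian is I - T, so T has eigenvalues 1 - lambda_i.
L_norm = np.eye(4) - T
lam, xi = np.linalg.eigh(L_norm)

t = 8
# Scale column i of xi by (1 - lambda_i)^t, then recombine.
T_t_spectral = (xi * (1 - lam) ** t) @ xi.T
T_t_direct = np.linalg.matrix_power(T, t)
```

Eigenvalues of `T` with magnitude below 1 decay geometrically with `t`, so high powers have low numerical rank. That decay is what makes them compressible.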

Diffusion wavelets
• A diffusion wavelet tree consists of orthogonal diffusion scaling functions $\Phi_j$ and orthogonal wavelets $\Psi_j$.
• The scaling functions $\Phi_j$ span a subspace $V_j$ with the property $V_{j+1} \subseteq V_j$, and the span of the wavelets $\Psi_j$, $W_j$, is the orthogonal complement of $V_{j+1}$ in $V_j$.

Diffusion wavelets

• The detail subspaces
• Downsampling, orthogonalization, and operator compression
  A - diffusion operator; G - Gram-Schmidt orthonormalization; F - diffusion maps; X - the data set; M - A G
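One level of this construction can be imitated in a few lines. This is only a rough sketch: an SVD stands in for the Gram-Schmidt orthonormalization step, so the resulting basis functions are not localized as true diffusion wavelets are. The function name `one_level`, the threshold `eps` and the 5-node chain are all hypothetical.

```python
import numpy as np

def one_level(T, eps=1e-6):
    """Split R^n into an eps-range of T (scaling) and its complement (detail).

    Returns orthonormal bases for both subspaces and T^2 compressed onto the
    scaling basis. SVD is used here in place of Gram-Schmidt (an assumption).
    """
    U, s, _ = np.linalg.svd(T)
    keep = s > eps
    scaling = U[:, keep]                  # orthonormal basis of V_1
    wavelets = U[:, ~keep]                # orthonormal basis of W_0
    T2_compressed = scaling.T @ (T @ T) @ scaling   # downsampled operator
    return scaling, wavelets, T2_compressed

# Symmetrized random walk on a hypothetical 5-node chain with unit weights.
n = 5
W = np.zeros((n, n))
i = np.arange(n - 1)
W[i, i + 1] = W[i + 1, i] = 1.0
d = W.sum(axis=1)
T = W / np.sqrt(np.outer(d, d))           # D^{-1/2} W D^{-1/2}

scaling, wavelets, T2c = one_level(T)
```

On this chain the operator has one zero eigenvalue, so one direction moves into the detail subspace and the next power is represented by a 4 x 4 matrix instead of a 5 x 5 one. Repeating the step on the compressed operator would give the next level of the tree.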

Diffusion wavelets

Experimental results

Conclusions
• Two novel value function approximation methods are explored.
• The underlying representation and the policies are learned simultaneously.
• Diffusion wavelets are a powerful tool for applying signal processing techniques to functions on manifolds and graphs.