Value Function Approximation with Diffusion Wavelets and Laplacian
- Slides: 18
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions by S. Mahadevan & M. Maggioni Discussion led by Qi An ECE, Duke University
Outline • Introduction • Approximate policy iteration • Value function approximation • Laplacian eigenfunctions approximation • Diffusion wavelets approximation • Experimental results • Conclusions
Introduction • In MDP models, the value function often has to be approximated, either because the state space is large or because the problem is posed as reinforcement learning. • This paper explores two novel approaches to value function approximation on state-space graphs.
Approximate policy iteration • In an RL problem modeled as an MDP, value function approximation is one step of approximate policy iteration, which solves the RL problem iteratively.
Approximate policy iteration Sample (s, a, r, s’)
Value function approximation • A variety of linear and nonlinear architectures have been widely studied, as they offer many advantages for value function approximation. • However, many of them are hand-coded by a human designer in an ad hoc, trial-and-error process.
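Value function approximation • A finite MDP can be defined as $M = (S, A, P, R)$, with states $S$, actions $A$, transition probabilities $P$, and rewards $R$ • Any policy $\pi$ defines a unique value function $V^\pi$, which satisfies the Bellman equation $V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, V^\pi(s')$ • We want to project the value function onto a lower-dimensional space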
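For a fixed policy, the Bellman equation is a linear system, so on a small MDP the value function can be computed exactly. The following is a minimal sketch under stated assumptions: the three-state chain, the reward vector, the discount factor 0.9, and the names `evaluate_policy`, `P_pi`, and `R_pi` are illustrative and do not come from the slides.

```python
import numpy as np

def evaluate_policy(P_pi, R_pi, gamma):
    """Solve the Bellman equation V = R_pi + gamma * P_pi @ V for a fixed policy."""
    n = P_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

# Hypothetical 3-state chain under some fixed policy (assumption, not from the slides).
# State 2 is absorbing and yields reward 1 on every step.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.0, 0.0, 1.0]])
R_pi = np.array([0.0, 0.0, 1.0])
gamma = 0.9

V = evaluate_policy(P_pi, R_pi, gamma)

# The solution should satisfy the Bellman equation up to rounding error.
bellman_residual = np.max(np.abs(V - (R_pi + gamma * P_pi @ V)))
```

The exact solve costs on the order of |S|^3 operations, which is why a lower-dimensional representation becomes attractive as the state space grows.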
Value function approximation • In the approximation $\hat{Q}^\pi = \Phi w$, $\Phi$ is a $|S||A| \times k$ matrix, each column of which is a basis function evaluated at the $(s, a)$ points, $k$ is the number of basis functions selected, and $w$ is a weight vector. • The problem is how to construct those basis functions efficiently and effectively
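The linear architecture can be illustrated by projecting a known target vector onto the span of a few basis functions with ordinary least squares. This is only a sketch: the random target `Q` stands in for a true action-value function, which is not available in an actual RL problem, and the polynomial columns of `Phi` are a hypothetical hand-coded basis, not the bases proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, k = 5, 2, 3
n = n_states * n_actions            # one row per (s, a) pair

# Stand-in target values, one per (s, a) pair (assumption for illustration).
Q = rng.random(n)

# Hypothetical hand-coded basis: constant, linear, and quadratic in the row index.
idx = np.arange(n, dtype=float)
Phi = np.column_stack([np.ones(n), idx, idx ** 2])   # shape (|S||A|, k)

# Least-squares weights: w minimizes ||Phi @ w - Q||_2.
w, *_ = np.linalg.lstsq(Phi, Q, rcond=None)
Q_hat = Phi @ w
residual = Q - Q_hat
```

At the least-squares solution the residual is orthogonal to every column of `Phi`, so the quality of the fit depends entirely on how well the chosen basis spans the target.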
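Laplacian eigenfunctions • We model the state space as a finite undirected weighted graph (G, E, W) • The combinatorial Laplacian L is defined as $L = D - W$, where $D$ is the diagonal matrix of row sums of $W$ • The normalized Laplacian is $\mathcal{L} = D^{-1/2} (D - W) D^{-1/2}$ • We use the eigenfunctions of the Laplacian as an orthonormal basis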
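A minimal sketch of building such a basis, assuming a 10-state chain with unit edge weights as the state-space graph and keeping the `k = 4` eigenvectors with the smallest eigenvalues. The graph, the value of `k`, and the helper name `normalized_laplacian` are illustrative assumptions.

```python
import numpy as np

def normalized_laplacian(W):
    """Return D^{-1/2} (D - W) D^{-1/2} for a symmetric weight matrix W."""
    d = W.sum(axis=1)                      # vertex degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.diag(d) - W                     # combinatorial Laplacian
    return d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]

# Hypothetical state-space graph: a chain of 10 states with unit weights.
n = 10
W = np.zeros((n, n))
i = np.arange(n - 1)
W[i, i + 1] = 1.0
W[i + 1, i] = 1.0

L_norm = normalized_laplacian(W)

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(L_norm)

k = 4
basis = eigvecs[:, :k]                     # k smoothest eigenfunctions, shape (n, k)
```

Each column of `basis` is one basis function over the states; the eigenvalues of the normalized Laplacian lie in the interval [0, 2].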
Diffusion wavelets • Diffusion wavelets generalize wavelet analysis and associated signal processing techniques to functions on manifolds and graphs. • They allow fast and accurate computation of high powers of a Markov chain P on the graph, including direct computation of the Green’s function of the Markov chain, $(I - P)^{-1}$, for solving Bellman’s equation.
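The link between high powers and the Green's function can be sketched with the product form of the Neumann series: when the spectral radius of T is below 1, (I - T)^{-1} equals the product over k of (I + T^(2^k)), so only repeated squarings of T are needed. The sketch below uses dense NumPy matrices and assumes T = gamma * P with a discount gamma < 1 so that the series converges; diffusion wavelets would instead store compressed representations of the dyadic powers. The function name and the random chain are hypothetical.

```python
import numpy as np

def greens_function_dyadic(T, levels):
    """Approximate (I - T)^{-1} by the product of (I + T^(2^k)) for k < levels."""
    n = T.shape[0]
    G = np.eye(n)
    T_pow = T.copy()                  # holds T^(2^k)
    for _ in range(levels):
        G = G @ (np.eye(n) + T_pow)   # G now equals the sum of T^m for m < 2^(k+1)
        T_pow = T_pow @ T_pow         # square to get the next dyadic power
    return G

rng = np.random.default_rng(0)
n = 6
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)     # random row-stochastic matrix (assumption)
R = rng.random(n)
gamma = 0.9

G = greens_function_dyadic(gamma * P, levels=10)
V = G @ R                             # solves V = R + gamma * P @ V

G_exact = np.linalg.inv(np.eye(n) - gamma * P)
```

Ten squarings already cover the first 1024 terms of the series, so the truncation error is far below rounding error for this discount factor.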
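Diffusion wavelets • Markov random walk: $P = D^{-1} W$ • We symmetrize $P$ and take powers: $T = D^{1/2} P D^{-1/2}$, $T^t = \sum_i (1 - \lambda_i)^t \, \xi_i \xi_i^T$, where $\lambda_i$ and $\xi_i$ are the eigenvalues and eigenfunctions of the normalized Laplacian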
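The symmetrization can be checked numerically. With the random walk P = D^{-1} W, the conjugate T = D^{1/2} P D^{-1/2} is symmetric and equals the identity minus the normalized Laplacian, so its powers follow from the Laplacian's eigendecomposition. The small weighted graph and the exponent `t = 5` below are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical symmetric weight matrix of a 4-vertex connected graph.
W = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 3.0],
              [0.0, 0.0, 3.0, 0.0]])
d = W.sum(axis=1)
d_sqrt = np.sqrt(d)

P = W / d[:, None]                               # random walk P = D^{-1} W
T = d_sqrt[:, None] * P / d_sqrt[None, :]        # T = D^{1/2} P D^{-1/2}

# Normalized Laplacian and its eigenpairs (lam_i, xi_i).
L_norm = np.eye(len(d)) - T
lam, xi = np.linalg.eigh(L_norm)

t = 5
T_t_spectral = (xi * (1.0 - lam) ** t) @ xi.T    # sum_i (1 - lam_i)^t xi_i xi_i^T
T_t_direct = np.linalg.matrix_power(T, t)

# Powers of P follow by undoing the conjugation: P^t = D^{-1/2} T^t D^{1/2}.
P_t = T_t_spectral / d_sqrt[:, None] * d_sqrt[None, :]
```

Working with the symmetric T rather than P keeps the eigenvectors orthonormal, which is what makes the spectral sum valid.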
Diffusion wavelets • A diffusion wavelet tree consists of orthogonal diffusion scaling functions $\Phi_j$ and orthogonal wavelets $\Psi_j$. • The scaling functions $\Phi_j$ span a subspace $V_j$ with the property $V_{j+1} \subseteq V_j$, and the span of the wavelets $\Psi_j$, $W_j$, is the orthogonal complement of $V_{j+1}$ in $V_j$.
Diffusion wavelets • The detail subspaces • Downsampling, orthogonalization, and operator compression • Notation: A: diffusion operator; G: Gram-Schmidt orthonormalization; F: diffusion maps; X: the data set; M: A G
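One level of the orthogonalization and compression step can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: a greedy, pivoted modified Gram-Schmidt with threshold `eps` orthonormalizes the columns of a symmetric diffusion operator T, and the square of T is then re-expressed on the resulting basis. The Gaussian-kernel graph on 30 points, the threshold, and the function name are all assumptions.

```python
import numpy as np

def pivoted_gram_schmidt(A, eps):
    """Greedy modified Gram-Schmidt on the columns of A.

    Repeatedly picks the residual column of largest norm, normalizes it,
    and removes its component from all columns; stops once every residual
    column has norm below eps.
    """
    residual = A.astype(float).copy()
    basis = []
    for _ in range(A.shape[1]):
        norms = np.linalg.norm(residual, axis=0)
        j = int(np.argmax(norms))
        if norms[j] < eps:
            break
        q = residual[:, j] / norms[j]
        basis.append(q)
        residual -= np.outer(q, q @ residual)    # deflate all columns
    return np.column_stack(basis)

# Hypothetical diffusion operator: normalized Gaussian kernel on 30 points of a line.
n, sigma, eps = 30, 3.0, 1e-3
x = np.arange(n, dtype=float)
W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma ** 2))
d = W.sum(axis=1)
T = W / np.sqrt(d[:, None] * d[None, :])         # symmetric, D^{-1/2} W D^{-1/2}

Phi1 = pivoted_gram_schmidt(T, eps)              # orthonormal scaling functions
T1 = Phi1.T @ (T @ T) @ Phi1                     # T^2 compressed onto span(Phi1)

# Largest column error when T is projected onto span(Phi1).
approx_error = np.max(np.linalg.norm(T - Phi1 @ (Phi1.T @ T), axis=0))
```

Repeating the same step on `T1` would give the next, coarser level; the number of columns kept at each level is governed by `eps` and by how quickly the spectrum of the operator decays.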
Diffusion wavelets
Experimental results
Conclusions • Two novel value function approximation methods are explored • The underlying representation and policies are learned simultaneously • Diffusion wavelets are a powerful tool for signal processing of functions on manifolds and graphs