Kernel adaptive filtering. Lecture slides for EEL 6502, Spring 2011. Sohan Seth.

The big picture: Adaptive filters are linear. How do we learn (continuous) nonlinear structures?

A particular approach: Assume a parametric model. Universality: the parametric model should be able to approximate any continuous function. E.g. a neural network: nonlinearly map the signal to a higher-dimensional space and then apply a linear filter; universal approximation holds for a sufficiently large network.

Its difficulty: The performance surface becomes nonlinear. Can we learn nonlinear structure using knowledge of linear adaptive filtering?

A different approach: Fix the nonlinear mapping and use linear filtering on the mapped signal, so the filter order is the dimension of the mapped space. How do we choose the mapping? We need to guarantee universal approximation!

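As a sketch of this "fixed nonlinear mapping + linear filter" idea (the quadratic feature map, the toy LMS loop, and all names below are illustrative assumptions, not the lecture's code):

```python
import numpy as np

def feature_map(u):
    """Hypothetical fixed nonlinear mapping: a second-order polynomial
    expansion of the input vector u (constant, linear, and quadratic terms)."""
    u = np.asarray(u, dtype=float)
    quad = np.outer(u, u)[np.triu_indices(len(u))]
    return np.concatenate(([1.0], u, quad))

def lms_on_mapped_signal(inputs, desired, eta=0.1):
    """Ordinary LMS run on the mapped inputs: the nonlinearity is fixed,
    only the linear filter in the mapped space is adapted."""
    w = np.zeros_like(feature_map(inputs[0]))
    for u, d in zip(inputs, desired):
        phi = feature_map(u)          # fixed nonlinear mapping
        e = d - w @ phi               # prediction error
        w = w + eta * e * phi         # linear LMS update in the mapped space
    return w
```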

A ‘trick’y solution (top-down design): The optimal filter exists in the span of the mapped input data, so the output is a projection onto that span. Even when the mapping is infinite dimensional, only the inner products between mapped inputs matter, not the mapping itself.

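One compact way to write down the trick, in standard kernel notation (mine, not copied from the slide): the optimal weight vector lies in the span of the mapped inputs, so the output only ever needs inner products, i.e. kernel evaluations.

```latex
\omega = \sum_{j=1}^{N} \alpha_j\, \varphi(u_j)
\quad\Longrightarrow\quad
y(u) = \langle \omega, \varphi(u) \rangle
     = \sum_{j=1}^{N} \alpha_j\, \langle \varphi(u_j), \varphi(u) \rangle
     = \sum_{j=1}^{N} \alpha_j\, \kappa(u_j, u)
```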

Inner products and pd kernels are equivalent. An inner product space is a linear space with an inner product satisfying 1. symmetry, 2. linearity, and 3. positive definiteness. Conversely, every positive definite (pd) kernel is an inner product in some space, so we can use a pd kernel to implicitly construct the nonlinear mapping.

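For reference, the two defining properties of a positive definite kernel, together with the standard Gaussian example (the notation is mine):

```latex
\text{Symmetry: } \kappa(u, u') = \kappa(u', u)
\qquad
\text{Positive definiteness: } \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j\, \kappa(u_i, u_j) \ge 0
\ \text{for all finite } \{u_i\},\ \{a_i\} \subset \mathbb{R}
\qquad
\text{Example (Gaussian): } \kappa(u, u') = \exp\!\left(-\frac{\lVert u - u'\rVert^2}{2\sigma^2}\right)
```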

How do things work? Bottom-up design: take a positive definite kernel and its Mercer decomposition, a generalization of the eigenvalue decomposition to functional spaces (the number of eigenfunctions can be infinite). The induced feature map makes the nonlinearity implicit in the choice of kernel; the weights in that feature space are the parameters to learn.

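An empirical illustration of the Mercer view (a sketch, assuming a Gaussian kernel): on a finite sample, the decomposition reduces to the eigendecomposition of the Gram matrix.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (positive definite) kernel matrix between rows of X and rows of Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
K = gaussian_kernel(X, X)

# Finite-sample analogue of the Mercer decomposition: K = sum_i lambda_i v_i v_i^T.
lam, V = np.linalg.eigh(K)
assert lam.min() > -1e-8                 # pd kernel: no (numerically) negative eigenvalues
K_rebuilt = (V * lam) @ V.T              # rebuild the kernel matrix from its eigenpairs
print(np.allclose(K, K_rebuilt))         # True: the nonlinearity lives entirely in the kernel
```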

Functional view: We do not explicitly evaluate the mapping; it is implicitly applied through the kernel function, and universality is guaranteed through the kernel. But the price is that, in the feature space, we need to remember all the input data and the coefficients.

Ridge regression: How do we find the weights? Problem: regularized least squares. A closed-form solution exists, but how do we invert an infinite-dimensional matrix?

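The usual way around the infinite-dimensional inverse is to solve in the dual, where only the N-by-N Gram matrix of the inputs appears. A minimal kernel ridge regression sketch (the Gaussian kernel and all names are my assumptions):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between rows of X and rows of Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def krr_fit(X, d, sigma=1.0, lam=1e-2):
    """Dual solution of regularized least squares: alpha = (K + lam*I)^{-1} d.
    The regularizer lam > 0 keeps the N x N inverse well-posed."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), d)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """f(u) = sum_j alpha_j * kappa(u_j, u): only kernel evaluations are needed."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```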

Online learning: Apply the LMS update rule in the feature space, with the initial weight vector set to 0. The weights live in the (possibly infinite-dimensional) feature space, so how do we compute these updates?

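Spelling out the standard KLMS derivation behind this question (notation is mine): with a zero initial weight vector, the feature-space weights are a weighted sum of mapped inputs, so the output can be computed from kernel evaluations alone.

```latex
\omega(i) = \omega(i-1) + \eta\, e(i)\, \varphi\big(u(i)\big), \qquad \omega(0) = 0
\quad\Longrightarrow\quad
\omega(i) = \eta \sum_{j=1}^{i} e(j)\, \varphi\big(u(j)\big)
\quad\Longrightarrow\quad
y(i) = \big\langle \omega(i-1), \varphi\big(u(i)\big) \big\rangle
     = \eta \sum_{j=1}^{i-1} e(j)\, \kappa\big(u(j), u(i)\big)
```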

Kernel-LMS: Initialize, then iterate over the samples. 1. Need to choose a kernel. 2. Need to select a step size (the stability bound depends on the largest eigenvalue of the autocorrelation matrix, which is unknown). 3. Need to store all past inputs and coefficients. 4. No explicit regularization. 5. The time complexity of iteration i is O(i).

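A minimal KLMS sketch following points 1-5 above (the Gaussian kernel, the default step size, and all function names are illustrative assumptions, not the lecture's code):

```python
import numpy as np

def gaussian(u, v, sigma=1.0):
    """Gaussian kernel between two input vectors."""
    return np.exp(-np.sum((np.asarray(u, float) - np.asarray(v, float))**2) / (2 * sigma**2))

def klms(inputs, desired, eta=0.2, sigma=1.0):
    """Kernel LMS: every input becomes a center with coefficient eta * e(i).
    No explicit regularization; iteration i costs O(i) kernel evaluations."""
    centers, coeffs = [], []
    for u, d in zip(inputs, desired):
        y = sum(a * gaussian(c, u, sigma) for c, a in zip(centers, coeffs))  # predict
        e = d - y                                                            # error
        centers.append(u)          # store the input (point 3)
        coeffs.append(eta * e)     # store its coefficient
    return centers, coeffs

def klms_predict(centers, coeffs, u, sigma=1.0):
    """Evaluate the learned function f(u) = sum_j coeffs[j] * kappa(centers[j], u)."""
    return sum(a * gaussian(c, u, sigma) for c, a in zip(centers, coeffs))
```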

Functional approximation: The kernel should be universal, e.g. the Gaussian kernel. How do we choose the kernel parameter?

Implementation details. Choosing the best value of the kernel parameter: 1. Cross-validation: accurate but time consuming (comparing large vs. small kernel sizes on held-out data). 2. Thumb rules: fast but not accurate. Limiting the network size: 1. Importance estimation: close centers are redundant. A sketch of both ideas follows.

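A sketch of both ideas, reusing the klms and klms_predict helpers from the KLMS sketch above: grid-search cross-validation over candidate kernel widths, and a simple distance-based redundancy check for limiting the network size (the threshold delta and all names are hypothetical):

```python
import numpy as np

def choose_sigma_by_cv(train, val, sigmas=(0.1, 0.3, 1.0, 3.0)):
    """Cross-validation: pick the kernel width with the smallest validation MSE.
    Accurate but time consuming: one full KLMS run per candidate width."""
    X_tr, d_tr = train
    X_va, d_va = val
    mse = {}
    for s in sigmas:
        centers, coeffs = klms(X_tr, d_tr, sigma=s)
        preds = [klms_predict(centers, coeffs, u, sigma=s) for u in X_va]
        mse[s] = float(np.mean((np.asarray(d_va) - np.asarray(preds))**2))
    return min(mse, key=mse.get)

def is_novel(centers, u, delta=0.1):
    """Importance check for limiting network size: an input closer than delta
    to an existing center is redundant and need not be stored."""
    return all(np.linalg.norm(np.asarray(u, float) - np.asarray(c, float)) > delta
               for c in centers)
```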

Self-regularization: Over-fitting arises because there are as many parameters as samples to fit. How does KLMS deal with it, and how can the over-fitting be removed?

Ill-posedness appears due to small singular values of the autocorrelation matrix when taking its inverse. How do we remove it? Weight the inverses of the small singular values: Tikhonov regularization, i.e. solve a regularized least-squares problem.

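What Tikhonov regularization does to the singular values, in a generic least-squares sketch (not the lecture's specific example):

```python
import numpy as np

def solve_plain(A, b):
    """Least squares via the SVD: the inverse weights each component by 1/s,
    so small singular values s blow up the solution (ill-posedness)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((U.T @ b) / s)

def solve_tikhonov(A, b, lam=1e-2):
    """Tikhonov regularization: replace 1/s by s / (s^2 + lam), damping the
    contribution of small singular values instead of amplifying it."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((U.T @ b) * s / (s**2 + lam))
```

Large singular values (s^2 much larger than lam) are barely changed, while small ones are shrunk toward zero rather than inverted.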

Self-regularization and well-posedness: How does KLMS do it? The step size acts as a regularizer on the expected solution, although large singular values might also be suppressed. More information on the course website.