Kernel adaptive filtering
Lecture slides for EEL 6502, Spring 2011
Sohan Seth

The big picture

- Adaptive filters are linear. How do we learn (continuous) nonlinear structures?

A particular approach

- Assume a parametric model, e.g. a neural network: nonlinearly map the signal to a higher dimensional space and then apply a linear filter.
- Universality: the parametric model should be able to approximate any continuous function. A neural network is a universal approximator for a sufficiently large number of hidden units.

Its difficulty

- Nonlinear performance surface: the cost is no longer quadratic in the parameters, so gradient descent can get trapped in local minima.
- Can we learn nonlinear structure using our knowledge of linear adaptive filtering?

A different approach

- Fix the nonlinear mapping in advance, and use linear filtering in the mapped space; the filter order is the dimension of that space.
- How do we choose the mapping? We need to guarantee universal approximation!

A ‘trick’y solution (top-down design)

- The optimal filter exists in the span of the mapped input data, so the output is a projection: $y = \sum_i \alpha_i \langle \phi(x_i), \phi(x) \rangle$.
- The mapping may be infinite dimensional, but only the inner product matters, not the mapping itself.

Inner product and pd kernel are equivalent

- Inner product space: a linear space with an inner product $\langle \cdot, \cdot \rangle$ satisfying
  1. Symmetry: $\langle x, y \rangle = \langle y, x \rangle$
  2. Linearity: $\langle ax + by, z \rangle = a \langle x, z \rangle + b \langle y, z \rangle$
  3. Positive definiteness: $\langle x, x \rangle \ge 0$, with equality iff $x = 0$
- A positive definite (pd) kernel $\kappa(x, y)$, e.g. the Gaussian kernel $\exp(-\|x - y\|^2 / 2\sigma^2)$ or a polynomial kernel, is an inner product in some space: $\kappa(x, y) = \langle \phi(x), \phi(y) \rangle$.
- Use a pd kernel to implicitly construct the nonlinear mapping.
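
To make the equivalence concrete, here is a minimal sketch (not from the slides) using the degree-2 homogeneous polynomial kernel, whose explicit feature map is known in closed form; the names poly2_kernel and poly2_map are illustrative.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def poly2_map(x):
    """Explicit feature map for this kernel on 2-D inputs:
    phi(x) = [x1^2, sqrt(2) x1 x2, x2^2]."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# The inner product in feature space equals the kernel evaluation.
print(poly2_kernel(x, y))                  # 1.0
print(np.dot(poly2_map(x), poly2_map(y)))  # 1.0
```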

How do things work? (bottom-up design)

- Take a positive definite kernel $\kappa$ and its Mercer decomposition $\kappa(x, y) = \sum_k \lambda_k \psi_k(x) \psi_k(y)$, a generalization of the eigenvalue decomposition to functional spaces.
- Then, considering $\phi(x) = [\sqrt{\lambda_1}\,\psi_1(x), \sqrt{\lambda_2}\,\psi_2(x), \ldots]$, we get $\kappa(x, y) = \langle \phi(x), \phi(y) \rangle$. The number of terms can be infinite.
- Nonlinearity is implicit in the choice of kernel, but there are potentially infinitely many parameters to learn.
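
The sample analogue of the Mercer decomposition is the eigendecomposition of the Gram matrix. A small sketch, assuming a 1-D Gaussian kernel and NumPy; all names are illustrative:

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)               # sample inputs
G = gauss_kernel(x[:, None], x[None, :])      # Gram matrix G_ij = k(x_i, x_j)

# Eigendecomposition: the sample analogue of the Mercer decomposition.
lam, Q = np.linalg.eigh(G)
print(lam.min() >= -1e-10)  # True: a pd kernel gives nonnegative eigenvalues

# Reconstruct the kernel from its eigen-expansion: G = sum_k lam_k q_k q_k^T
G_rec = (Q * lam) @ Q.T
print(np.allclose(G, G_rec))  # True
```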

Functional view

- We do not explicitly evaluate the mapping $\phi$; it is implicitly applied through the kernel function.
- The filter is a function in the feature space: $f(x) = \sum_i \alpha_i \kappa(x_i, x)$.
- Universality is guaranteed through the kernel. But we need to remember all the input data and the coefficients.

Ridge regression

- Problem: how to find the coefficients? Minimize $\sum_i (d_i - f(x_i))^2 + \lambda \|f\|^2$ (the second term is the regularization).
- In feature space the solution involves inverting the autocorrelation matrix. But how do we invert an infinite dimensional matrix?
- Equivalent solution via the kernel: $\alpha = (G + \lambda I)^{-1} d$, where $G_{ij} = \kappa(x_i, x_j)$ is the $N \times N$ Gram matrix.
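
A minimal kernel ridge regression sketch under the assumptions above (Gaussian kernel, solution $\alpha = (G + \lambda I)^{-1} d$); the hyperparameter values and the toy target are arbitrary choices:

```python
import numpy as np

def gauss_kernel(X, Y, sigma=0.5):
    return np.exp(-(X[:, None] - Y[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 50)
d = np.sin(x) + 0.1 * rng.standard_normal(50)  # noisy desired signal

lam = 0.1                                      # regularization parameter
G = gauss_kernel(x, x)
alpha = np.linalg.solve(G + lam * np.eye(len(x)), d)  # (G + lam I)^-1 d

# Predict at new points: f(x*) = sum_i alpha_i k(x_i, x*)
x_test = np.linspace(-3, 3, 5)
print(gauss_kernel(x_test, x) @ alpha)  # close to sin(x_test)
```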

Online learning

- Apply the LMS update rule in feature space: set $w_0 = 0$, then $e_i = d_i - \langle w_{i-1}, \phi(x_i) \rangle$ and $w_i = w_{i-1} + \eta e_i \phi(x_i)$.
- The weight vector lives in the (possibly infinite dimensional) feature space. How do we compute these quantities?

Kernel-LMS

- Initialize $f_0 = 0$; iterate for $i = 1, 2, \ldots$: $e_i = d_i - f_{i-1}(x_i)$, $f_i = f_{i-1} + \eta e_i \kappa(x_i, \cdot)$.

Remarks:
1. Need to choose a kernel.
2. Need to select the step size; the unknown quantity governing stability is the largest eigenvalue of the autocorrelation matrix in feature space.
3. Need to store all inputs $\{x_i\}$ and coefficients $\{\eta e_i\}$.
4. No explicit regularization.
5. $O(i)$ time complexity for each iteration.
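
A minimal KLMS sketch following the iteration above, assuming a 1-D Gaussian kernel; the kernel size and step size are arbitrary choices, not prescriptions from the slides:

```python
import numpy as np

def klms(x, d, eta=0.2, sigma=1.0):
    """Kernel LMS on scalar input/desired sequences.
    Stores every input as a center; its coefficient is eta * error."""
    centers, coeffs, errors = [], [], []
    for xi, di in zip(x, d):
        # Evaluate f_{i-1}(x_i) = sum_j a_j k(c_j, x_i)  -- O(i) per step
        if centers:
            k = np.exp(-(np.array(centers) - xi) ** 2 / (2 * sigma ** 2))
            y = np.dot(coeffs, k)
        else:
            y = 0.0
        e = di - y                 # prediction error
        centers.append(xi)         # grow the network by one center
        coeffs.append(eta * e)     # with coefficient eta * e
        errors.append(e)
    return np.array(centers), np.array(coeffs), np.array(errors)

# Toy nonlinear system identification: d = sin(x) + noise
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 500)
d = np.sin(x) + 0.05 * rng.standard_normal(500)
_, _, e = klms(x, d)
print(np.mean(e[:50] ** 2), np.mean(e[-50:] ** 2))  # error decreases
```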

Functional approximation

- After $N$ iterations, $f_N(x) = \eta \sum_{i=1}^{N} e_i \kappa(x_i, x)$.
- The kernel should be universal, e.g. the Gaussian kernel $\kappa(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$.
- How to choose the kernel size $\sigma$?

Implementation details

Choosing the best value of $\sigma$:
1. Cross-validation: accurate but time consuming. Too large a $\sigma$ over-smooths; too small a $\sigma$ over-fits.
2. Thumb rules: fast but not accurate.

Limiting the network size (see the sketch below):
1. Importance estimation: close centers are redundant, so discard inputs that lie near an existing center.
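
A sketch of one possible distance-only pruning rule in the spirit of importance estimation (full novelty criteria typically also check the error magnitude); the threshold delta is an illustrative assumption:

```python
import numpy as np

def klms_pruned(x, d, eta=0.2, sigma=1.0, delta=0.2):
    """KLMS variant that only adds a center if it is farther than
    delta from every stored center (close centers are redundant)."""
    centers, coeffs = [], []
    for xi, di in zip(x, d):
        if centers:
            dist = np.abs(np.array(centers) - xi)
            if dist.min() > delta:              # novel enough: grow network
                k = np.exp(-dist ** 2 / (2 * sigma ** 2))
                e = di - np.dot(coeffs, k)      # prediction error at x_i
                centers.append(xi)
                coeffs.append(eta * e)
        else:
            centers.append(xi)
            coeffs.append(eta * di)
    return centers, coeffs

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 500)
c, a = klms_pruned(x, np.sin(x))
print(len(c))  # far fewer centers than the 500 samples
```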

Self-regularization: over-fitting

- With one coefficient per sample, there are $N$ parameters to fit $N$ samples: a recipe for over-fitting.
- How to remove it? And how does KLMS avoid it without an explicit regularizer?

Ill-posedness

- Ill-posedness appears due to the small singular values of the autocorrelation matrix when taking its inverse: the factors $1/s$ blow up for small $s$.
- How to remove it? Weight the inverse of the small singular values, e.g. Tikhonov regularization: solve $\min_f \sum_i (d_i - f(x_i))^2 + \lambda \|f\|^2$, which replaces each $1/s$ by $s/(s^2 + \lambda)$.
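
A small numerical sketch of the effect: the plain SVD inverse uses $1/s$ and blows up on a nearly singular problem, while the Tikhonov-weighted inverse $s/(s^2+\lambda)$ stays bounded. The problem instance is synthetic:

```python
import numpy as np

# Ill-conditioned least squares: one column is nearly a copy of another,
# so the smallest singular value is tiny.
rng = np.random.default_rng(4)
A = rng.standard_normal((50, 8))
A[:, -1] = A[:, 0] + 1e-6 * rng.standard_normal(50)
w_true = rng.standard_normal(8)
d = A @ w_true + 0.01 * rng.standard_normal(50)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
lam = 1e-2

w_plain = Vt.T @ ((U.T @ d) / s)                   # 1/s      -> unstable
w_tikh  = Vt.T @ ((U.T @ d) * s / (s ** 2 + lam))  # s/(s^2+lam) -> damped

print(np.linalg.norm(w_plain))  # huge norm: small s amplified
print(np.linalg.norm(w_tikh))   # moderate norm: small s suppressed
```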

Self-regularization: well-posedness

- How does KLMS do it? The step size acts as a regularizer on the expected solution: a small step size keeps the solution norm bounded, playing a role similar to Tikhonov regularization.
- However, large singular values might also be suppressed.
- More information on the course website!