Language of Motion: Hybrid Systems Modeling René Vidal Center for Imaging Science Johns Hopkins University

Recognition of individual and crowd motions • Rigid backgrounds • Individual motions • Group motions • Input video • Dynamic backgrounds • Crowd motions • NSF CAREER 2005-2010: Recognition of Dynamic Activities in Unstructured Environments • NSF CDI 2010-2012: A Bio-Inspired Approach to Recognition of Human Movements and Movement Styles

Modeling videos with hybrid systems • Model output with mixture of dynamical models exhibiting changes in – Space: multiple motions in a video – Time: appearing and disappearing motions in a video (figure labels: SARX 1, SARX 2, ..., SARX n_t) • Solve a very complex hybrid system identification problem • NSF EHS 2005-2008: An Algebraic Geometric Approach to Hybrid System Identification

Overall goals of hybrid system modeling • Bottom-up Modeling – The models should compactly capture the underlying structure of the raw motion signal. This will be done by developing methods for hybrid dynamical system (HDS) identification. • Top-down Inference – The models should capture variations in the motion signal between two instances of the same surgeme, performed by either the same or a different surgeon. Variations may be purely stochastic, due to surgical context, or caused by the surgeon's skill level. This will be done using HMMs and ideas from automatic speech recognition. • Joint Top-down and Bottom-up Modeling and Inference – Identification of structure in the motion signal via an HDS need not be purely data-driven. We will investigate injection of top-down information into HDS identification for surgeme recognition, such as prior distributions on the identified HDS parameters and temporal dependencies in the surgeme sequence.

Specific goals of hybrid system modeling • Data: – Motion data: surgical, hand, whole body – Video: surgical, whole body • Model learning: from data to models – Dynamical models (Vidal) • Sparse representation techniques for hybrid system identification – Language models (Khudanpur) • Hidden Markov Models of observed HDS parameters • Language models of surgeme sequences – “Dynamical language” models (Khudanpur & Vidal) • Prior models for supervised hybrid system identification

Specific goals of hybrid system modeling • Model comparison – Distances between dynamical models: Binet-Cauchy kernels – Distances between discrete trajectories of an HMM – Metrics on hybrid systems (Petreczky-Vidal HSCC ’07) • Model classification – Dynamic Boost (Vidal-Favaro ICCV ’07) • Extending boosting to dynamical systems – Bag of dynamical systems (Ravichandran et al. CVPR ’09) • Using dictionaries of motion primitives to make recognition invariant to changes in – Viewpoint – Scale – Illumination

Outline of today’s talk • What are hybrid dynamical systems (HDS)? • How can hybrid systems be used for video – Synthesis – Registration – Classification – Segmentation • What’s next? – Sparse representation techniques for hybrid system identification – Distances on hybrid systems for time-series classification – Time series classification with invariance – Co-registration of motion and video data

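Dynamical systems • Outputs y_1, y_2, y_3, ..., y_t • Discrete or continuous

Dynamical systems • States x_1, x_2, x_3, ..., x_t • Outputs y_1, y_2, y_3, ..., y_t • Hidden Markov Models: discrete state, discrete or continuous output • Linear Dynamical Systems: continuous state, continuous output

Dynamical systems • Discrete states q_1, q_2, q_3, ..., q_t • Continuous states x_1, x_2, x_3, ..., x_t • Outputs y_1, y_2, y_3, ..., y_t • Hybrid Systems: switched, jump Markov

Dynamical systems • Continuous states x_1, x_2, x_3, ..., x_t • Outputs y_1, y_2, y_3, ..., y_t • Discrete states q_1, q_2, q_3, ..., q_t • Additional variables o_1, o_2, o_3, ..., o_t • Hybrid Systems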

Identification of linear systems • Model is an LDS driven by IID white Gaussian noise (equation labels: dynamics, images, appearance) • Bilinear problem, can do EM • Optimal solution: subspace identification (Van Overschee & De Moor ’94) • PCA-based solution in the absence of noise (Soatto et al. ’01) – Can compute C and z(t) from the SVD of the images – Given z(t), solving for A is a linear problem
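
The PCA-based solution above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation. It assumes the usual state-space form z(t+1) = A z(t) + v(t), y(t) = C z(t) + w(t), with one vectorized frame y(t) per column of a matrix Y, no mean subtraction, and negligible noise. The function name `identify_lds` and the argument `n` (state dimension) are hypothetical.

```python
import numpy as np

def identify_lds(Y, n):
    """PCA-style identification sketch.

    Y: (p, T) array, one vectorized frame per column (assumed layout).
    n: assumed state dimension.
    Returns A (n, n), C (p, n) and the state sequence Z (n, T).
    """
    # Rank-n factorization Y ~ C Z from the SVD of the frames.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                      # appearance: orthonormal columns
    Z = np.diag(s[:n]) @ Vt[:n, :]    # states z(1), ..., z(T)
    # Given the states, A solves the linear least-squares problem
    # Z[:, 1:] ~ A Z[:, :-1].
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    return A, C, Z
```

The recovered A and C are only defined up to a change of basis of the state, so comparisons with a ground-truth model should use basis-invariant quantities such as the trace or determinant of A.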

Using linear systems to model time series • Dynamic textures: Soatto et al., ICCV ’01 • Human gaits: Bissacco et al., CVPR ’01 • Extract a set of features from the video sequence – Spatial filters – ICA/PCA – Wavelets – Intensities of all pixels • Model spatiotemporal evolution of features as the output of a linear dynamical system (LDS): Soatto et al. ’01 (equation labels: dynamics, images, appearance)

Using linear systems for video synthesis • Once a model of a dynamic texture has been learned, one can use it to synthesize novel sequences: – Schödl et al. ’00, Soatto et al. ’01, Doretto et al. ’03, Yuan et al. ’04
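
Synthesis from a learned model amounts to simulating the system forward. The sketch below assumes a model z(t+1) = A z(t) + v(t), y(t) = C z(t) with isotropic Gaussian state noise of standard deviation `noise_std`. That noise model is a simplifying assumption here; in practice the noise covariance would be estimated from the training residuals. The name `synthesize` is hypothetical.

```python
import numpy as np

def synthesize(A, C, z0, num_frames, noise_std=0.0, seed=0):
    """Simulate an LDS forward and return frames as columns of a (p, num_frames) array."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = np.asarray(z0, dtype=float)
    frames = []
    for _ in range(num_frames):
        frames.append(C @ z)                               # render current frame
        z = A @ z + noise_std * rng.standard_normal(n)     # advance the state
    return np.stack(frames, axis=1)
```

With `noise_std=0.0` the output is the deterministic response from `z0`; a positive value drives the model with noise so that the synthesized sequence does not simply decay.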

Using linear systems for video mosaicing • Given a non-rigid dynamical scene captured through multiple static cameras, we want to register the two sequences spatially and temporally • Challenges – We are dealing with non-rigid dynamical scenes, where feature tracking and matching is very difficult. – We are dealing with both spatial and temporal misalignments. • Goal – We would like to develop a spatial alignment technique that is invariant to the temporal alignment by reducing video registration to an image registration problem. A. Ravichandran and R. Vidal, ICCV Workshop on Dynamical Vision, 2007; A. Ravichandran and R. Vidal, European Conference on Computer Vision, 2008

Overview of our approach • For each of the two sequences: system identification, conversion to canonical form, extract SIFT features • Matching of the features extracted from the two sequences

Results: format • Register • RGB decomposition

Results: non-rigid scenes

Results: more sequences

Classifying/recognizing novel sequences • Given videos of several classes of dynamic textures, one can use their models to classify new sequences (Saisan et al. ’01) – Identify dynamical models for all sequences in the training set – Identify a dynamical model for each novel sequence – Assign each novel sequence to the class of its nearest neighbor • Requires one to compute a distance between dynamical models – Martin distance (Martin ’00) – Subspace angles (De Cock ’02, ’05) – Kullback-Leibler divergence (Chan & Vasconcelos ’07) – Binet-Cauchy kernels (Vishwanathan-Smola-Vidal ’07) S.V.N. Vishwanathan, A. Smola, and R. Vidal. Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes. International Journal of Computer Vision, 2007
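
The nearest-neighbor scheme above can be sketched with a subspace-angle distance. This is a rough sketch under stated assumptions. Each model is a pair (A, C), the subspace angles are computed between finite-horizon observability matrices (a truncation of the infinite-horizon definition), and the Martin distance is taken in the form d^2 = -ln prod_i cos^2(theta_i) that is commonly used for dynamic textures. The function names and the `horizon` parameter are hypothetical.

```python
import numpy as np

def observability(A, C, horizon):
    """Stack C, CA, CA^2, ... for `horizon` block rows."""
    blocks, M = [], C.copy()
    for _ in range(horizon):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance_sq(model1, model2, horizon=20):
    """Finite-horizon approximation of the squared Martin distance."""
    Q1, _ = np.linalg.qr(observability(*model1, horizon))
    Q2, _ = np.linalg.qr(observability(*model2, horizon))
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)   # guard against log(0) and rounding
    return float(-2.0 * np.sum(np.log(cosines)))

def nearest_neighbor_label(query, train_models, train_labels, horizon=20):
    """Assign the label of the closest training model."""
    d = [martin_distance_sq(query, m, horizon) for m in train_models]
    return train_labels[int(np.argmin(d))]
```

Because the distance depends only on the column spaces of the observability matrices, it does not change when a model is expressed in a different state basis, which is what makes it usable for comparing independently identified models.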

Binet-Cauchy kernels for AR models • Consider two stable AR models • Define an embedding • Binet-Cauchy kernel • Trace kernel for AR models where M satisfies the equation • Determinant kernel for AR models where M satisfies the equation
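
One way a trace-type kernel can be evaluated in closed form is sketched below. The assumptions are mine, not the slide's. The models are noise-free, x(t+1) = A x(t), y(t) = C x(t), and the kernel is the discounted sum k = sum_t exp(-lambda t) y1(t)^T W y2(t) with initial states x1, x2. Under that definition k = x1^T M x2, where M satisfies the Sylvester-type equation M = exp(-lambda) A1^T M A2 + C1^T W C2. The function names and the default `lam` are hypothetical.

```python
import numpy as np

def trace_kernel_matrix(A1, C1, A2, C2, lam=0.1, W=None):
    """Solve M = exp(-lam) * A1^T M A2 + C1^T W C2 by vectorization."""
    p = C1.shape[0]
    W = np.eye(p) if W is None else W
    gamma = np.exp(-lam)
    Q = C1.T @ W @ C2                      # shape (n1, n2)
    n1, n2 = Q.shape
    # vec(A1^T M A2) = (A2^T kron A1^T) vec(M) with column-major vec.
    K = np.eye(n1 * n2) - gamma * np.kron(A2.T, A1.T)
    m = np.linalg.solve(K, Q.flatten(order="F"))
    return m.reshape((n1, n2), order="F")

def trace_kernel(x1, x2, M):
    """Kernel value for initial states x1 and x2."""
    return float(x1 @ M @ x2)
```

The linear solve is well posed when exp(-lam) times the product of the spectral radii of A1 and A2 is below one, which holds for any pair of stable models and lam >= 0.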

Results: clustering video clips • Kill Bill: Vol 1 (2003) • http://www.imdb.com/title/tt0266697/ • Randomly sample – 480 clips from the movie – 120 frames each • Fit a linear dynamical model to each clip • Use trace kernel to compute the k-nearest neighbors of each clip • Use Locally Linear Embedding (LLE) for clustering the clips and embedding them in 2D space

Results: clustering video clips • Two people talking • Sword fight • Person rolling in the snow

Results: dynamic texture recognition • UCLA Database: 200 sequences (75 frames, 160 x 110 pixels), 50 classes, dynamics extracted from a 48 x 48 window

Results: human gait recognition • Weizmann Database: 10 activities R. Chaudhry, A. Ravichandran, G. Hager and R. Vidal. Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions. CVPR 2009.

Identification of hybrid systems • Given input/output data, identify – Number of discrete states – Model parameters of linear systems – Hybrid state (continuous & discrete) – Switching parameters (partition of state space) • Piecewise ARX systems – Clustering approach: k-means clustering + regression + classification + iterative refinement: (Ferrari-Trecate et al. ’03) – Bayesian approach: ML via EM algorithm (Juloski et al. ’05) – Mixed integer quadratic programming: (Bemporad et al. ’01) – Greedy/iterative approach: (Bemporad et al. ’03) • Switched ARX systems – Batch algebraic approach: (Vidal et al. ’03 ’04, Ma-Vidal ’05, Bako-Vidal ’07, Lauer et al. ’09) – Recursive algebraic approach: (Vidal et al. ’04 ’05 ’07) – Support vector regression approach: (Lauer et al. ’09) • NSF 2006: An Algebraic Geometric Approach to Hybrid System Identification
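
To make the batch algebraic idea concrete, here is a toy sketch for the simplest case: two modes, first-order models y(t+1) = a_q y(t) + c_q u(t), and noise-free data in which both modes occur. Every regressor x = [y(t), u(t), y(t+1)] lies on one of two planes b_q^T x = 0 with b_q = [a_q, c_q, -1], so it satisfies the quadratic (b_1^T x)(b_2^T x) = 0. The coefficients of that quadratic come from a null vector of the degree-2 embedded data, and its gradient at each sample is proportional to the parameter vector of the active mode. This illustrates the principle only, it is not the published algorithms, and all function names are hypothetical.

```python
import numpy as np

def simulate_sarx(modes, q, u):
    """Noise-free switched ARX data: y[t+1] = a*y[t] + c*u[t] with (a, c) = modes[q[t]]."""
    y = np.zeros(len(u))
    for t in range(len(u) - 1):
        a, c = modes[q[t]]
        y[t + 1] = a * y[t] + c * u[t]
    return y

def veronese2(X):
    """Degree-2 monomials of the 3-D regressors."""
    x1, x2, x3 = X.T
    return np.column_stack([x1 * x1, x1 * x2, x1 * x3, x2 * x2, x2 * x3, x3 * x3])

def gradient2(X, h):
    """Gradient of p(x) = h . veronese2(x) at every row of X."""
    x1, x2, x3 = X.T
    g1 = 2 * h[0] * x1 + h[1] * x2 + h[2] * x3
    g2 = h[1] * x1 + 2 * h[3] * x2 + h[4] * x3
    g3 = h[2] * x1 + h[4] * x2 + 2 * h[5] * x3
    return np.column_stack([g1, g2, g3])

def identify_two_mode_sarx(y, u):
    """Return (params, labels); label 0 is the mode active at the first sample."""
    X = np.column_stack([y[:-1], u[:-1], y[1:]])
    # Coefficients of the vanishing quadratic: null vector of the embedded data.
    _, _, Vt = np.linalg.svd(veronese2(X), full_matrices=False)
    h = Vt[-1]
    G = gradient2(X, h)
    theta = -G[:, :2] / G[:, 2:3]            # per-sample estimate of (a, c)
    # Two-way split by distance to the first sample's estimate.
    dist = np.linalg.norm(theta - theta[0], axis=1)
    labels = (dist > 0.5 * dist.max()).astype(int)
    params = np.array([np.median(theta[labels == k], axis=0) for k in (0, 1)])
    return params, labels
```

With noisy data the gradient estimates scatter, so one would need a more careful choice of representative points and a refinement step. The simple thresholding used here is only adequate for the noise-free illustration.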

Hybrid systems for temporal segmentation R. Vidal, Recursive Identification of Switched ARX Systems. Automatica, 2008

Hybrid systems for temporal segmentation • Empty living room • Middle-aged man enters • Woman enters • Young man enters, introduces the woman and leaves • Middle-aged man flirts with woman and steals her tiara • Middle-aged man checks the time, rises and leaves • Woman walks him to the door • Woman returns to her seat • Woman misses her tiara • Woman searches for her tiara • Woman sits, dismayed

Using hybrid systems for spatial segmentation • Fixed boundary segmentation results: Ocean-smoke, Ocean-dynamics, Ocean-appearance • Moving boundary segmentation results: Raccoon, Ocean-fire A. Ghoreyshi and R. Vidal, Segmenting Dynamic Textures with Ising Descriptors, ARX Models and Level Sets, ECCV Workshop on Dynamical Vision, 2006

Specific goals of hybrid system modeling • Sparse representation techniques for hybrid system identification (Vidal) • Extending boosting to dynamical systems? – Dynamic Boost (Vidal-Favaro ICCV ’07) • Recognizing videos with multiple dynamic textures – Metrics on hybrid systems (Petreczky-Vidal HSCC ’07) • Bag of dynamical systems: making recognition invariant to changes in – Viewpoint – Scale – Illumination

Sparse hybrid system identification

Bag-of-Words: Sample Topic (Economy)

Bag of dynamical systems • Language of motion primitives – Each motion primitive is represented with a dynamical system – Motion words are obtained by clustering dynamical systems Ravichandran and Vidal, IEEE Conference on Computer Vision and Pattern Recognition, 2009
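
The two steps above (cluster dynamical systems into motion words, then describe a video by its word counts) can be sketched from a matrix of pairwise distances between systems. The sketch uses a plain k-medoids loop with farthest-point initialization as a stand-in clustering method. That choice, the function names, and the assumption that distinct systems have strictly positive distance are mine and are not taken from the cited paper.

```python
import numpy as np

def k_medoids(D, k, num_iters=50):
    """Pick k codeword indices from an (N, N) distance matrix D."""
    medoids = [0]
    while len(medoids) < k:                       # farthest-point initialization
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(num_iters):
        assign = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(assign == j)
            if members.size == 0:                 # keep the old medoid if a cluster empties
                continue
            costs = D[np.ix_(members, members)].sum(axis=1)
            new[j] = members[np.argmin(costs)]    # member closest to all other members
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

def bag_histogram(D_to_codewords):
    """Normalized word histogram for one video.

    D_to_codewords: (m, k) distances from the video's m local systems to the k codewords.
    """
    k = D_to_codewords.shape[1]
    words = np.argmin(D_to_codewords, axis=1)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()
```

Any distance between dynamical systems can supply D, for example one derived from subspace angles or from a kernel. The resulting histograms can then be fed to a standard classifier.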

Bag of dynamical systems • UCLA database: 200 sequences, 50 classes (8 view-invariant classes) • Recognition using bag of dynamical systems versus using Doretto et al.

Acknowledgements • 2009 Sloan Fellowship • ONR YIP N00014-09-1-0839 • ONR N00014-09-1-0084 • ONR N00014-05-1-0836 • NSF CAREER IIS-0447739 • NSF CNS-0809101 • NSF CNS-0509101 • ARL Robotics-CTA • JHU APL-934652 • NIH R01 HL082729 • WSE-APL • NIH-NHLBI • JHU – Rizwan Chaudhry – Atiyeh Ghoreyshi – Avinash Ravichandran • UIUC – Yi Ma • Heriot-Watt – Paolo Favaro • Yahoo – Alex Smola • Purdue – SVN Vishwanathan