Laboratory for Information and Decision Systems
Stochastic Systems Group
A “Sticky” HDP-HMM for Systems with State Persistence
Emily Fox, Erik Sudderth, Michael Jordan, and Alan Willsky
ICML 2008, Helsinki, Finland
Massachusetts Institute of Technology
Application: Speaker Diarization
[Figure: meeting audio segmented by speaker; speaker labels Bob, John, Jill, Jane]
Total # of people
High probability of self-transition
Multi-modal emissions
Application: Maneuvering Target Tracking
• HMM emissions = exogenous input driving dynamical system
• Unknown number of maneuver modes
[Figure: dynamical system]
HDP Prior on Infinite HMM
• Nonparametric Bayesian prior on HMMs with unknown state space cardinality
• Encourages use of sparse subset of infinite state space
• Allows new states to be created as more data are observed
• Inadequately captures temporal state persistence
[Figure panels: observations; true state sequence; HDP-HMM inferred state sequence]
Infinite HMM: Beal et al., NIPS 2002
HDP-HMM: Teh et al., JASA 2006
Outline
• Background: HDP-HMM
• “Sticky” HDP-HMM
• Capturing multimodal emissions
• Speaker diarization
Hidden Markov Models
[Figure: HMM graphical model with states and observations; plot of state vs. time]
HDP-HMM
• Dirichlet process (DP):
  § State space of unbounded size
  § Model complexity adapts to observations
• Hierarchical:
  § Ties state transition distributions
  § Shared sparsity
[Figure: state vs. time]
HDP-HMM
Stick-breaking construction for DP(γ, H)
• Average transition distribution:
[Figure: stick of unit probability mass]
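The equation for the stick-breaking construction is not preserved on this slide. As a minimal sketch, assuming the standard construction in which each step breaks a Beta(1, γ) fraction off the remaining stick, the average transition distribution can be drawn up to a finite truncation as follows. The function name, the truncation level, and the value γ = 2 are illustrative choices, not taken from the slides.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Draw the first `truncation` stick-breaking weights with concentration gamma.

    Each step breaks a Beta(1, gamma) fraction off whatever stick remains.
    The unassigned leftover mass is returned separately.
    """
    weights = []
    remaining = 1.0  # stick of unit probability mass
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, gamma)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)
beta, leftover = stick_breaking(gamma=2.0, truncation=20, rng=rng)
```

Smaller values of γ put most of the mass on the first few weights, which is the sparsity the later slides rely on.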
HDP-HMM
• Average transition distribution:
• State-specific transition distributions:
Sparsity of β is shared
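The two distributions named on this slide lost their equations in extraction. A hedged reconstruction, assuming the usual HDP-HMM notation (γ and α as concentration parameters, H as the base measure over emission parameters θ, F as the emission family, z_t as the state and y_t as the observation), is:

```latex
\begin{align*}
\beta \mid \gamma &\sim \mathrm{GEM}(\gamma) \\
\pi_j \mid \alpha, \beta &\sim \mathrm{DP}(\alpha, \beta), \quad j = 1, 2, \dots \\
\theta_k \mid H &\sim H, \quad k = 1, 2, \dots \\
z_t \mid z_{t-1} &\sim \pi_{z_{t-1}} \\
y_t \mid z_t &\sim F(\theta_{z_t})
\end{align*}
```

Under this form every row satisfies E[π_jk | β] = β_k, so a state with small average weight β_k is unlikely under every transition distribution, which is the sense in which the sparsity of β is shared.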
Sensitivity to Noise
• HDP-HMM inadequately models temporal persistence of states
• DP bias insufficient to prevent unrealistically rapid dynamics
• Reduces predictive performance of inferred model
“Sticky” HDP-HMM: Part I
[Figure: original vs. sticky model]
State-specific base measure
Increased probability of self-transition
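To illustrate the increased probability of self-transition, here is a minimal sketch under a finite (truncated) approximation in which row j of the transition matrix is drawn from a Dirichlet with parameters αβ, plus an extra mass κ on entry j. The symbol κ, the helper names, and the numeric values are assumptions for illustration; the slide's own equation is not preserved.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw from a Dirichlet by normalizing independent Gamma variates."""
    # Tiny floor keeps gammavariate's shape argument strictly positive.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_rows(beta, alpha, kappa, rng):
    """Row j ~ Dirichlet(alpha * beta + kappa * indicator(j))."""
    rows = []
    for j in range(len(beta)):
        concentrations = [alpha * b for b in beta]
        concentrations[j] += kappa  # state-specific base measure
        rows.append(sample_dirichlet(concentrations, rng))
    return rows


def expected_self_transition(beta_j, alpha, kappa):
    """Mean of the self-transition entry of row j under the Dirichlet above."""
    return (alpha * beta_j + kappa) / (alpha + kappa)


rng = random.Random(1)
beta = [0.4, 0.3, 0.2, 0.1]  # illustrative average transition distribution
rows = sticky_transition_rows(beta, alpha=4.0, kappa=12.0, rng=rng)
```

With κ = 0 the expected self-transition reduces to β_j, which corresponds to the original model; increasing κ moves it toward 1.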
Direct Assignment Sampler
Rao-Blackwellized Gibbs sampler
• Marginalize:
  § Transition densities
  § Emission parameters
• Sequentially sample: [equation with terms labeled "Chinese restaurant prior" and "likelihood"]
Conjugate base measure ⇒ closed form
Splits true state, hard to merge
Blocked Resampling
HDP-HMM weak limit approximation
• Approximate HDP:
  § Average transition density
  § (⇒ transition densities)
• Sample:
• Compute backwards messages:
• Block sample as:
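The message and sampling equations are not preserved on this slide. The sketch below assumes the standard backward-message recursion for a finite HMM followed by forward sampling of the whole state sequence, with transition rows and emission parameters held fixed. The function names, the Gaussian emission model, and the toy data are illustrative assumptions.

```python
import math
import random


def gaussian_pdf(y, mean, std):
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def backward_messages(trans, lik):
    """msgs[t][j] is proportional to p(y_{t+1:T} | z_t = j)."""
    n_steps, n_states = len(lik), len(trans)
    msgs = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        row = [
            sum(trans[j][k] * lik[t + 1][k] * msgs[t + 1][k] for k in range(n_states))
            for j in range(n_states)
        ]
        scale = sum(row)  # normalize for numerical stability
        msgs[t] = [m / scale for m in row]
    return msgs


def block_sample_states(init, trans, lik, rng):
    """Sample the full state sequence jointly, one step at a time going forward."""
    msgs = backward_messages(trans, lik)
    n_states = len(trans)
    states = []
    prior = init
    for t in range(len(lik)):
        weights = [prior[k] * lik[t][k] * msgs[t][k] for k in range(n_states)]
        k = rng.choices(range(n_states), weights=weights)[0]
        states.append(k)
        prior = trans[k]  # next step conditions on the sampled state
    return states


# Toy example: two well-separated Gaussian states with sticky transitions.
means, std = [-5.0, 5.0], 1.0
trans = [[0.95, 0.05], [0.05, 0.95]]
obs = [-5.1, -4.8, -5.3, 5.2, 4.9, 5.0]
lik = [[gaussian_pdf(y, m, std) for m in means] for y in obs]
states = block_sample_states([0.5, 0.5], trans, lik, random.Random(0))
```

Sampling the sequence jointly, rather than one state at a time given its neighbors, is what lets the sampler move whole segments between states in a single sweep.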
Hyperparameters
• Place priors on hyperparameters and learn them from data
• Weakly informative priors
• All results use the same settings
Hyperparameters can be set using the data
Related self-transition parameter: Beal et al., NIPS 2002
Results: Gaussian Emissions
[Figure panels: sticky HDP-HMM; blocked sampler; sequential sampler]
Results: Fast Switching
[Figure panels: observations; true state sequence; sticky HDP-HMM; HDP-HMM]
Outline
• Background
• “Sticky” HDP-HMM
• Capturing multimodal emissions
• Speaker diarization
Issues with Multimodal Emissions
[Figure panels: data generated from 2-state HMM; HDP-HMM inferred state sequence]
“Sticky” HDP-HMM: Part II
[Figure: graphical model with states, mixture components, and observations]
• Approximate multimodal emissions with infinite Gaussian mixture
• Temporal state persistence disambiguates model
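The structure on this slide (state, then mixture component, then observation) can be sketched generatively. This is a toy illustration that assumes a finite number of Gaussian components per state in place of the infinite mixture. All names and numeric values are hypothetical.

```python
import random


def sample_sequence(trans, mix_weights, means, std, length, rng):
    """Generate states, mixture components, and observations.

    At each step the current state z picks a mixture component s from its
    own weights, and the observation is Gaussian around means[z][s].
    """
    n_states = len(trans)
    z = rng.randrange(n_states)
    states, components, observations = [], [], []
    for _ in range(length):
        s = rng.choices(range(len(mix_weights[z])), weights=mix_weights[z])[0]
        y = rng.gauss(means[z][s], std)
        states.append(z)
        components.append(s)
        observations.append(y)
        z = rng.choices(range(n_states), weights=trans[z])[0]
    return states, components, observations


# Two persistent states, each with a bimodal emission distribution.
trans = [[0.98, 0.02], [0.02, 0.98]]
mix_weights = [[0.5, 0.5], [0.5, 0.5]]
means = [[-6.0, 6.0], [-2.0, 2.0]]
states, components, observations = sample_sequence(
    trans, mix_weights, means, std=0.5, length=50, rng=random.Random(0)
)
```

Without persistence, each mode could equally be explained as its own HMM state; the high self-transition probabilities are what tie the modes of one state together.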
Results: Mixture Emissions
[Figure panels: sticky HDP-HMM; DP emissions; Gaussian emissions]
Speaker Diarization
[Figure: meeting audio segmented by speaker; speaker labels Bob, John, Jill, Jane]
Processing of Features
• Features: 19-dimensional MFCCs
• Features similar between speakers => challenging problem
• Speakers look different over time
Note:
• No training data
• Just input the raw features
Results: 21 meetings

                  Overall DER   Best DER   Worst DER   5-best DER
Sticky HDP-HMM    19.04%        1.26%      31.42%      15.14%
ICSI              18.37%        4.39%      32.23%      N/A
Results: Meeting 1
Sticky DER = 1.26%
ICSI DER = 7.56%
Results: Meeting 2
Sticky DER = 24.06% 4.37%
ICSI DER = 22.00%
Conclusion
• Examined limitations of original HDP-HMM
• Presented “sticky” HDP-HMM with:
  § Parameter allowing bias towards self-transitions
  § DP emission densities for each HMM state
• Simple and effective addition to the original HDP-HMM
• Able to learn a wide range of dynamics, even when state persistence is not present in the data