Visual Speech Recognition Using Hidden Markov Models
Kofi A. Boakye
CS 280 Course Project

Motivation
• Visual articulation provides a good information source for speech
  – Lip-reading humans can intelligibly recognize speech
  – Visual information provides robustness to noise
• Can enhance speech recognition in various applications
  – Text annotation of multimedia data
  – Automatic computer dictation
  – Lip-reading in mobile phones for noisy environments

Project Overview
• Visual speech recognition task using Tulips 1 database
• Recognition performed by training HMMs on extracted features
• Cross-validation procedure used for training and testing
• Experimented with features and HMM architecture
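As a rough illustration of the recognition step, the sketch below scores a feature sequence against one HMM per word with the forward algorithm and picks the best-scoring word. It also shows one possible cross-validation split. Everything here is an assumption rather than the project's actual setup: the slides do not say how the folds were formed (leave-one-speaker-out is assumed), the emission model is a single diagonal-covariance Gaussian per state, and the names `hmm_log_likelihood`, `classify`, and `leave_one_speaker_out` are hypothetical. Parameter training (e.g. Baum-Welch) is not shown.

```python
import numpy as np

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at each row of x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def hmm_log_likelihood(obs, log_start, log_trans, means, variances):
    """Forward algorithm in the log domain.

    obs: (T, D) feature sequence; means, variances: (S, D);
    log_start: (S,); log_trans: (S, S) with rows indexed by the from-state.
    """
    # (T, S) matrix of per-frame, per-state emission log-probabilities
    log_emit = np.stack(
        [log_gaussian_diag(obs, means[s], variances[s]) for s in range(len(means))],
        axis=1,
    )
    alpha = log_start + log_emit[0]
    for t in range(1, len(obs)):
        # log-sum-exp over previous states, separately for each next state
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_emit[t]
    return np.logaddexp.reduce(alpha)

def classify(obs, word_models):
    """Return the word whose HMM assigns the highest log-likelihood to obs."""
    scores = {w: hmm_log_likelihood(obs, *params) for w, params in word_models.items()}
    return max(scores, key=scores.get)

def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker.

    Assumed fold structure; the slides only say "cross-validation".
    """
    speaker_ids = np.asarray(speaker_ids)
    for spk in np.unique(speaker_ids):
        test = np.flatnonzero(speaker_ids == spk)
        train = np.flatnonzero(speaker_ids != spk)
        yield spk, train, test
```

Each entry of `word_models` is a tuple `(log_start, log_trans, means, variances)`, so varying the number of states only changes the shapes of those arrays.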

Tulips 1
• Small public audiovisual database
• Consists of 12 speakers (9 male, 3 female) saying the first four English digits
• Video format:
  – Digitized (8-bit grayscale pgm) images of lips of size 100 x 75
  – Sampling rate: 30 fps

Features
• Contour features
  – 6 features related to geometry of the mouth and lips (hand generated)
• PCA on raw image pixels
  – Experimented with different numbers of components
• Image preprocessing + PCA
  – Processing included:
    1) Symmetry enforcement
    2) Lowpass filtering (9 x 9 Gaussian kernel, σ=1.5) and subsampling (5)
    3) Compression and linearization
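The image-based feature pipelines can be sketched roughly as follows. This is a minimal NumPy illustration under several assumptions that the slides do not confirm: symmetry enforcement is taken to mean averaging each frame with its left-right mirror, the "(5)" in the subsampling step is read as keeping every 5th pixel, "linearization" is read as flattening the image to a vector, and the compression step is left out because the slides do not describe it. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=1.5):
    """Normalized 2-D Gaussian kernel (9 x 9, sigma = 1.5 as on the slide)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def filter2d_same(img, kernel):
    """Same-size 2-D filtering with edge replication (kernel is symmetric)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def preprocess(img, step=5):
    """Symmetrize, lowpass filter, subsample, and flatten one frame."""
    img = np.asarray(img, dtype=float)
    sym = 0.5 * (img + img[:, ::-1])      # assumed: average with mirror image
    smooth = filter2d_same(sym, gaussian_kernel())
    small = smooth[::step, ::step]        # assumed: keep every 5th pixel
    return small.ravel()                  # assumed meaning of "linearization"

def pca_fit(X, n_components):
    """Return the data mean and the top principal directions of X (N, D)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_project(X, mean, components):
    """Project rows of X onto the principal directions."""
    return (X - mean) @ components.T
```

Under these assumptions a 75-row by 100-column frame becomes a 300-dimensional vector after preprocessing. For the raw-pixel variant, `pca_fit` would be applied directly to the flattened frames.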

Results
Contour Features
• Best choice: 5 states and 1 Gaussian
• Note high accuracy with even 1 state
• Indicates importance of delta components
Raw Image Features
• Best choice: 10 components
• Similar performance to contour features, which require human assistance
• Demonstrates power of PCA
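A single-state HMM has no state sequence to model how the mouth moves over time, so any dynamics it captures must come from the features themselves. That is one way to read the point about delta components. The slides do not say how the deltas were computed. The sketch below assumes a simple central difference between neighboring frames with the end frames replicated, and the name `add_deltas` is hypothetical.

```python
import numpy as np

def add_deltas(features):
    """Append frame-to-frame deltas to a (T, D) feature sequence.

    Deltas are central differences, 0.5 * (x[t+1] - x[t-1]), with the
    first and last frames replicated at the boundaries (an assumption).
    """
    features = np.asarray(features, dtype=float)
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    deltas = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([features, deltas])
```

The result has shape (T, 2D): the static features followed by their deltas, frame by frame.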

Results
Preprocessed Image Features
• Procedure produces fair performance
• Even better with addition of PCA

Conclusions
• For given task, HMMs proved very effective
• HMM architecture significantly affects results
• Delta features appear to be quite useful
• Feature selection
  – Contour features best
    • Generation can potentially be automatic
  – Within limited exploration, “blind” statistical technique (i.e., PCA) superior to image-specific one