Multivariate-State Models for Speech Recognition
Mark Hasegawa-Johnson, Ph.D.
mhj@icsl.ucla.edu
NIH Post-Doctoral Fellow, Lecturer, UCLA Department of Electrical Engineering
Research Associate, MIT Speech Communication Group
Key Point
Speech is a one-dimensional signal that encodes multiple simultaneous, partially independent information streams.
Outline
• Background: Univariate Hidden Markov Models
• Problem Statement: Multivariate Content of Speech
  ◦ Example: Composite Acoustic Cues
• Multivariate-State Models
  ◦ Definition
  ◦ Complexity Issues
  ◦ Example: Composite Acoustic Cues
Background: Statistical Classification
• Class definition: functional form with trainable parameters
• Training:
  ◦ Modify parameters of p(obs | class)
  ◦ Create lookup table of p(class)
• Classification: class = argmax p(class | obs) = argmax p(obs | class) p(class), by Bayes' rule (p(obs) is the same for every class)
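The train/classify recipe above can be sketched in a few lines. The slide does not name a functional form, so this sketch assumes a hypothetical one-dimensional Gaussian for p(obs | class). `train`, `classify`, and the toy data are illustrative and do not come from the talk.

```python
import math

# Hypothetical functional form: a 1-D Gaussian per class for p(obs | class).
def train(samples_by_class):
    """Fit a mean and variance per class, plus a lookup table of p(class)."""
    total = sum(len(xs) for xs in samples_by_class.values())
    params, priors = {}, {}
    for label, xs in samples_by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (mean, var)
        priors[label] = len(xs) / total  # relative frequency as p(class)
    return params, priors

def log_gaussian(x, mean, var):
    """log N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(obs, params, priors):
    """argmax over classes of log p(obs | class) + log p(class)."""
    return max(params, key=lambda c: log_gaussian(obs, *params[c]) + math.log(priors[c]))

data = {"a": [1.0, 1.2, 0.8], "b": [3.0, 3.3, 2.7]}  # toy training data
params, priors = train(data)
print(classify(1.1, params, priors))  # -> a
```

The comparison uses log probabilities. The argmax is unchanged, and products of many small probabilities do not underflow.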
Hidden Markov Models
HMM Phone Models
HMM Word Models
HMM Sentence Models
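Recognition Scoring
Find the state sequence Q = (q_1, …, q_T) that maximizes the “recognition probability”
$$P(O,Q) = p(q_1)\,p(o_1 \mid q_1)\,p(q_2 \mid q_1)\,p(o_2 \mid q_2)\cdots = p(q_1)\,p(o_1 \mid q_1)\prod_{t=2}^{T} p(q_t \mid q_{t-1})\,p(o_t \mid q_t)$$
For example, a path that starts in state i and stays there begins p(i) p(o_1 | i) p(i | i) p(o_2 | i) ….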
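Below is a sketch of scoring one given state sequence with the product above. It works in log probabilities so that long utterances do not underflow. The two-state model (`init`, `trans`, `emit`) and its discrete observations "x" and "y" are made-up illustrations.

```python
import math

# Toy two-state HMM with discrete observations (illustrative values only).
init = {"i": 0.6, "j": 0.4}
trans = {"i": {"i": 0.7, "j": 0.3}, "j": {"i": 0.4, "j": 0.6}}
emit = {"i": {"x": 0.9, "y": 0.1}, "j": {"x": 0.2, "y": 0.8}}

def log_recognition_probability(states, obs):
    """log P(O, Q) = log p(q1) + log p(o1|q1) + sum over t>=2 of [log p(qt|qt-1) + log p(ot|qt)]."""
    logp = math.log(init[states[0]]) + math.log(emit[states[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(trans[states[t - 1]][states[t]])  # p(q_t | q_{t-1})
        logp += math.log(emit[states[t]][obs[t]])           # p(o_t | q_t)
    return logp

# Path that starts in i and stays there: p(i) p(o1|i) p(i|i) p(o2|i)
print(math.exp(log_recognition_probability(["i", "i"], ["x", "x"])))
```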
Implementation: the Viterbi Algorithm
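Here is a textbook Viterbi decoder for a small discrete-observation HMM. It finds the Q that maximizes P(O, Q) by dynamic programming instead of scoring every path. At each frame it keeps only the best partial path into each state, along with a backpointer. The model tables are hypothetical toy values.

```python
import math

# Toy two-state HMM (illustrative values only).
states = ["i", "j"]
init = {"i": 0.6, "j": 0.4}
trans = {"i": {"i": 0.7, "j": 0.3}, "j": {"i": 0.4, "j": 0.6}}
emit = {"i": {"x": 0.9, "y": 0.1}, "j": {"x": 0.2, "y": 0.8}}

def viterbi(obs):
    """Return (log P(O, Q*), Q*) for the single best state sequence Q*."""
    # delta[s]: best log-probability of any partial path ending in state s
    delta = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] + math.log(trans[r][s]))
            new_delta[s] = delta[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][o])
            pointer[s] = best_prev
        delta = new_delta
        backpointers.append(pointer)
    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return delta[last], path

log_p, best = viterbi(["x", "x", "y", "y"])
print(best)  # -> ['i', 'i', 'j', 'j']
```

The cost is O(T N²) for N states and T frames. Exhaustive search would cost O(N^T).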
Background: Stop Consonant Release
• Three “Places of Articulation”:
  ◦ Lips (b, p)
  ◦ Tongue Blade (d, t)
  ◦ Tongue Body (g, k)
Problem Statement: Content of Speech is Multivariate
1. Source Information: Prosody, Articulatory Features
Content of Speech is Multivariate
2. Useful Non-Source Information: Composite Acoustic Cues
Composite Cues: Traditional Solution
Types of Measurement Error
• Small Errors: Spectral Perturbation
• Large Errors: Picking the Wrong Peak
[Figure: spectrum, amplitude (dB) vs. frequency (Hz)]
Large Errors are 20% of Total
• Std. dev. of small errors = 45–72 Hz
• Std. dev. of large errors = 218–1330 Hz
• P(large error) = 0.17–0.22
[Figure: log PDF of measurement error (Hz) relative to manual transcriptions]
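These statistics fit naturally into a two-component mixture model of the error e. The zero means and Gaussian shapes are assumptions. The default parameters (60 Hz, 500 Hz, 0.2) are picked from inside the ranges on the slide and are not fitted values. `prob_large_given_error` shows why a single Gaussian misfits: beyond a few hundred hertz, nearly all of the density comes from the large-error component.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean normal density (zero mean is an assumption)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Assumed defaults taken from within the slide's ranges, not fitted values.
def error_log_pdf(e, sigma_small=60.0, sigma_large=500.0, p_large=0.2):
    """log p(e) for a measurement error e (Hz) under a two-component mixture."""
    return math.log((1 - p_large) * gaussian_pdf(e, sigma_small)
                    + p_large * gaussian_pdf(e, sigma_large))

def prob_large_given_error(e, sigma_small=60.0, sigma_large=500.0, p_large=0.2):
    """Posterior probability that an observed error e came from the large-error component."""
    small = (1 - p_large) * gaussian_pdf(e, sigma_small)
    large = p_large * gaussian_pdf(e, sigma_large)
    return large / (small + large)

print(round(prob_large_given_error(0.0), 3), round(prob_large_given_error(400.0), 3))
```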
Measurement Error Predicts Classification Error
Solution: Composite Cues as State Variables
Complexity of Solution Without Additional Constraints
Useful Constraint #1: State Independence
Useful Constraint #2: Hierarchical Dependence
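The slides do not give explicit counts, so the sketch below tallies one concrete measure of complexity for three hypothetical state variables: the number of transition-table entries. The unconstrained figure, (∏N_k)², is also the per-frame work of a Viterbi search over the joint states. The parameterization of "hierarchical dependence" is an assumption. Here each variable's transition is conditioned on the current value of the variable above it.

```python
from math import prod

def unconstrained_params(cards):
    """Full joint transition table over all value combinations: (prod N_k)^2 entries."""
    n = prod(cards)
    return n * n

def independent_params(cards):
    """Constraint #1: each variable has its own N_k x N_k transition table."""
    return sum(n * n for n in cards)

def hierarchical_params(cards):
    """Constraint #2 (hypothetical reading): variable k's transition is conditioned
    on the current value of variable k-1, needing N_{k-1} * N_k^2 entries."""
    total = cards[0] ** 2
    for parent, child in zip(cards, cards[1:]):
        total += parent * child * child
    return total

cards = (40, 10, 10)  # hypothetical cardinalities of three state variables
print(unconstrained_params(cards), independent_params(cards), hierarchical_params(cards))
# -> 16000000 1800 6600
```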
Description of the Test System
Test System Results
A Posteriori Measurement Distributions: 10 ms After /d/ in “dark”
[Figure panels: DFT amplitude; DFT convexity; P(F | O, Q); x-axis: frequency, 0–4000 Hz]
Conclusions
• The speech signal is affected by multiple information streams.
• Multivariate-state models can explicitly model multiple information streams:
  ◦ Articulatory features
  ◦ Composite acoustic cues
• Complexity is viable if state variables are independent or hierarchically dependent.
Future Directions
• Multivariate-state training algorithms
  ◦ Search for provable low-cost algorithms
  ◦ Test heuristic, non-provable algorithms
• Replace the “phone string” with a multivariate articulatory-feature representation
• Prosody
  ◦ Simultaneous recognition of prosody and text
  ◦ Combine prosody and text to extract meaning
Speech Production Research: Factor Analysis of MRI-Derived Tongue Shapes
• Hypothesis 1: During speech, the tongue is controlled in a low-dimensional subspace.
• Hypothesis 2: The shape of the subspace is speaker-dependent.
• Hypothesis 3: Speaker-dependent control spaces are more similar acoustically than articulatorily.
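Hypothesis 1 predicts that a few components explain most of the variance in tongue shape. The sketch below checks this on data using principal component analysis (PCA). PCA is swapped in as a simpler relative of factor analysis, which additionally models a separate noise variance for each coordinate. The "tongue shapes" are synthetic six-point contours driven by one hidden control parameter and are not MRI measurements. An isotropic-noise set serves as a contrast.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvalue(cov, iters=200):
    """Largest eigenvalue of a symmetric covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * w[i] for i in range(d))  # Rayleigh quotient of unit vector v

def first_component_share(rows):
    """Fraction of total variance captured by the first principal component."""
    cov = covariance(rows)
    return top_eigenvalue(cov) / sum(cov[i][i] for i in range(len(cov)))

# Synthetic data: one hidden control parameter `a` plus small noise (made up).
rng = random.Random(0)
basis = [1.0, 0.8, 0.5, 0.2, -0.1, -0.3]
shapes = []
for _ in range(50):
    a = rng.gauss(0.0, 1.0)
    shapes.append([a * b + rng.gauss(0.0, 0.05) for b in basis])
isotropic = [[rng.gauss(0.0, 1.0) for _ in basis] for _ in range(50)]

print(round(first_component_share(shapes), 3), round(first_component_share(isotropic), 3))
```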
MRI Image Collection
• GE Signa 1.5 T
• T1-weighted
• 3 mm slices
• 24 cm FOV
• 256 × 256 pixels
• Coronal, axial
• 3 subjects
• 11 vowels
• Breath-hold in vowel position for 25 seconds
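Assuming the 24 cm field of view is square and spans the full 256 × 256 matrix, the listed parameters imply these voxel dimensions:

$$\Delta x = \Delta y = \frac{240\ \text{mm}}{256} = 0.9375\ \text{mm}, \qquad \Delta z = 3\ \text{mm}, \qquad V = 0.9375^2 \times 3\ \text{mm}^3 \approx 2.64\ \text{mm}^3$$

The slices are roughly three times coarser than the in-plane resolution.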
MRI Image Segmentation
• In CTMRedit:
  ◦ Manual
  ◦ Seeded region growing
• Tested:
  ◦ Snake
  ◦ Structural saliency
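Seeded region growing starts from a seed pixel. It repeatedly adds 4-connected neighbors whose intensity lies within a tolerance of the seed's. The slide does not state CTMRedit's actual growth criterion, so comparing against the seed intensity (rather than, say, the running region mean) is an assumption. The 4 × 4 image is a toy example.

```python
from collections import deque

def region_grow(image, seed, tol):
    """4-connected seeded region growing against the seed pixel's intensity (assumed criterion)."""
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:  # breadth-first expansion from the seed
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(image[nr][nc] - ref) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = [
    [10, 10, 80, 80],
    [12, 11, 82, 79],
    [11, 90, 85, 81],
    [10, 13, 12, 84],
]
dark = region_grow(img, (0, 0), tol=5)
print(sorted(dark))
```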
Problem Statement: Content of Speech is Multivariate
2. Higher-Level Acoustic Information: Relational Spectral Cues