74.419 Artificial Intelligence 2004
Speech & Natural Language Processing
• Speech Recognition
  • acoustic signal as input
  • conversion into written words
• Natural Language Processing
  • written text as input
  • sentences (well-formed or not)
• Spoken Language Understanding
  • analysis of spoken language (transcribed speech)

Speech & Natural Language Processing
Areas in Speech Recognition
• Signal Processing
• Phonetics
• Word Recognition
Areas in Natural Language Processing
• Morphology
• Grammar & Parsing (syntactic analysis)
• Semantics
• Pragmatics
• Discourse / Dialogue
Spoken Language Understanding

Speech Production & Reception
Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → frequency spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
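The "mathematical frequency analysis" step can be illustrated with a small sketch. It is not taken from the slides: the two-tone test signal, the sample rate, and the names `fft` and `magnitude_spectrum` are assumptions chosen for illustration, and the recursive radix-2 FFT is written out by hand so that the example needs only the Python standard library.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency in Hz, magnitude) pairs for the lower half of the spectrum."""
    n = len(samples)
    spectrum = fft(samples)
    return [(k * sample_rate / n, abs(spectrum[k])) for k in range(n // 2)]


# Hypothetical test signal: a 200 Hz tone plus a weaker 600 Hz tone, sampled at 1600 Hz.
SAMPLE_RATE = 1600
N = 64  # power of two, gives 25 Hz per frequency bin
signal = [
    math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 600 * t / SAMPLE_RATE)
    for t in range(N)
]

spectrum = magnitude_spectrum(signal, SAMPLE_RATE)
peak_freq, peak_mag = max(spectrum, key=lambda pair: pair[1])
print(peak_freq)  # strongest frequency component in Hz
```

The strongest bin falls at the frequency of the louder tone, which is the kind of information a recognizer reads off the frequency spectrum. In practice a library FFT would normally be used instead of this hand-written one.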

Speech Recognition
Acoustic / sound wave
→ Filtering, Sampling; Spectral Analysis (FFT)
→ Frequency Spectrum
→ Signal Processing / Analysis
→ Features (Phonemes; Context)
→ Phoneme Recognition: HMM, Neural Networks
→ Phonemes
→ Grammar or Statistics
→ Phoneme Sequences / Words
→ Grammar or Statistics for likely word sequences
→ Word Sequence / Sentence
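The phoneme recognition stage with an HMM can be sketched with Viterbi decoding, which finds the most likely hidden phoneme sequence for a sequence of observed acoustic features. Everything concrete here is an assumption made up for illustration: the two phoneme states `s` and `iy`, the two quantized feature labels `hiss` and `tone`, and all probabilities. A real recognizer would learn these from data and use far richer features.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Toy model (all values hypothetical).
states = ["s", "iy"]
start_p = {"s": 0.6, "iy": 0.4}
trans_p = {"s": {"s": 0.5, "iy": 0.5}, "iy": {"s": 0.2, "iy": 0.8}}
emit_p = {"s": {"hiss": 0.9, "tone": 0.1}, "iy": {"hiss": 0.2, "tone": 0.8}}

path, prob = viterbi(["hiss", "hiss", "tone", "tone"], states, start_p, trans_p, emit_p)
print(path)  # ['s', 's', 'iy', 'iy']
```

The same dynamic-programming idea carries over to the later stages of the pipeline, where grammar or statistics select likely word sequences instead of phoneme sequences.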

Speech Signal
Analog-Digital Conversion of Acoustic Signal → Sampling; Analysis of Signal in Time Frames (“windows”)
Characteristics of a Speech Signal
• formants: strong frequency components; characterize e.g. vowels, gender of speaker; visible as dark stripes in a spectrogram
• pitch: fundamental frequency (base of the harmonic series; formants are the vocal-tract resonances that emphasize some of these harmonics)
• place of articulation (recognition model based on model of vocal tract)
• change in frequency distribution over time
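Framing and pitch can be made concrete with a short sketch. The slides do not prescribe a method, so the choices here are assumptions: 50 ms frames with a 20 ms hop, a Hamming window as one common taper for spectral analysis, and pitch estimation by picking the autocorrelation peak. The helper names and the synthetic 100 Hz test signal are hypothetical.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames ("windows") of frame_len samples."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(frame):
    """Taper a frame with a Hamming window to reduce edge effects before an FFT."""
    n = len(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)

    def autocorr(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


# Synthetic voiced sound: 100 Hz fundamental with two weaker harmonics, 0.2 s at 8 kHz.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(1600)
]

frame_list = split_into_frames(signal, frame_len=400, hop=160)
windowed = [hamming(frame) for frame in frame_list]  # input for per-frame spectra
pitch = estimate_pitch(frame_list[0], SAMPLE_RATE)
print(len(frame_list), pitch)
```

Computing a spectrum for each windowed frame and stacking the results over time yields a spectrogram, which is where the change in frequency distribution becomes visible.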

Video of glottis and speech signal in lingWAVES (from http://www.lingcom.de)

Speech Recognition Characteristics
• Speech Recognition vs. Speaker Identification
• Speaker-dependent vs. speaker-independent
• Single word vs. continuous speech
• Large vs. small vocabulary

Additional References
Huang, X., A. Acero & H.-W. Hon: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, NJ, 2001.