Lecture 4 CS 4705 Sound Systems and Text-to-Speech

Sound Systems of Language
• Phonetics
  – The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
• Phonology
  – Rules that govern how phones are realized differently in different contexts
• Technologies:
  – Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
  – Text-to-Speech (TTS) systems take text as input and produce speech

Letters and Sounds
• same spelling = different sounds
  – o: comb, tomb, bomb
  – c: court, center, cheese
  – oo: blood, food, good
  – s: reason, surreal, shy
• same sound = different spellings
  – [i]: sea, see, scene, receive, thief
  – [s]: cereal, same, miss
  – [u]: true, few, choose, lieu, do
  – [ay]: prime, buy, rhyme, lie
• combination of letters = single sound
  – ch: child, beach
  – oo: good, foot
  – th: that, bathe
  – gh: laugh
• single letter = combination of sounds
  – x: exit, Texas
  – u: use, music
• ‘silent’ letters
  – k: knife, know
  – e: moose, bone
  – p: psycho, pterodactyl
  – gh: through

Articulators
[Figure labels: lips, teeth, alveolar ridge, palate, velum, uvula, pharynx, larynx, vocal folds (glottis), trachea]

Articulators in action
(Sample from the Queen’s University / ATR Labs X-ray Film Database)
“Why did Ken set the soggy net on top of his deck?”

Vocal fold vibration [UCLA Phonetics Lab demo]


Places of articulation
[Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

Articulatory parameters for English consonants (in ARPAbet)

                       PLACE OF ARTICULATION
MANNER OF
ARTICULATION  bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
stop          p b                                   t d                k g    q
fric.                   f v           th dh         s z       sh zh           h
affric.                                                       ch jh
nasal         m                                     n                  ng
approx.       w                                     l/r       y
flap                                                dx

VOICING: where a cell holds a pair, the first symbol is voiceless and the second is voiced
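One way to make the chart usable in code is a lookup table from ARPAbet symbol to its three articulatory parameters. The sketch below is only an illustration: the `Consonant` type and the `natural_class` helper are hypothetical names, and the place/manner/voicing assignments assume the standard reading of the chart above.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    place: str
    manner: str
    voiced: bool


# Assumed assignments, following the standard place/manner chart.
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "q": Consonant("glottal", "stop", False),
    "f": Consonant("labio-dental", "fricative", False),
    "v": Consonant("labio-dental", "fricative", True),
    "th": Consonant("inter-dental", "fricative", False),
    "dh": Consonant("inter-dental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "sh": Consonant("palatal", "fricative", False),
    "zh": Consonant("palatal", "fricative", True),
    "h": Consonant("glottal", "fricative", False),
    "ch": Consonant("palatal", "affricate", False),
    "jh": Consonant("palatal", "affricate", True),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
    "w": Consonant("bilabial", "approximant", True),
    "l": Consonant("alveolar", "approximant", True),
    "r": Consonant("alveolar", "approximant", True),
    "y": Consonant("palatal", "approximant", True),
    "dx": Consonant("alveolar", "flap", True),
}


def natural_class(place: Optional[str] = None,
                  manner: Optional[str] = None,
                  voiced: Optional[bool] = None) -> list:
    """Return the sorted symbols matching every parameter that is given."""
    return sorted(
        sym for sym, c in CONSONANTS.items()
        if (place is None or c.place == place)
        and (manner is None or c.manner == manner)
        and (voiced is None or c.voiced == voiced)
    )
```

For instance, `natural_class(manner="nasal")` picks out the nasal row of the chart, and `natural_class(manner="stop", voiced=False)` the voiceless stops.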

American English vowel space
[Vowel chart with axes HIGH to LOW and FRONT to BACK; vowels shown (ARPAbet): iy, uw, eh, ae, uh, ow, ey, ux, oy, ax, ah, ay, aw, ix, ih, ao, aa]

Acoustic landmarks
“Patricia and Patsy and Sally”
[Figure labels: [p] [ix] [t] [ih] [sh] [ax] [n] [p] [ae] [t] [s] [iy] [n] [s] [ae] [l] [iy]; detail of [p] [ix] [t] [ih]]

Syllables
• Syllabification important for
  – pronunciation: deny/denim
  – speaking rate calculation: syllables per second
  – word recognition in ASR
• (onset) + nucleus + (coda):
  – cat
  – a
  – at
  – to
• Lexical stress: primary, secondary, tertiary
  – telephone
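The (onset) + nucleus + (coda) template and the syllables-per-second rate can be sketched in a few lines. This is a minimal illustration, not a syllabifier: it assumes the input is already a single syllable given as ARPAbet phones, and the function names and the vowel inventory are assumptions made for the example.

```python
# Vowel symbols assumed to be possible nuclei (ARPAbet).
VOWELS = {"iy", "ih", "ix", "ey", "eh", "ae", "ux", "ax", "ah",
          "uw", "uh", "ow", "ao", "aa", "ay", "aw", "oy"}


def split_syllable(phones):
    """Split one syllable into (onset, nucleus, coda).

    Onset and coda may be empty lists; the nucleus is required.
    """
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(nuclei) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    n = nuclei[0]
    return phones[:n], phones[n], phones[n + 1:]


def speaking_rate(num_syllables, duration_seconds):
    """Speaking rate in syllables per second."""
    return num_syllables / duration_seconds


# The four example words from the slide:
print(split_syllable(["k", "ae", "t"]))  # cat: onset, nucleus, coda
print(split_syllable(["ax"]))            # a:   nucleus only
print(split_syllable(["ae", "t"]))       # at:  nucleus + coda
print(split_syllable(["t", "uw"]))       # to:  onset + nucleus
```

Splitting a multi-syllable word into syllables first is the harder problem and is not attempted here.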

Phonological Rules
• Not all instances of a given phone [x] sound/look alike
• Phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.
  – simple rules: /{t, d}/ --> [dx] / V’ _ V
  – FSAs, FSTs
  – declarative constraints: t: V’ _ V
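The simple rewrite rule for /t/ and /d/ between a stressed vowel and a following vowel can be sketched as a single pass over a phone list. Assumptions for this illustration: the output allophone is the flap (ARPAbet dx), stress is marked with a trailing digit on the vowel (1 = primary, 0 = unstressed), and the function names are hypothetical.

```python
VOWELS = {"iy", "ih", "ix", "ey", "eh", "ae", "ux", "ax", "ah",
          "uw", "uh", "ow", "ao", "aa", "ay", "aw", "oy"}
FLAP_TARGETS = {"t", "d"}


def is_vowel(phone):
    # Strip an optional stress digit before checking the symbol.
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    # Only primary stress ("1") counts as V' in this sketch.
    return is_vowel(phone) and phone.endswith("1")


def apply_flapping(phones):
    """Rewrite t/d as dx when preceded by a stressed vowel and followed by a vowel."""
    result = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in FLAP_TARGETS
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            result[i] = "dx"
    return result


print(apply_flapping(["s", "ih1", "t", "iy0"]))   # "city": context matches
print(apply_flapping(["ax0", "t", "ae1", "k"]))   # "attack": context does not match
```

The context is checked against the input list rather than the partially rewritten output, so each position is decided independently, which is how a simple one-pass rewrite rule is usually read.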

Allophones of /t/
• What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/:
[Figure 4.8: Jurafsky & Martin (2000), page 104]

Application: Word Pronunciation for TTS
• Pronouncing dictionaries (the: [‘dhax], [‘dhiy])
• Problems:
  – Homographs (bass/bass, wind/wind, desert/desert)
  – Abbreviations (dr., st.)
  – Numbers (2125551212)
  – Acronyms (NAACL, IDIAP)
  – Morphological variation (unrelentingly)
  – Proper names and unknown words
• rules + dictionaries/dictionaries + rules
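A toy pronouncing dictionary shows why lookup alone is not enough. In this sketch the lexicon entries, the function names, and the choice to read a digit string one digit at a time (telephone style) are all assumptions for illustration; a real system would need context to pick among homograph variants and to decide how a number should be read.

```python
# Tiny illustrative lexicon: each word maps to one or more pronunciations.
LEXICON = {
    "the": ["dh ax", "dh iy"],
    "bass": ["b ae s", "b ey s"],   # the fish vs. the instrument
    "cat": ["k ae t"],
}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def lookup(word):
    """Return every listed pronunciation; an empty list means 'unknown word'."""
    return LEXICON.get(word.lower(), [])


def read_digits(number):
    """Spell a digit string one digit at a time."""
    if not number.isdigit():
        raise ValueError("expected a string of digits")
    return " ".join(DIGIT_WORDS[int(d)] for d in number)


print(lookup("bass"))          # two candidates: a homograph
print(lookup("NAACL"))         # no entry: falls through to rules
print(read_digits("2125551212"))
```

An empty result from `lookup` is the point where the rule-based components take over.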

• Hybrid model:
  – FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
  – FSAs model morphology (e.g. reg-noun-stem + s)
  – FSTs for pronunciation rules (e.g. s --> z)
  – special rules to model name and acronym pronunciation
  – default letter-to-sound rules for other words
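The first three layers of the hybrid model can be imitated with plain Python data structures. This is a sketch under stated assumptions, not a real finite-state implementation: the lexicon stores letter:phone pairs in the style of the c:k a:ae t:t entry, the "dog" entry and the `pronounce` function are invented for the example, and the s --> z rule is assumed to apply after a voiced final phone. Stems ending in sibilants (fox, bus) are left out.

```python
# Lexicon layer: each regular noun stem is a list of letter:phone pairs.
LEXICON = {
    "cat": [("c", "k"), ("a", "ae"), ("t", "t")],
    "dog": [("d", "d"), ("o", "ao"), ("g", "g")],
}

# Phones after which plural "s" stays voiceless in this sketch.
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h", "q"}


def pronounce(word):
    """Return the phone list for a known stem or stem + s, else None."""
    # Morphology layer: reg-noun-stem, optionally followed by "s".
    if word in LEXICON:
        stem, suffix = word, ""
    elif word.endswith("s") and word[:-1] in LEXICON:
        stem, suffix = word[:-1], "s"
    else:
        return None  # would fall through to name/acronym or letter-to-sound rules

    # Lexicon layer: read off the output side of each letter:phone pair.
    phones = [phone for _letter, phone in LEXICON[stem]]

    # Pronunciation rule layer: s --> z after a voiced phone.
    if suffix:
        phones.append("s" if phones[-1] in VOICELESS else "z")
    return phones


print(pronounce("cats"))
print(pronounce("dogs"))
```

Returning `None` for an unknown word marks where the last two bullets (special rules and default letter-to-sound rules) would apply.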

Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
  – spacious living/dining rm w/frplc / dining room with fireplace
  – pls?
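Two of these approaches are easy to sketch. The code below assumes that rhyming analogy can be approximated by the longest shared spelling suffix with a known word, and that abbreviation expansion is a table lookup; the table entries and both function names are hypothetical.

```python
def common_suffix_length(a, b):
    """Number of trailing letters two words share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def best_rhyme(word, known_words):
    """Pick the known word sharing the longest spelling suffix with `word`."""
    return max(known_words, key=lambda k: common_suffix_length(word, k))


# Illustrative abbreviation table for classified-ad text.
ABBREVIATIONS = {"rm": "room", "frplc": "fireplace", "w/": "with"}


def expand(text):
    """Expand known abbreviations; unknown tokens pass through unchanged."""
    out = []
    for token in text.split():
        # "w/frplc" is "w/" glued to the next abbreviation.
        if token.startswith("w/") and len(token) > 2:
            out.append("with")
            token = token[2:]
        out.append(ABBREVIATIONS.get(token, token))
    return " ".join(out)


print(best_rhyme("varoom", ["room", "dodo", "cat"]))
print(expand("spacious living/dining rm w/frplc"))
```

A token such as "pls" has no table entry and comes back unchanged, which is exactly the ambiguous case the slide ends on.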

Summary
• Phones realize phonemes in different contexts
  – Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
• Versatile FSTs can model phonological as well as morphological and spelling systems
• Many creative approaches toward pronunciation modeling for TTS
• Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)