Lecture 4 CS 4705 Sound Systems and Text-to-Speech

Sound Systems of Language
• Phonetics
  – The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
• Phonology
  – Rules that govern how phones are realized differently in different contexts
• Technologies:
  – Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
  – Text-to-Speech (TTS) systems take text as input and produce speech

Letters and Sounds
• same spelling = different sounds
  – o: comb, tomb, bomb
  – c: court, center, cheese
  – oo: blood, food, good
  – s: reason, surreal, shy
• same sound = different spellings
  – [i]: sea, see, scene, receive, thief
  – [s]: cereal, same, miss
  – [u]: true, few, choose, lieu, do
  – [ay]: prime, buy, rhyme, lie
• combination of letters = single sound
  – ch: child, beach
  – oo: good, foot
  – th: that, bathe
  – gh: laugh
• single letter = combination of sounds
  – x: exit, Texas
  – u: use, music
• ‘silent’ letters
  – k: knife, know
  – e: moose, bone
  – p: psycho, pterodactyl
  – gh: through

Articulators
• [Figure labels: teeth, lips, alveolar ridge, palate, velum, uvula, pharynx, larynx, vocal folds (glottis), trachea]

Articulators in action (Sample from the Queen’s University / ATR Labs X-ray Film Database)
• “Why did Ken set the soggy net on top of his deck?”

Vocal fold vibration [UCLA Phonetics Lab demo]

Places of articulation
• dental, labial, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal
• http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

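Articulatory parameters for English consonants (in ARPAbet)
Each line is one MANNER OF ARTICULATION; phones are grouped by PLACE OF ARTICULATION. VOICING: within a pair, the first phone is voiceless and the second is voiced.
• stop: bilabial p b; alveolar t d; velar k g; glottal q
• fric.: labio-dental f v; inter-dental th dh; alveolar s z; palatal sh zh; glottal h
• affric.: palatal ch jh
• nasal: bilabial m; alveolar n; velar ng
• approx.: bilabial w; alveolar l/r; palatal y
• flap: alveolar dx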
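
The consonant chart can also be read as a lookup table. The following is a minimal sketch, assuming the usual textbook place assignments wherever the flattened chart is ambiguous; the names CONSONANTS and natural_class are hypothetical and not part of the lecture.

```python
# Hypothetical feature table: ARPAbet consonant -> (place, manner, voicing).
# Place assignments follow the common textbook chart and are an assumption here.
CONSONANTS = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "k": ("velar", "stop", "voiceless"),
    "g": ("velar", "stop", "voiced"),
    "q": ("glottal", "stop", "voiceless"),
    "f": ("labio-dental", "fricative", "voiceless"),
    "v": ("labio-dental", "fricative", "voiced"),
    "th": ("inter-dental", "fricative", "voiceless"),
    "dh": ("inter-dental", "fricative", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "sh": ("palatal", "fricative", "voiceless"),
    "zh": ("palatal", "fricative", "voiced"),
    "h": ("glottal", "fricative", "voiceless"),
    "ch": ("palatal", "affricate", "voiceless"),
    "jh": ("palatal", "affricate", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "n": ("alveolar", "nasal", "voiced"),
    "ng": ("velar", "nasal", "voiced"),
    "w": ("bilabial", "approximant", "voiced"),
    "l": ("alveolar", "approximant", "voiced"),
    "r": ("alveolar", "approximant", "voiced"),
    "y": ("palatal", "approximant", "voiced"),
    "dx": ("alveolar", "flap", "voiced"),
}


def natural_class(place=None, manner=None, voicing=None):
    """Return the sorted phones matching every feature that is not None."""
    return sorted(
        phone
        for phone, (pl, mn, vc) in CONSONANTS.items()
        if (place is None or pl == place)
        and (manner is None or mn == manner)
        and (voicing is None or vc == voicing)
    )


print(natural_class(manner="nasal"))                   # ['m', 'n', 'ng']
print(natural_class(place="alveolar", manner="stop"))  # ['d', 't']
```

Querying by feature rather than by phone is what lets a phonological rule refer to a whole class (for example, all voiceless stops) instead of listing phones one by one.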

American English vowel space
• [Figure: ARPAbet vowels plotted by height (HIGH to LOW) and backness (FRONT to BACK): iy, uw, eh, ae, uh, ow, ey, ux, oy, ax, ah, ay, aw, ix, ih, ao, aa]

Acoustic landmarks
• [Figure: labeled segments for “Patricia and Patsy and Sally”: [p][ix][t] [ih][sh] [ax][n][p] [ae] [t][s] [iy][n] [s] [ae] [l][iy]]

Syllables
• Syllabification important for
  – pronunciation: deny/denim
  – speaking rate calculation: syllables per second
  – word recognition in ASR
• (onset) + nucleus + (coda):
  – cat, a, at, to
• Lexical stress: primary, secondary, tertiary
  – telephone
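
The (onset) + nucleus + (coda) template and the syllables-per-second measure can be sketched as follows. This is an illustrative sketch only: it assumes each input is a single syllable already transcribed in ARPAbet, and the names VOWELS, split_syllable and speaking_rate are hypothetical.

```python
# ARPAbet vowel symbols, taken from the vowel-space slide.
VOWELS = {
    "iy", "ih", "ix", "ey", "eh", "ae", "ux", "ax", "ah",
    "uw", "uh", "ow", "ao", "aa", "ay", "aw", "oy",
}


def split_syllable(phones):
    """Split the phones of ONE syllable into (onset, nucleus, coda).

    Onset and coda are optional, so either list may be empty.
    """
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel nucleus")
    i = vowel_positions[0]
    return phones[:i], phones[i], phones[i + 1:]


def speaking_rate(n_syllables, seconds):
    """Speaking rate in syllables per second."""
    return n_syllables / seconds


print(split_syllable(["k", "ae", "t"]))  # (['k'], 'ae', ['t'])   cat
print(split_syllable(["ax"]))            # ([], 'ax', [])         a
print(split_syllable(["ae", "t"]))       # ([], 'ae', ['t'])      at
print(split_syllable(["t", "uw"]))       # (['t'], 'uw', [])      to
print(speaking_rate(12, 3.0))            # 4.0
```

Splitting a multi-syllable word such as deny or denim needs an extra decision about which consonants go to the coda and which to the next onset; that step is deliberately left out here.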

Phonological Rules
• Not all instances of a given phone [x] sound/look alike
• Phoneme /x/ may have many allophones
• Phonological rules map phonemes in context to allophones, e.g.
  – simple rules: /{t, d}/ --> [dx] / V’ _ V
  – FSAs, FSTs
  – declarative constraints: t: V’ _ V
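
A context-dependent rewrite rule of this kind can be sketched directly over a phone list. The example below implements a flapping rule (/t/ or /d/ becomes the flap [dx] between a stressed vowel and a following vowel). It assumes vowels carry a trailing stress digit (1 = primary stress, 0 = unstressed), a convention borrowed from pronouncing dictionaries and not stated on the slide; the function names are hypothetical, and plain Python stands in for an FST.

```python
def is_vowel(phone):
    """Assumption: only vowels end in a stress digit (0, 1 or 2)."""
    return phone[-1] in "012"


def is_stressed(phone):
    """Treat primary stress (digit 1) as the V' context of the rule."""
    return phone[-1] == "1"


def flap(phones):
    """Apply /{t, d}/ --> [dx] / V' _ V to a list of phones."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (
            phones[i] in ("t", "d")
            and is_stressed(phones[i - 1])
            and is_vowel(phones[i + 1])
        ):
            out[i] = "dx"
    return out


print(flap(["b", "ah1", "t", "er0"]))  # ['b', 'ah1', 'dx', 'er0']  butter
print(flap(["ax0", "t", "ae1", "k"]))  # ['ax0', 't', 'ae1', 'k']   attack
```

The second call leaves /t/ unchanged because the preceding vowel is unstressed, which is exactly the left-context condition V’ in the rule.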

Allophones of /t/
• What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/: see Figure 4.8, Jurafsky & Martin (2000), page 104.

Application: Word Pronunciation for TTS
• Pronouncing dictionaries (the: [‘dhax], [‘dhiy])
• Problems:
  – Homographs (bass/bass, wind/wind, desert/desert)
  – Abbreviations (dr., st.)
  – Numbers (2125551212)
  – Acronyms (NAACL, IDIAP)
  – Morphological variation (unrelentingly)
  – Proper names and unknown words
• rules + dictionaries/dictionaries + rules
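
The numbers problem illustrates why a dictionary alone is not enough: a digit string such as 2125551212 has to be expanded to words before any lookup. The sketch below assumes the string is a 10-digit telephone number read digit by digit in 3-3-4 groups; that reading and the name read_phone_number are assumptions, and the same digits could equally be read as a cardinal number.

```python
DIGIT_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def read_phone_number(digits):
    """Expand a 10-digit string into words, grouped 3-3-4."""
    if len(digits) != 10 or any(c not in "0123456789" for c in digits):
        raise ValueError("expected exactly 10 ASCII digits")
    groups = [digits[:3], digits[3:6], digits[6:]]
    return ", ".join(
        " ".join(DIGIT_WORDS[int(d)] for d in group) for group in groups
    )


print(read_phone_number("2125551212"))
# two one two, five five five, one two one two
```

The commas mark the group boundaries, where a TTS system would typically place a short pause.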

• Hybrid model:
  – FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
  – FSAs model morphology (e.g. reg-noun-stem + s)
  – FSTs for pronunciation rules (e.g. s --> z)
  – special rules to model name and acronym pronunciation
  – default letter-to-sound rules for other words
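
The order in which the hybrid model consults its components can be sketched with plain dictionaries standing in for the FSTs and FSAs. Everything here is a toy assumption: the two-word LEXICON, the VOICELESS set, the tiny LETTER_TO_SOUND table and the function name pronounce are hypothetical, and the plural rule ignores stems ending in sibilants (as in buses).

```python
# Toy lexicon of regular noun stems (stand-in for the lexicon FST).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

# Voiceless ARPAbet consonants; after any other final phone, plural s --> z.
VOICELESS = {"p", "t", "k", "f", "th", "s", "sh", "ch", "h"}

# Very rough default letter-to-sound table; unlisted letters map to themselves.
LETTER_TO_SOUND = {"a": "ae", "e": "eh", "i": "ih", "o": "aa", "u": "ah", "c": "k"}


def pronounce(word):
    """Lexicon first, then stem + s morphology, then letter-to-sound fallback."""
    if word in LEXICON:
        return list(LEXICON[word])
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = list(LEXICON[word[:-1]])
        suffix = "s" if stem[-1] in VOICELESS else "z"  # the s --> z rule
        return stem + [suffix]
    return [LETTER_TO_SOUND.get(ch, ch) for ch in word]


print(pronounce("cat"))   # ['k', 'ae', 't']
print(pronounce("cats"))  # ['k', 'ae', 't', 's']
print(pronounce("dogs"))  # ['d', 'ao', 'g', 'z']
print(pronounce("bim"))   # ['b', 'ih', 'm']  (fallback)
```

The special rules for names and acronyms would sit between the morphology step and the fallback; they are omitted to keep the sketch short.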

Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
• Rhyming analogy: varoom/room, todo/dodo
• Linguistic origin: Infiniti, vingt, Perez
• Abbreviation expansion:
  – spacious living/dining rm w/frplc --> spacious living/dining room with fireplace
  – pls?
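
Abbreviation expansion for text like the real-estate example can be sketched as a table lookup per token. The table ABBREVIATIONS and the function expand are hypothetical, and the handling of the "w/" prefix is an assumption about how such ads are written.

```python
# Hypothetical abbreviation table for classified-ad text.
ABBREVIATIONS = {"rm": "room", "frplc": "fireplace"}


def expand(text):
    """Expand known abbreviations token by token; unknown tokens pass through."""
    words = []
    for token in text.split():
        if token.startswith("w/"):
            words.append("with")
            token = token[2:]
            if not token:
                continue
        words.append(ABBREVIATIONS.get(token, token))
    return " ".join(words)


print(expand("spacious living/dining rm w/frplc"))
# spacious living/dining room with fireplace
print(expand("pls"))
# pls
```

A token such as pls passes through unchanged because the table has no entry for it, which is where a system would need context to choose an expansion.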

Summary
• Phones realize phonemes in different contexts
  – Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
• Versatile FSTs can model phonological as well as morphological and spelling systems
• Many creative approaches toward pronunciation modeling for TTS
• Next time: Read Ch 6 (Guest Speaker: Sameer Maskey)