Speech & Hearing Perception Perry C. Hanavan

Recommendation • My Fair Lady (musical adaptation of Pygmalion) • Pygmalion

Review
• Peripheral Auditory Mechanism
  – Outer ear (pinna & external auditory canal)
    • Acoustic transmission
    • Quarter-wave resonator
  – Middle ear (tympanic membrane (TM), ossicles, Eustachian tube, tympanum)
    • Mechanical transmission/transduction
  – Inner ear (cochlea, semicircular canals, saccule, utricle)
    • Hydraulic transmission/transduction
    • Mechanical transduction
  – Auditory nerve (afferent, efferent)
    • Chemical-electrical transmission

Outer Ear
• Pinna
• External auditory meatus
  – Quarter-wave resonator
• The resonant frequency of the average adult ear canal is about 3000 Hz.
• Smaller ear canals, such as those of children, have higher resonant frequencies, around 4000 Hz.
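To see why the canal's length sets its resonance, here is a minimal Python sketch of the quarter-wave formula f = c / (4L). It assumes the canal acts as a uniform tube closed at the eardrum, a speed of sound of 343 m/s, and illustrative lengths of 2.5 cm (adult) and 2.0 cm (child); none of these values come from the slide.

```python
SPEED_OF_SOUND_M_S = 343.0  # air at about 20 °C (assumed)


def quarter_wave_resonance_hz(length_m: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Lowest resonance of a tube closed at one end and open at the other: f = c / (4L)."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return c / (4.0 * length_m)


# Illustrative canal lengths (assumptions, not measured values)
for label, length_m in [("adult, 2.5 cm", 0.025), ("child, 2.0 cm", 0.020)]:
    print(f"{label}: {quarter_wave_resonance_hz(length_m):.0f} Hz")
```

The shorter tube gives the higher resonance, matching the adult-versus-child contrast on the slide.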

Localization
• Two ears
• The two most important localization cues are the interaural time difference (ITD) and the interaural intensity difference (IID).
• Head shadow effect: a sound from a source located to one side of the head has a higher intensity, i.e., is louder, at the ear nearest the source.
• Phase differences also play a role in localization.
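The size of the ITD cue can be estimated with Woodworth's spherical-head formula, a standard textbook model that the slide does not mention. This sketch assumes a rigid spherical head of radius 8.75 cm, a distant source, and a speed of sound of 343 m/s; all three are illustrative assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed
HEAD_RADIUS_M = 0.0875      # assumed typical adult head radius


def woodworth_itd_s(azimuth_deg: float,
                    radius_m: float = HEAD_RADIUS_M,
                    c: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference for a distant source on a rigid spherical head.

    Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)),
    valid from 0 degrees (straight ahead) to 90 degrees (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (radius_m / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:>2} deg: {woodworth_itd_s(az) * 1e6:6.0f} microseconds")
```

The ITD grows from zero for a source straight ahead to roughly two-thirds of a millisecond for a source directly to one side.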

Middle Ear
• Impedance mismatch
  – Air vs. fluid
  – Area ratio hypothesis
  – Lever hypothesis (approximately 1.3:1)
• Stiffness and mass have inverse effects on frequency in a resonant system: $f = \frac{1}{2\pi}\sqrt{k/m}$, where k is stiffness and m is mass.
• Mass-dominated systems have a lower resonant frequency than stiffness-dominated systems.
• Increasing stiffness in any ear component (membranes, ossicles, cavity) improves the efficiency of transmission of high frequencies.
• Adding mass (e.g., increasing ossicular chain mass) or reducing stiffness (e.g., increasing cavity volume) favors low frequencies.
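Both ideas on this slide can be put into numbers. The Python sketch below evaluates the resonance formula for a hypothetical spring-mass system and converts an area-times-lever advantage into decibels. The 17:1 area ratio and 1.3:1 lever ratio are commonly cited textbook approximations assumed here, not values taken from the slide.

```python
import math


def resonant_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Natural frequency of a simple mass-spring system: f = (1 / 2*pi) * sqrt(k / m)."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def middle_ear_gain_db(area_ratio: float, lever_ratio: float) -> float:
    """Pressure gain in dB from multiplying the area and lever advantages."""
    return 20.0 * math.log10(area_ratio * lever_ratio)


# Hypothetical spring-mass values chosen only to show the trend
base = resonant_frequency_hz(1000.0, 0.001)
stiffer = resonant_frequency_hz(4000.0, 0.001)   # 4x stiffness -> 2x frequency
heavier = resonant_frequency_hz(1000.0, 0.004)   # 4x mass -> half the frequency
print(f"base {base:.1f} Hz, stiffer {stiffer:.1f} Hz, heavier {heavier:.1f} Hz")

# Commonly cited approximations (assumed): area ratio ~17:1, lever ratio ~1.3:1
print(f"middle-ear gain ~ {middle_ear_gain_db(17.0, 1.3):.1f} dB")
```

Quadrupling stiffness doubles the resonant frequency and quadrupling mass halves it, because frequency varies with the square root of k/m.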

Middle Ear

Inner Ear
• The cochlea is a fluid-filled spiral containing a resonator, the basilar membrane, and a neuroreceptor, the organ of Corti.
• Inner ears are tuned: inner ear stiffness and mass characteristics are major determinants of hearing range.
• Differences in hearing range are dictated largely by differences in the stiffness and mass of the basilar membrane, which result from variations in its thickness and width along the cochlear spiral.
• The basilar membrane is essentially a tonotopically arranged resonator array, tuned from high frequencies at the base to low frequencies at the apex.
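The base-to-apex tonotopic map is often summarized by Greenwood's place-frequency function, which the slide does not mention. A minimal sketch, assuming Greenwood's commonly used human constants (A = 165.4, a = 2.1, k = 0.88) and position measured as a proportion of basilar membrane length from the apex:

```python
def greenwood_cf_hz(x: float, A: float = 165.4, a: float = 2.1, k: float = 0.88) -> float:
    """Characteristic frequency at proportional distance x from the apex (0.0) to the base (1.0).

    Greenwood's function: F = A * (10**(a*x) - k), with constants assumed for the human cochlea.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 (apex) and 1 (base)")
    return A * (10 ** (a * x) - k)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f} (from apex): {greenwood_cf_hz(x):8.0f} Hz")
```

Characteristic frequency rises from about 20 Hz at the apex to about 20 kHz at the base, consistent with the high-to-low ordering from base to apex.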

Basilar Membrane

Traveling Wave • http://www.lloydwatts.com/collaborators.shtml

Inner vs. Outer Hair Cells

Inner Ear Mechanics • Basilar membrane animations • Hair Cells • Outer Hair Cell Motility

Central Auditory Path

CNS
• Cochlear Nuclei (modulate outer hair cell (OHC) motility; acoustic reflex)
• Trapezoid Body
• Superior Olivary Complex (reflex centers for Moro, startle, auropalpebral, and acoustic reflexes)
• Lateral Lemniscus
• Inferior Colliculus
• Medial Geniculate Body
• Primary Auditory Cortex
• Wernicke's Area
• Corpus Callosum

Auditory CNS Path Central Auditory Pathway

Excellent Brief Review • Review of Function

Hearing Threshold

Auditory Masking
• Blocking or obscuring a sound
• Simultaneous masking
  – Target sound and masking sound are presented at the same time
  – Broadband noise (BBN) vs. narrowband noise (NBN)
    • Critical bandwidth (when using NBN)
  – Upward spread of masking
  – Central masking
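The critical bandwidth mentioned above can be approximated with Glasberg and Moore's equivalent rectangular bandwidth (ERB) formula, one common model that the slide itself does not give. In this sketch the helper `masking_band_hz` is hypothetical: it centers a one-ERB-wide narrowband noise arithmetically on the target, which is a simplification.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def masking_band_hz(center_hz):
    """Hypothetical helper: edges of a one-ERB-wide noise band centered on the target.

    Arithmetic centering is a simplification chosen for illustration.
    """
    half_width = erb_hz(center_hz) / 2.0
    return center_hz - half_width, center_hz + half_width


for center in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    low, high = masking_band_hz(center)
    print(f"{center:6.0f} Hz: ERB = {erb_hz(center):6.1f} Hz, band {low:.0f}-{high:.0f} Hz")
```

The bandwidth widens as center frequency rises, so a narrowband masker must be wider at high frequencies to span the critical band.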

Precedence Effect
• Fusion of a sound and its initial echoes into one auditory event, and the localization of that fused sound at the source of the earliest-arriving sound
• Audiologists use this effect in the Stenger test when an individual is suspected of malingering

Equal Level Contours

Music Analysis
• Pure Tone (periodic)
• Periodic Complex Tone
• Aperiodic Complex (noise)
• Fundamental – lowest frequency in a complex periodic sound
• Harmonics – whole-number multiples of the fundamental
• Missing fundamental – auditory illusion in which the pitch of the fundamental is heard even though that frequency is absent

Fundamentals • 100, 200, 300 Hz (fundamental: 100 Hz) • 800, 900, 1000 Hz (fundamental: 100 Hz, not present in the sound)
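Both examples on this slide share a 100 Hz fundamental because, for exact harmonics, the fundamental is the greatest common divisor of the component frequencies. A minimal Python sketch, assuming integer frequencies in Hz that are exact harmonics:

```python
from functools import reduce
from math import gcd


def fundamental_hz(partials_hz):
    """Fundamental of a harmonic complex: the greatest common divisor of its partials.

    Assumes integer-valued frequencies in Hz that are exact harmonics.
    """
    return reduce(gcd, partials_hz)


def harmonic_numbers(partials_hz):
    """Harmonic number of each partial relative to the computed fundamental."""
    f0 = fundamental_hz(partials_hz)
    return [p // f0 for p in partials_hz]


for partials in ([100, 200, 300], [800, 900, 1000]):
    f0 = fundamental_hz(partials)
    missing = f0 not in partials
    print(partials, "-> F0 =", f0, "Hz; harmonics", harmonic_numbers(partials),
          "(missing fundamental)" if missing else "")
```

The second complex contains only harmonics 8, 9, and 10, yet its computed fundamental is 100 Hz, the pitch listeners report in the missing-fundamental illusion.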

Frequency
• Place principle: Helmholtz suggested that the basilar membrane resonates at specific places for specific tones, which Békésy later confirmed.
• Frequency principle: proposed by Seebeck and revived by Wever, it holds that the spike potentials of the auditory nerve determine pitch.
• Volley principle: neurons fire in groups; while one neuron is reloading, another is firing.
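The volley principle can be illustrated with a toy schedule in which fibers take turns firing on successive stimulus cycles. This sketch assumes a maximum firing rate of 1000 spikes/s per fiber (roughly a 1 ms refractory period) and a strict round-robin rotation; real auditory nerve fibers fire stochastically, so treat it only as an illustration of the idea. The function name is hypothetical.

```python
import math


def volley_schedule(stim_hz, max_rate_hz=1000.0, duration_s=0.01):
    """Hypothetical round-robin volley: each stimulus cycle is claimed by one fiber.

    Returns {fiber_index: [spike times in seconds]}. The number of fibers is the
    smallest group that keeps every fiber at or below max_rate_hz.
    """
    n_fibers = math.ceil(stim_hz / max_rate_hz)
    period_s = 1.0 / stim_hz
    n_cycles = round(stim_hz * duration_s)
    spikes = {fiber: [] for fiber in range(n_fibers)}
    for cycle in range(n_cycles):
        spikes[cycle % n_fibers].append(cycle * period_s)
    return spikes


schedule = volley_schedule(3000.0)
for fiber, times in schedule.items():
    print(f"fiber {fiber}: {len(times)} spikes in 10 ms")
all_spikes = sorted(t for times in schedule.values() for t in times)
print("pooled spikes:", len(all_spikes))
```

No single fiber exceeds 1000 spikes/s, yet the pooled spikes mark every cycle of the 3000 Hz tone.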

Auditory Scene Analysis
ASA, a concept introduced by Albert Bregman, is the process by which the auditory system takes the mixture of sound it receives from a complex natural environment and sorts it into packages of acoustic evidence, each of which probably arose from a single sound source. This grouping keeps pattern recognition from mixing information from different sources.
Online Examples
– Compact disc of ASA
– Segregating and Grouping

Speech Production • Formants

Speech Production
• Phonemes (sound units of language)
  – Consonants (s, z)
    • Voiced (b, d, g)
    • Unvoiced (p, t, k)
  – Vowels (a, e, o, i, u)
  – Diphthongs (oy, ei)

Formants
• Vowels – greater intensity, formant structure, all voiced, less constriction of airflow than consonants
• Diphthongs – vowel characteristics, but with a transition (glide)
• Consonants – less intensity, greater constriction of airflow

Pattern Playback Haskins Laboratories

Vocal Tract
• Approximately 17 cm long for adult males
• About 5/6 of that length for females
• Children: roughly half the length of the adult male vocal tract

Math Model for Vowel Formants • Formant Calculation Handout • Formant Plotting Handout • Excel Model
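One common math model for vowel formants treats the vocal tract as a uniform tube closed at the glottis and open at the lips, which resonates at odd multiples of c / (4L). The sketch below applies it to the lengths on the vocal tract slide (17 cm, 5/6 of that, and half of it). The speed of sound of 343 m/s is an assumption, and a uniform tube models only a neutral, schwa-like vowel; it is not necessarily the model used in the handouts.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed; warm vocal-tract air is somewhat faster


def uniform_tube_formants_hz(length_m, n=3, c=SPEED_OF_SOUND_M_S):
    """Formants of a uniform tube closed at the glottis and open at the lips.

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wave resonance.
    """
    return [(2 * i - 1) * c / (4.0 * length_m) for i in range(1, n + 1)]


male_len_m = 0.17  # ~17 cm, from the vocal tract slide
tracts = {
    "adult male": male_len_m,
    "adult female": male_len_m * 5 / 6,  # 5/6 of the male length
    "child": male_len_m / 2,             # roughly half the male length
}
for label, length_m in tracts.items():
    formants = uniform_tube_formants_hz(length_m)
    print(label, [round(f) for f in formants])
```

Shortening the tube raises all formants by the same factor, so female formants come out 1.2 times the male values and children's about twice as high.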

Source Filter
• F0 (the source) is produced at the vocal folds
• Formants (F1, F2, F3, …) are created by vocal tract resonance
• Parts of the source spectrum are emphasized by the vocal tract resonances (F1, F2, F3, shown at left)
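The source-filter idea can be sketched numerically: a harmonic source is multiplied by the gain of the vocal tract resonances. This Python sketch assumes a 100 Hz source whose harmonics fall off as 1/k² (about -12 dB per octave), formants at 500, 1500, and 2500 Hz, and a 100 Hz bandwidth for each, modeled as standard second-order resonators. These are illustrative values, not measurements.

```python
def resonator_gain(f_hz, formant_hz, bandwidth_hz):
    """Magnitude of a second-order resonance, normalized to 1 at 0 Hz."""
    h = 1.0 / (1.0 - (f_hz / formant_hz) ** 2 + 1j * f_hz * bandwidth_hz / formant_hz ** 2)
    return abs(h)


def vowel_spectrum(f0_hz, formants_hz, bandwidth_hz=100.0, max_hz=4000.0):
    """Return (harmonic frequency, output amplitude) pairs for a simple source-filter model.

    Source: harmonics of F0 with amplitude 1/k**2 (about -12 dB per octave).
    Filter: a cascade of resonators, one per formant.
    """
    spectrum = []
    k = 1
    while k * f0_hz <= max_hz:
        f = k * f0_hz
        amplitude = 1.0 / k ** 2
        for formant in formants_hz:
            amplitude *= resonator_gain(f, formant, bandwidth_hz)
        spectrum.append((f, amplitude))
        k += 1
    return spectrum


def spectral_peaks(spectrum):
    """Harmonics whose output amplitude exceeds both neighbors."""
    return [spectrum[i][0] for i in range(1, len(spectrum) - 1)
            if spectrum[i][1] > spectrum[i - 1][1] and spectrum[i][1] > spectrum[i + 1][1]]


spectrum = vowel_spectrum(100.0, [500.0, 1500.0, 2500.0])
print("emphasized harmonics (Hz):", spectral_peaks(spectrum))
```

Only the harmonics sitting at the resonances stand out in the output, even though the source itself decreases steadily with frequency.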

Perception of Vowels
• /a/ has the greatest intensity, and the unvoiced consonant /θ/ is the weakest speech sound
• Front vowels are perceived on the basis of the F1 frequency and the average of F2 and F3, whereas back vowels are perceived on the basis of the average of F1 and F2, as well as F3
• So is it the absolute frequency values of the formants?
• Or the ratio of F2 to F1?
• Perhaps it is the invariant cues (frequency changes that occur with coarticulation)
(Figure labels: front vowels, F1 vs. F2/F3; back vowels, F1/F2 vs. F3)
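The averaging rule on this slide can be written out directly. This sketch uses approximate adult-male formant averages commonly cited from Peterson and Barney (1952), with the slide's /a/ standing in for their /ɑ/; the values are for illustration only, and the function name is hypothetical.

```python
# Approximate average formants (Hz) for adult male talkers, commonly cited from
# Peterson & Barney (1952); used here only for illustration.
VOWELS = {
    "/i/": {"formants": (270, 2290, 3010), "front": True},
    "/u/": {"formants": (300, 870, 2240), "front": False},
    "/a/": {"formants": (730, 1090, 2440), "front": False},
}


def perceptual_cues(f1, f2, f3, front):
    """Two-dimensional cue described on the slide.

    Front vowels: (F1, mean of F2 and F3). Back vowels: (mean of F1 and F2, F3).
    """
    if front:
        return f1, (f2 + f3) / 2
    return (f1 + f2) / 2, f3


for name, info in VOWELS.items():
    f1, f2, f3 = info["formants"]
    cue = perceptual_cues(f1, f2, f3, info["front"])
    print(f"{name}: cue = {cue}, F2/F1 = {f2 / f1:.2f}")
```

Printing F2/F1 alongside each cue makes it easy to compare the absolute-frequency, averaged, and ratio accounts the slide raises.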

Formant with Tongue Position More pictorials

Vowel Spectrograph

Chart Vowel Formants • Acoustics and Tongue Position • Video Clip

Lip Rounding

Vowel Formants

Online Examples of Formants • Sound to Graph • Spectral Cues Homepage

Perception of Diphthongs • Perceived on the basis of their formant transitions • Salient feature: rapidity of transition

Diphthongs

Consonants • Perception different for consonants than vowels • Greater variety of consonant types than vowels • Greater complexity for consonants

International Phonetic Alphabet (consonants)
1. 26 letters of the alphabet – abcdefghijklmnopqrstuvwxyz
2. Letters listed as consonant phonemes – bdfghjklmnprstvwz
3. Digraph phonemes – ch, sh, th
4. Other phonemes

Production of Consonants
• Place of production – where the major constriction occurs in the vocal tract
• Manner of production – how the consonant is produced
• Voicing – voiced or unvoiced

Place of Production
Examples of some consonant phonemes:
• Bilabial: p b m w
• Labiodental: f v
• Dental: th
• Alveolar: t d s z l r
• Palatal: ch sh
• Velar: k g ng
• Glottal: h

Manner of Production
Examples of some consonants:
• Stops: p t k b d g
• Fricatives: f v s sh z h
• Affricates: ch dg
• Nasals: m n ng
• Semivowels: w l r j (the y sound, as in "yes")

Voicing
Examples of some consonants:
• Voiced: b d g v z l r w
• Unvoiced: p t k f s h
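The place, manner, and voicing slides together describe each consonant as a bundle of three features. A minimal Python sketch, assuming a simple dictionary encoding of only those consonants that appear on all three slides, can pick out the pairs that differ in voicing alone. The table layout and function name are illustrative choices, not part of the original material.

```python
# (place, manner, voicing) for the consonants that appear on all three slides
CONSONANTS = {
    "p": ("bilabial", "stop", "unvoiced"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "unvoiced"),
    "d": ("alveolar", "stop", "voiced"),
    "k": ("velar", "stop", "unvoiced"),
    "g": ("velar", "stop", "voiced"),
    "f": ("labiodental", "fricative", "unvoiced"),
    "v": ("labiodental", "fricative", "voiced"),
    "s": ("alveolar", "fricative", "unvoiced"),
    "z": ("alveolar", "fricative", "voiced"),
    "h": ("glottal", "fricative", "unvoiced"),
    "l": ("alveolar", "semivowel", "voiced"),
    "r": ("alveolar", "semivowel", "voiced"),
    "w": ("bilabial", "semivowel", "voiced"),
}


def voicing_pairs(table):
    """(unvoiced, voiced) pairs that share place and manner and differ only in voicing."""
    pairs = []
    for a, (place_a, manner_a, voice_a) in table.items():
        for b, (place_b, manner_b, voice_b) in table.items():
            same_place_and_manner = (place_a, manner_a) == (place_b, manner_b)
            if same_place_and_manner and voice_a == "unvoiced" and voice_b == "voiced":
                pairs.append((a, b))
    return pairs


print(voicing_pairs(CONSONANTS))
```

The function returns (p, b), (t, d), (k, g), (f, v), and (s, z); /h/ has no voiced partner in this small table.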

Stops
Produced with a closure within the oral cavity, a buildup of pressure behind this closure, and a release of the closure that allows the air to be rapidly expelled. Acoustically, these events can be divided into five components:
1. Occlusion
2. Transient
3. Frication
4. Aspiration
5. Transition
More info

Fricatives
Fricative production involves two articulators being brought together and held close enough for the escaping air to become turbulent, creating an aperiodic (noise) sound. Fricatives may be voiced or unvoiced. The constriction phase of fricatives is characterized by a continuant, noisy, aperiodic component. The characteristics of the noise result from the position of the constriction, the shape of the orifice, and the aerodynamic forces of the airstream. Acoustic characteristics include high-frequency hiss, long duration, and weak to moderate intensity.

Affricates
• A stop with a fricative release, but palatal
• Combination of stop and fricative characteristics
• Closure and burst, followed by a short silence, then frication
• Affricates can be distinguished from fricatives by the presence of closure and by the duration of the noise, which is longer for fricatives.
• The shorter the duration of the noise, the shorter the silence needed to elicit an affricate response.
• Affricates have a shorter rise time than fricatives. Rise time is the time from onset to peak intensity of the frication.
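Rise time, as defined on this slide, can be measured from an amplitude envelope. This sketch assumes that onset is the first sample reaching 10% of the peak (an assumed convention) and uses hypothetical linear envelopes with 20 ms and 60 ms ramps; these durations are illustrative, not measured values for real affricates or fricatives.

```python
def rise_time_ms(envelope, sample_rate_hz, onset_fraction=0.1):
    """Time from frication onset to peak intensity.

    Onset is the first sample reaching onset_fraction of the peak (assumed
    convention); the peak is the first maximum of the envelope.
    """
    peak_index = max(range(len(envelope)), key=envelope.__getitem__)
    threshold = onset_fraction * envelope[peak_index]
    onset_index = next(i for i, value in enumerate(envelope) if value >= threshold)
    return 1000.0 * (peak_index - onset_index) / sample_rate_hz


def synthetic_envelope(rise_samples, silence_samples=5, decay_samples=50):
    """Hypothetical noise envelope: silence, linear rise to 1.0, then linear decay."""
    rise = [i / rise_samples for i in range(1, rise_samples + 1)]
    decay = [1.0 - i / decay_samples for i in range(1, decay_samples)]
    return [0.0] * silence_samples + rise + decay


fs = 1000.0  # one envelope sample per millisecond (assumed)
affricate_like = rise_time_ms(synthetic_envelope(20), fs)
fricative_like = rise_time_ms(synthetic_envelope(60), fs)
print(f"affricate-like: {affricate_like:.0f} ms, fricative-like: {fricative_like:.0f} ms")
```

The measured rise time is a little shorter than each ramp because onset is counted from the 10% point.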

Nasals
• Like the oral tract, the nasal tract has its own resonant frequencies, or formants.
• The most commonly reported nasal formants occur at 300 Hz, 1 kHz, 2.2 kHz, 2.9 kHz, and 4 kHz.
• Antiresonances arise whenever there is a side branch in the main acoustic pathway. An antiresonance, or zero, decreases the spectral energy at specific frequencies by absorbing sound at or near the antiresonant frequencies. Cumulatively, antiresonances reduce the total amplitude of the sound generated.
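The side-branch idea can be made concrete with tube acoustics: a side branch closed at its far end absorbs energy at its own quarter-wave resonances, placing zeros at odd multiples of c / (4L). In this sketch the speed of sound (343 m/s) and the oral-cavity branch lengths are assumptions chosen only for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed


def side_branch_zeros_hz(branch_length_m, n=3, c=SPEED_OF_SOUND_M_S):
    """Antiresonance (zero) frequencies of a side branch closed at its far end.

    A closed side tube absorbs energy at its own quarter-wave resonances:
    Z_n = (2n - 1) * c / (4 * L).
    """
    if branch_length_m <= 0:
        raise ValueError("branch length must be positive")
    return [(2 * i - 1) * c / (4.0 * branch_length_m) for i in range(1, n + 1)]


# Hypothetical oral-cavity branch lengths, shorter as the closure moves back
for label, length_m in [("lips closed (/m/-like), 8 cm", 0.08),
                        ("alveolar closure (/n/-like), 5 cm", 0.05)]:
    zeros = side_branch_zeros_hz(length_m)
    print(label, [round(z) for z in zeros])
```

A shorter side branch, as when the oral closure is farther back, moves the antiresonance up in frequency.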

Approximants (Semivowels)
Approximants are the consonants most similar to vowels in their articulation and hence in their acoustic structure. Articulation involves one articulator approaching another without narrowing the tract enough to cause turbulent airflow. Like vowels, approximants are:
• highly resonant
• produced with a relatively open vocal tract
• characterized by identifiable formant structures
• continuant sounds, since there is no occlusion or momentary stoppage of the airstream
• non-turbulent, due to the lack of constriction
• oral sounds

Speech Perception Theories
• How do we perceive speech?
  – Individual sounds (phonemes)
  – Syllables
  – Words
  – Sentences
• How do we derive meaning from the ocean of sounds we hear?
• Speech is variable
• Speakers vary in their speech
• Do listeners tune in to the variant or the invariant cues?

Connected Speech

Formants
• So what is it about formants that the ear analyzes for vowels?
• Specific frequency?
• Ratio F1/F2?
• Ratio F2/F3?
• Adult speech vs. children's speech
• Familiar speaker vs. unfamiliar speaker
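One way to frame the ratio question is to scale a set of formants as if the same vowel were spoken through a shorter vocal tract. This sketch assumes idealized uniform scaling (formants inversely proportional to tract length, using the 17 cm and half-length figures from the vocal tract slide) and approximate /i/ values commonly cited from Peterson and Barney (1952); real children's vocal tracts are not exact scaled copies of adults'. The function names are hypothetical.

```python
# Approximate /i/ formants (Hz) for adult males, commonly cited from Peterson & Barney (1952)
ADULT_MALE_I = (270.0, 2290.0, 3010.0)


def formant_ratios(formants):
    """Return (F2/F1, F3/F2)."""
    f1, f2, f3 = formants
    return f2 / f1, f3 / f2


def scale_to_tract(formants, reference_length_cm=17.0, new_length_cm=8.5):
    """Idealized uniform scaling: formants vary inversely with vocal tract length."""
    factor = reference_length_cm / new_length_cm
    return tuple(f * factor for f in formants)


child_i = scale_to_tract(ADULT_MALE_I)
print("adult:", ADULT_MALE_I, "ratios", formant_ratios(ADULT_MALE_I))
print("child:", child_i, "ratios", formant_ratios(child_i))
```

Under this idealization the absolute formant frequencies double while F2/F1 and F3/F2 stay the same, which is why ratio-based accounts are attractive for comparing adult and child speech.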

Consonant Cues
• Formant transitions provide some cues:
• Place of production? F2 transition
• Manner of production? F1 transition
• Voicing?

Categorical Perception
(Figure labels: Categories; Phonemic boundary)

McGurk Effect
A phenomenon of cross-modal (hearing/vision) integration in which what we see affects what we hear
• Movie

Role of Context
• Speech conveys meaning
• Redundancy (information can be removed without causing a breakdown in communication)
• Homophenes (ran, ban, man)
  – Context (e.g., lipreading in a restaurant)
• Phoneme sequences
  – Right ear: phonemes /b a n k e t/
  – Left ear: phonemes /l a n k e t/
  – Heard: "blanket", never "lbanket", even when the timing of the phonemes to each ear is adjusted
• Phonemic restoration
  – “The state governors met with their respective legislatures convening in the capital city.” The “s” in “legislatures” was overdubbed with a coughing sound, and listeners did not report anything unusual.

Theories of Speech Perception • Trace Theory • Cohort Theory • Motor Theory
