PSY 369: Psycholinguistics
Language Comprehension: Speech recognition

Different features than visual
Example: "Where are you going"
Visual word recognition
• Some parallel input
• Orthography
  • Letters
  • Clear delineation
  • Difficult to learn
Speech perception
• Serial input
• Phonetics/Phonology
  • Acoustic features
  • Usually no delineation
  • “Easy” to learn

Different problems too
• Chest Jew Wade Aim In It = Just you wait a minute
• Delights Haven Dime = Daylight Savings Time
• Canoes He Wad Ice He = Can You See What I See?
• Free Quaintly As Quest Shuns = Frequently Asked Questions
Source: http://www.playmadgabonline.com/

Hard Problems in Speech Perception
Wave form: "Show me the money"
• Linearity (parallel transmission): acoustic features often spread themselves out over other sounds
  • Where does "show" start and "money" end?

Hard Problems in Speech Perception
Example: "I owe you a Yo-Yo"
• Segmentation problem: unlike visual input, the acoustic input is not physically segmented
  • Illusion of silence: there are no silent gaps in the wave form, even though we may “hear” some.

Hard Problems in Speech Perception
• Segmentation problem: unlike visual input, the acoustic input is not physically segmented
  • The reverse also happens: here the silence that we see in the acoustics isn’t perceived as a gap in the word

Hard Problems in Speech Perception
Wave form: "Show me the money"
• Lack of invariance:
  • Ideally, one phoneme would have one waveform
  • This is not the case: the /i/ (‘ee’) in ‘money’ and the /i/ in ‘me’ are different

Hard Problems in Speech Perception
• Lack of invariance:
  • Ideally, one phoneme would have one waveform
  • Another example: here is the phoneme /d/ followed by different vowels

Hard Problems in Speech Perception
Example: "Peter buttered the burnt toast"
• Lack of invariance:
  • Ideally, one phoneme would have one waveform
  • And another example: the phrase has five /t/ phonemes, but there are not five identical sweeps in the spectrogram
  • There aren’t invariant cues for phonetic segments
    • Although the search continues

Hard Problems in Speech Perception
• Co-articulation: the influence of the articulation (pronunciation) of one phoneme on that of another phoneme
  • Essentially, producing more than one speech sound at once
  • Each sound is partially shaped by the sounds before & after it
    • keel vs kill vs cool: /kil/ vs /kɪl/ vs /kul/ (IPA characters)
    • Place of articulation and rounding on the /k/ differ
  • A lot of different versions of “the same sound”
    • in different contexts
    • from different speakers
  • This is what allows us to talk so fast
  • May be helpful because it allows some parallel transmission of information (possibly helping predict what’s coming next)

Hard Problems in Speech Perception
• Trading relations
  • Most phonetic distinctions have more than one acoustic cue as a result of the particular articulatory gesture that gives the distinction:
    • Voice-onset-time (VOT)
    • Energy in burst
    • Onset frequency of the first formant
    • Placement in syllable
  • E.g., slit vs. split: the /p/ relies on silence and a rising formant, and different mixtures of these can result in the same perception
  • Perception must establish some "trade-off" between the different cues.
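One way to picture a trading relation is as two cues adding up to a single pool of evidence, so that less of one cue can be offset by more of the other. The sketch below is only an illustration of that idea for the slit/split example: the logistic form, the function name `p_split`, the weights, and the cue values in milliseconds and hertz are all invented for the example and are not taken from any study cited in these slides.

```python
import math


def p_split(silence_ms, formant_rise_hz, w_silence=0.05, w_rise=0.01, bias=-5.0):
    """Toy probability of hearing 'split' rather than 'slit'.

    The two cues (silent gap, rising formant) are summed with
    illustrative weights; every number here is a made-up assumption.
    """
    evidence = w_silence * silence_ms + w_rise * formant_rise_hz + bias
    return 1.0 / (1.0 + math.exp(-evidence))


if __name__ == "__main__":
    # Two different cue mixtures that carry the same total evidence:
    # a longer gap with a small rise, or a shorter gap with a larger rise.
    print(p_split(80, 100))
    print(p_split(60, 200))
    # A long gap plus a large rise pushes the percept toward 'split'.
    print(p_split(100, 200))
```

In this toy model a shorter silence paired with a larger formant rise produces the same response probability as a longer silence paired with a smaller rise, which is the "trade-off" the slide describes.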

Hard Problems in Speech Perception
• Many factors that may be important
  • Acoustic information
  • Visual information
  • Prosodic information
  • Semantic context
  • Syntactic structure
(Diagram labels: Top-down, UNDERSTANDING, Bottom-up)

Using Visual information
• The McGurk effect: McGurk and MacDonald (1976)
  • Showed people a video where the audio and the video don’t match
  • Think “dubbed movie”
  • Visual /ga/ paired with auditory /ba/ is often heard as /da/
• Implications
  • Phoneme perception is an active process
  • Influenced by both audio and visual information

Beyond the segment
• Prosodic factors (suprasegmentals)
• English:
  • Speech is divided into phrases.
  • Every phrase has a focus.
  • Word stress is meaningful in English.
  • Stressed syllables are aligned in a fairly regular rhythm, while unstressed syllables take very little time.
  • An extended flat or low-rising intonation at the end of a phrase can indicate that a speaker intends to continue to speak.
  • A falling intonation sounds more final.

Beyond the segment
• Prosodic factors (suprasegmentals)
• Stress
  • Emphasis on syllables in sentences
  • Effects on meaning: “black bird” versus “blackbird”
  • Top-down effects on perception: better anticipation of upcoming segments when a syllable is stressed
• Rate
  • Speed of articulation: faster talking means shorter vowels and shorter VOT
  • Normalization: taking the speaker’s rate into account
• Intonation
  • Use of pitch to signify different meanings across sentences

Top-down effects on Speech Perception
• Sentence context effects
  • Excised speech
  • Phoneme restoration effect
  • Sentence context effects
(Diagram labels: Top-down, UNDERSTANDING, Bottom-up)

Excised Speech
• Syntactic and semantic cues can help: Pollack & Pickett (1964)
Task: Recorded conversations and excised individual words, then presented the words to listeners for identification
  • Within context
  • Out of context
Results:
  • Words out of context were recognized only 47% of the time; identification was greatly improved with context
  • Suggests that the apparent clarity of speech reflects processing (top-down as well as bottom-up)

Phoneme restoration effect
Warren (1970)
Example: "The state governors met with their respective legi*latures convening in the capital city." (* = /s/ deleted and replaced with a cough)
Task: Listen to a sentence containing a word from which a phoneme was deleted and replaced with another noise (e.g., a cough)
Results:
• Participants heard the word normally, despite the missing phoneme
• Usually failed to identify which phoneme was missing
Interpretation: We can use top-down knowledge to “fill in” the missing information

Phoneme restoration effect
Warren and Warren (1970)
What if the missing phoneme was ambiguous?
• The *eel was on the axle.
• The *eel was on the shoe.
• The *eel was on the orange.
• The *eel was on the table.
Results: Participants heard the contextually appropriate word (wheel, heel, peel, meal) normally, despite the missing phoneme

Semantic Influences
Garnes & Bond (1976):
• 16 tokens, spanning the spectrum of bait-date-gate (/b/ /d/ /g/)
  • So some were clear examples (unambiguous), others in between (ambiguous)
• 3 carrier sentences (context):
  • Here’s the fishing gear and the ______.
  • Check the time and the _______.
  • Paint the fence and the _______.
• Results
  • If the token is unambiguous, listeners report what they heard, even when that gives a semantically implausible sentence (Paint the fence and the bait.)
  • If the token is ambiguous (near a phoneme boundary), semantic context affects what is heard

Phoneme restoration effect
• Possible loci of phoneme restoration effects
  • Perceptual locus of effect: lexical or sentential context influences the way in which the word is initially perceived.
  • Post-perceptual locus of effect: lexical or sentential context influences decisions about the nature of the missing phoneme information.
• Samuel (2001) attempts to look at this issue (homework #3)

Cross-modal priming
Shillcock (1990)
Task: hear a sentence, then make a lexical decision to a word that pops up on the computer screen (cross-modal priming)
Hear one of:
• The scientist made a novel discovery last year.
• The scientist made a new discovery last year.
See: NUDIST
Results:
• Lexical decisions to NUDIST are faster after "new discovery"
• NUDIST gets primed by a segmentation error ("new dis...")
• Although there is no conscious report of hearing “nudist”

Theories of speech perception
• Motor Theory
• Direct Realist Theory
• General Auditory Approach
• Cohort
• TRACE Model

Motor theory of speech perception
• A. Liberman (initially proposed in the late 1950s; more recent version in Liberman & Mattingly, 1985)
  • Direct translation of acoustic speech into articulatory categories
  • Holds that speech perception and motor control involve linked (or the same) neural processes
• The theory held that categorical perception was a direct reflection of articulatory organization
  • Categories with discrete gestures (e.g., consonants) will be perceived categorically
  • Categories with continuous gestures (e.g., vowels) will be perceived continuously
• There is a speech perception module that operates independently of general auditory perception

Speech Perception & the brain
Figure: Frontal slices showing differential activation elicited during lip and tongue movements (Left), syllable articulation including [p] and [t] (Center), and listening to syllables including [p] and [t] (Right).
Pulvermüller F et al. PNAS 2006; 103: 7865-7870. © 2006 by National Academy of Sciences

Motor theory of speech perception
• Some problems for MT
  • Categorical perception is found in non-speech sounds (e.g., music)
  • Categorical perception for speech sounds is found in non-humans
    • Chinchillas can be trained to show categorical perception of /t/ and /d/ consonant-vowel syllables (Kuhl & Miller, 1975)

Other theories of speech perception
• Direct Realist Theory (C. Fowler and others)
  • Similar to Motor theory in that articulation representations are key, but here they are directly perceived (related to Gibson’s perceptual theory)
  • Perceiving speech is part of a more general perception of gestures that involves the motor system
• General Auditory Approach (e.g., Diehl, Massaro)
  • Does not invoke special mechanisms for speech perception; instead relies on more general mechanisms of audition and perception
• For nice reviews see:
  • Diehl, Lotto, & Holt (2003)
  • Galantucci, Fowler, & Turvey (2006)

Other theories of spoken word recognition
• Cohort Model (Marslen-Wilson & Welsh, 1978; discussed last time)
  1) The acoustic information at the beginning of a word activates a “cohort” of possible words
  2) Syntax and semantics influence the selection of the target word from the cohort
• TRACE Model (Elman and McClelland 1984, 1986)
  • Connectionist, parallel distributed model
  • Processing occurs through excitatory and inhibitory connections between processing units called nodes
  • 3 levels of nodes: features, phonemes, and words, all highly interconnected
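The two Cohort steps listed above can be sketched in a few lines of Python. This is a toy illustration only, not the model as published: letters stand in for phonemes, `LEXICON` is a made-up six-word vocabulary, and `fits_context` is a hypothetical stand-in for whatever syntactic and semantic knowledge does the selecting in step 2.

```python
def cohort(lexicon, heard_so_far):
    """Step 1: words whose beginning matches the input heard so far."""
    return [word for word in lexicon if word.startswith(heard_so_far)]


def narrow_by_context(candidates, fits_context):
    """Step 2: keep only candidates that the (hypothetical) context allows."""
    return [word for word in candidates if fits_context(word)]


def recognition_steps(lexicon, word):
    """Cohort after each successive segment of `word` arrives."""
    return [(word[:i], cohort(lexicon, word[:i])) for i in range(1, len(word) + 1)]


# Made-up mini lexicon; letters are a crude stand-in for phonemes.
LEXICON = ["cap", "capital", "captain", "captive", "cat", "dog"]

if __name__ == "__main__":
    # Watch the cohort shrink as more of the word is heard.
    for prefix, candidates in recognition_steps(LEXICON, "captain"):
        print(prefix, candidates)
    # Context can pick a winner before the acoustic input is complete.
    print(narrow_by_context(cohort(LEXICON, "capt"), lambda w: w == "captain"))
```

The cohort shrinks as each segment arrives, and context can settle on a single candidate while more than one word is still acoustically possible.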