Pitch perception in auditory scenes

Papers on pitch perception…
• of a single sound source: LOTS (too many?)
• of more than one sound source: almost none (± a few)

Bach: Musical Offering (strings)

Talk outline
• Low-level grouping cues in pitch perception
  – harmonic structure
• Conjoint or disjoint allocation
  – onset-time
  – context
  – spatial cues
• Use of a difference in harmonic structure between two sound sources to help identify, track over time, and localise simultaneous sound sources

Excitation pattern of a complex tone on the basilar membrane
[Figure: excitation pattern from apex to base on a log(ish) frequency axis, with frequency labels at 400, 600, 800 and 1600 Hz and the higher harmonics marked as unresolved. Panels show the outputs of the 200 Hz filter and the 1600 Hz filter, each repeating every 1/200 s = 5 ms.]
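One way to see why the 1600 Hz filter output is a complex waveform while the 200 Hz output is a simple sinusoid is to compare auditory filter bandwidth with the 200 Hz harmonic spacing. The sketch below assumes the Glasberg & Moore (1990) ERB formula, which the slides do not mention. A ratio ERB / F0 near or above 1 roughly means that neighbouring harmonics share a filter and interact, so they are unresolved.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) of the auditory filter at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

F0 = 200.0  # harmonic spacing implied by the 1/200 s = 5 ms period

for fc in (200.0, 1600.0):
    ratio = erb_hz(fc) / F0  # roughly how many harmonics fall within one ERB
    print(f"{fc:6.0f} Hz filter: ERB = {erb_hz(fc):5.1f} Hz, ERB / F0 = {ratio:.2f}")
```

At 200 Hz the filter is much narrower than the harmonic spacing (ERB / F0 ≈ 0.23). At 1600 Hz it is about as wide as the spacing (≈ 0.99), so adjacent harmonics beat within the filter at the 5 ms period.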

Mistuned harmonic’s contribution to pitch declines as a Gaussian function of mistuning
• Experimental evidence from mistuning experiments (Moore, Glasberg & Peters, JASA 1985). The shift in the low-pitch match is fitted by
  $\Delta F_0 = a - k\,\Delta f\,\exp\left(-\Delta f^2 / 2\sigma^2\right)$, with σ = 19.9 Hz and k = 0.073, where Δf is the mistuning of the harmonic.
[Figure: ΔF0 (Hz) of the low-pitch match against the frequency of the mistuned harmonic f (540–680 Hz), with the fitted curve; inset spectrum of the complex, 400–2400 Hz.]
Darwin (1992). In M. E. H. Schouten (Ed.), The auditory processing of speech: from sounds to words. Berlin: Mouton de Gruyter.
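To make the fitted numbers concrete, the sketch below evaluates the Gaussian weight exp(−Δf²/2σ²) and the mistuning-dependent term kΔf·exp(−Δf²/2σ²) using the slide's σ = 19.9 Hz and k = 0.073. It omits the offset a and takes no position on the sign convention. It also assumes the mistuned harmonic is nominally at 600 Hz, the centre of the plotted 540–680 Hz range. The term is largest when Δf = σ, because d/dx[x·exp(−x²/2σ²)] vanishes at x = σ.

```python
import math

SIGMA_HZ = 19.9  # fitted Gaussian width (Hz), from the slide
K = 0.073        # fitted slope, from the slide
NOMINAL_HZ = 600.0  # assumption: nominal frequency of the mistuned harmonic

def weight(delta_f, sigma=SIGMA_HZ):
    """Relative contribution to pitch of a harmonic mistuned by delta_f Hz."""
    return math.exp(-delta_f ** 2 / (2 * sigma ** 2))

def mistuning_term(delta_f, k=K, sigma=SIGMA_HZ):
    """k * delta_f * exp(-delta_f^2 / 2 sigma^2); the offset a is omitted."""
    return k * delta_f * weight(delta_f, sigma)

print(f"sigma as a percentage of {NOMINAL_HZ:.0f} Hz: {100 * SIGMA_HZ / NOMINAL_HZ:.1f}%")
print(f"weight at 3% mistuning (18 Hz): {weight(18.0):.2f}")
print(f"largest term: {mistuning_term(SIGMA_HZ):.2f} Hz at delta_f = sigma")
```

σ works out at about 3.3% of 600 Hz. This is consistent with the "Gaussian s.d. 3%" harmonicity constraint in the summary, and the largest predicted shift from the term is a little under 1 Hz.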

Harmonic sieve
• Only consider frequencies that are close enough to a harmonic.
• Useful as a front-end to a Goldstein-type model of pitch perception (Duifhuis, Willems & Sluyter, JASA 1982).
[Figure: a sieve with 200 Hz spacing over 0–2400 Hz; a component falling between the meshes is blocked.]
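A harmonic sieve can be sketched as a filter that passes a component only if it lies within some tolerance of a multiple of the candidate F0. The relative tolerance of 3% used here is a hypothetical choice, not the mesh width used by Duifhuis, Willems & Sluyter. The passed (frequency, harmonic number) pairs are the kind of input a Goldstein-type pitch stage would take.

```python
def harmonic_sieve(components_hz, f0_hz, tolerance=0.03):
    """Split components into those passed and blocked by a sieve with meshes at n * f0.

    tolerance is a hypothetical relative mesh half-width (fraction of the harmonic frequency).
    """
    passed, blocked = [], []
    for f in components_hz:
        n = max(1, round(f / f0_hz))  # nearest harmonic number
        if abs(f - n * f0_hz) <= tolerance * n * f0_hz:
            passed.append((f, n))
        else:
            blocked.append(f)
    return passed, blocked

passed, blocked = harmonic_sieve([400, 630, 800, 1000], 200)
print(passed)   # [(400, 2), (800, 4), (1000, 5)]
print(blocked)  # [630]
```

The 630 Hz component is 5% above the 3rd harmonic (600 Hz). That is outside the assumed 3% mesh, so the sieve blocks it and it no longer contributes to the pitch estimate.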

Is a “harmonic sieve” necessary with autocorrelation models?
• Autocorrelation could in principle explain the mistuning effect:
  – a mistuned harmonic initially shifts the autocorrelation peak,
  – then produces its own peak.
• But the numbers do not work out:
  – the Meddis & Hewitt model is too tolerant of mistuning.
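The first half of the autocorrelation argument can be illustrated with a simplified sketch. It is plain waveform autocorrelation, not the Meddis & Hewitt model (no filterbank or hair-cell stage). The sketch builds an equal-amplitude, cosine-phase 200 Hz complex with harmonics 1–6, mistunes the 3rd harmonic, and reads F0 from the autocorrelation peak near 5 ms. The stimulus parameters are assumptions chosen for illustration.

```python
import numpy as np

def complex_tone(f0, n_harmonics, mistuned=None, mistuning=0.0, fs=100_000, dur=0.1):
    """Equal-amplitude, cosine-phase harmonic complex; harmonic number `mistuned`
    is shifted by the fraction `mistuning` (0.03 means +3%)."""
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        scale = (1.0 + mistuning) if n == mistuned else 1.0
        x += np.cos(2 * np.pi * n * f0 * scale * t)
    return x

def acf_f0(x, fs, f0_lo=150.0, f0_hi=300.0):
    """F0 from the highest waveform-autocorrelation peak at lags 1/f0_hi .. 1/f0_lo."""
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)             # zero-pad so the autocorrelation is linear, not circular
    r = np.fft.irfft(np.abs(spec) ** 2)[:n]  # autocorrelation at lags 0 .. n-1 samples
    r /= n - np.arange(n)                    # unbiased: divide by the number of overlapping samples
    lo, hi = int(fs / f0_hi), int(fs / f0_lo)
    k = lo + int(np.argmax(r[lo:hi]))
    # Parabolic interpolation through the peak and its neighbours for a sub-sample lag
    a, b, c = r[k - 1], r[k], r[k + 1]
    offset = 0.5 * (a - c) / (a - 2 * b + c)
    return fs / (k + offset)

fs = 100_000
for m in (-0.03, 0.0, 0.03):
    x = complex_tone(200.0, 6, mistuned=3, mistuning=m, fs=fs)
    print(f"3rd harmonic mistuned by {m:+.0%}: F0 estimate {acf_f0(x, fs):.2f} Hz")
```

With a few percent of mistuning, the peak moves in the direction of the mistuning, so the F0 estimate shifts by a fraction of a hertz. This is the "initially shifts the autocorrelation peak" behaviour. Whether the size and tolerance of such shifts match listeners is exactly the point the slide disputes.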

Disjoint allocation?
[Figure: double complex, ipsilateral. Target: a 600 Hz component together with a 200 Hz series and a 155 Hz series (155, 465, 775 Hz); match: the 155 Hz series (155, 465, 775 Hz).]
• Will the 200-Hz series grab the 600-Hz component, and so make it give less of a pitch shift to the 155-Hz series?
Darwin, Buffa, Williams and Ciocca (1992). In Auditory physiology and perception, edited by Cazals, Horner and Demany (Pergamon, Oxford).

Disjoint allocation… (2)
• No evidence of the 600-Hz component being grabbed by the "in-tune" 200-Hz complex

Pitch determined by conjoint allocation
• The decision on each simultaneous pitch is taken independently.
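The difference between conjoint and disjoint allocation can be sketched with harmonicity weights. The sketch assumes the Gaussian harmonicity constraint (s.d. 3% of the harmonic frequency) from the summary slide and uses the 600 Hz component and the 200 Hz and 155 Hz series from the double-complex experiment. Under conjoint allocation each pitch weights the component independently. The "disjoint" rule, which shares the component out so its weights sum to 1, is a hypothetical alternative for comparison.

```python
import math

SD = 0.03  # Gaussian s.d. of the harmonicity constraint, as a fraction of the harmonic frequency

def harmonic_weight(f, f0, sd=SD):
    """Weight of component f for a pitch at f0, from its relative mistuning from the nearest harmonic."""
    n = max(1, round(f / f0))
    mistuning = (f - n * f0) / (n * f0)
    return math.exp(-mistuning ** 2 / (2 * sd ** 2))

def conjoint(f, f0s):
    """Each pitch decides independently: a component can count fully towards several pitches."""
    return [harmonic_weight(f, f0) for f0 in f0s]

def disjoint(f, f0s):
    """Hypothetical alternative: the component is shared out, so its weights sum to 1."""
    w = conjoint(f, f0s)
    total = sum(w)
    return [x / total for x in w]

print(conjoint(600, [200, 155]))  # 600 Hz is the 3rd harmonic of 200 Hz, about 3.2% below 4 x 155 Hz
print(disjoint(600, [200, 155]))
```

Under the disjoint rule, the in-tune 200 Hz series would "grab" part of the 600 Hz component and so reduce its contribution to the 155 Hz pitch. The data on the previous slide show no such reduction, which is what the conjoint rule predicts.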

Mistuning & pitch
[Figure: mean pitch shift (Hz) of a 90-ms vowel complex against % mistuning of the 4th harmonic (0, 1, 2, 3, 5, 8%); 8 subjects.]

Onset asynchrony & pitch
[Figure: mean pitch shift (Hz) produced by a ±3% mistuned harmonic in a 90-ms vowel complex against onset asynchrony T of that harmonic (0–320 ms); 8 subjects.]

Adaptation? Increases effect of mistuning

Use of onset-time
• A long onset asynchrony removes a harmonic from the pitch calculation.
• This is not just due to adaptation, as the regrouping experiment shows.
• NB: the onset-time needed is much longer for pitch (100s of ms) than for hearing a harmonic out as a separate sound or for timbre judgements of the complex (10s of ms).

Repeated context and pitch
Darwin, Hukin, & Al-Khatib, B. Y. (1995). "Grouping in pitch perception: evidence for sequential constraints," J. Acoust. Soc. Am. 98, 880–885.

Pitch perception is not a bacon slicer
• Pitch perception does not operate on independent time slices.
• It uses, for example, the “Old + New” heuristic to parse which components are temporally relevant.
[Image: a bacon-slicing machine]
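A minimal sketch of the "Old + New" idea as it applies to repeated context: whatever in the current spectrum can be explained as a continuation of what was already present is assigned to the old sound, and only the residue is treated as the new sound. Representing spectra as frequency-to-linear-amplitude maps, and subtracting amplitudes with a 1 Hz matching tolerance, are simplifying assumptions. This is not a model from the talk.

```python
def old_plus_new(before, after, tol_hz=1.0):
    """Simplified old-plus-new parse: return the part of `after` not explained by `before`.

    before/after map component frequency (Hz) -> linear amplitude.
    """
    new = {}
    for f, amp in after.items():
        # Amplitude already accounted for by a matching 'old' component, if any
        old_amp = max((a for g, a in before.items() if abs(g - f) <= tol_hz), default=0.0)
        residual = amp - old_amp
        if residual > 0:
            new[f] = residual
    return new

context = {600: 1.0}                               # repeated 600 Hz tone preceding the complex
mixture = {200: 1.0, 400: 1.0, 600: 1.0, 800: 1.0}
print(old_plus_new(context, mixture))  # {200: 1.0, 400: 1.0, 800: 1.0}
```

Because the 600 Hz component continues from the preceding context, the parse leaves it with the old sound. The new complex, and hence its pitch, is computed without it.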

These conclusions apply to resolved harmonics
• With unresolved harmonics, separate simultaneous pitches are audible only when each pitch comes predominantly from a separate frequency region. This is a problem for autocorrelation models.
• The Old + New heuristic also breaks down: there is no advantage for onset-time.

Superimposing unresolved harmonics
[Figure: waveforms (0–0.1 s) of harmonics in the 2–3 kHz region for F0 = 100 Hz, for F0 = 125 Hz, and of their sum.]
Carlyon, R. P. (1996). "Encoding the fundamental frequency of a complex tone in the presence of a spectrally overlapping masker," J. Acoust. Soc. Am. 99, 517–524.
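The sketch below reconstructs the three waveforms under stated assumptions: equal-amplitude, cosine-phase harmonics whose frequencies fall between 2 and 3 kHz. It lists which harmonic numbers are involved and checks the period of the sum. All of them are 16th or higher, far above the low harmonics that are resolved. The sum does not repeat at either 10 ms or 8 ms, only at 40 ms (1/25 Hz, where 25 Hz is the greatest common divisor of 100 and 125 Hz).

```python
import numpy as np

def band_harmonics(f0, lo=2000.0, hi=3000.0):
    """Harmonic numbers of f0 whose frequencies fall inside [lo, hi] Hz."""
    return list(range(int(np.ceil(lo / f0)), int(np.floor(hi / f0)) + 1))

def band_complex(f0, t, lo=2000.0, hi=3000.0):
    """Equal-amplitude, cosine-phase sum of the harmonics of f0 inside the band."""
    return sum(np.cos(2 * np.pi * n * f0 * t) for n in band_harmonics(f0, lo, hi))

def repeats_after(x, lag_s, fs):
    """True if the waveform is (numerically) periodic with period lag_s."""
    lag = int(round(lag_s * fs))
    return bool(np.allclose(x[:-lag], x[lag:], atol=1e-6))

fs = 40_000
t = np.arange(int(0.2 * fs)) / fs
mix = band_complex(100.0, t) + band_complex(125.0, t)

print(band_harmonics(100.0))  # harmonics 20..30
print(band_harmonics(125.0))  # harmonics 16..24
print(repeats_after(mix, 0.010, fs), repeats_after(mix, 0.008, fs), repeats_after(mix, 0.040, fs))  # False False True
```

This shows why the slide treats spectral overlap as hard: the summed waveform carries neither 10 ms nor 8 ms periodicity on its own.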

Spatial cues?
• A contralateral mistuned component contributes almost as much to pitch as an ipsilateral one. Fitting $\Delta F_0 = a - k\,\Delta f\,\exp\left(-\Delta f^2 / 2\sigma^2\right)$ gives k = 0.073 (σ = 19.9 Hz) ipsilaterally and k = 0.061 (σ = 17.4 Hz) contralaterally.
• Cf. Beerends and Houtsma, JASA (1989).
[Figure: ΔF0 (Hz) against the frequency of the mistuned harmonic f (540–680 Hz), ipsilateral and contralateral fits.]

Summary of constraints for pitch
• Harmonicity (Gaussian, s.d. 3%)
• Onset time (~100–200 ms)
• Old + New: repeated context
• Very weak spatial constraint
• Independent estimates of different simultaneous F0s (conjoint allocation)
• N.B. These constraints are different for unresolved harmonics and for, e.g., timbre / vowel perception.

Using pitch differences to separate different objects
• Helps intelligibility of simultaneous speech
• Helps to track one voice in the presence of others
• Helps to separate objects for independent localisation

ΔF0 between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)
• Two sentences (same talker)
  – only voiced consonants (with very few stops), thus maximising the F0 effect
• Target sentence F0 = 140 Hz
• Masking sentence F0 = 140 Hz ± 0, 1, 2, 5, 10 semitones
• Task: write down the target sentence
• Replicates & extends Brokx & Nooteboom
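For reference, the masker F0s implied by "140 Hz ± 0, 1, 2, 5, 10 semitones" follow from the equal-tempered relation f = 140 · 2^(n/12). The short sketch below prints both the downward and the upward shifts. Whether each condition used the upward shift, the downward shift, or both is not stated on the slide.

```python
TARGET_F0 = 140.0  # Hz, target sentence F0 from the slide

def shift_semitones(f0, semitones):
    """F0 shifted by a number of equal-tempered semitones (12 per octave)."""
    return f0 * 2 ** (semitones / 12)

for st in (0, 1, 2, 5, 10):
    down, up = shift_semitones(TARGET_F0, -st), shift_semitones(TARGET_F0, st)
    print(f"±{st:2d} semitones: {down:6.1f} Hz / {up:6.1f} Hz")
```

The largest separation, 10 semitones, puts the masker at roughly 78.6 Hz or 249.5 Hz, a frequency ratio of about 1.78.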

Example of using harmonicity for grouping for another property
• How do we localise complex sounds when more than one sound source is present?
• Maybe we group sounds first and then pool the localisation estimates of the component frequencies for that sound.
• More stable than grouping by location?

Localisation by ITD: Jeffress / Trahiotis & Stern
• 500 Hz narrow-band noise, right ear leads by 1.5 ms (phase ambiguity): heard on the left
• 500 Hz wider-band noise, right ear leads by 1.5 ms: heard on the right (consistent ITD)
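At 500 Hz, a 1.5 ms right-ear lead is 0.75 of a cycle, so a single channel cannot tell it apart from a 0.5 ms left-ear lead. A minimal sketch of how a wider band resolves this rests on three assumptions, and it is not the Trahiotis & Stern model itself:
• Idealised per-channel cross-correlation functions: a cosine at each channel's centre frequency that peaks at the stimulus ITD and every period away from it.
• A hypothetical Gaussian centrality weight (σ = 1.5 ms) that favours short internal delays.
• Channels every 50 Hz from 300 to 700 Hz for the wider band.
Positive lags mean the right ear leads.

```python
import numpy as np

def lateralise(freqs_hz, itd_s, sigma_s=1.5e-3, max_lag_s=2e-3, n_lags=4001):
    """Internal lag (s) at the peak of a centrality-weighted cross-correlation
    pattern summed across frequency channels. Positive = right ear leads."""
    lags = np.linspace(-max_lag_s, max_lag_s, n_lags)
    # Idealised per-channel cross-correlation for a band centred on f:
    # a cosine peaking at the stimulus ITD and again every 1/f seconds.
    pattern = sum(np.cos(2 * np.pi * f * (lags - itd_s)) for f in freqs_hz)
    # Hypothetical centrality weighting favouring short internal delays
    weight = np.exp(-lags ** 2 / (2 * sigma_s ** 2))
    return float(lags[np.argmax(weight * pattern)])

narrow = lateralise([500.0], 1.5e-3)                      # one channel: ambiguous
wide = lateralise(np.arange(300.0, 701.0, 50.0), 1.5e-3)  # channels at 300..700 Hz
print(f"narrow band: {narrow * 1e3:+.2f} ms, wider band: {wide * 1e3:+.2f} ms")
```

With one channel, the two candidate peaks (−0.5 ms and +1.5 ms) are equally high, and centrality picks the shorter one, on the left. Across many channels only the true 1.5 ms lag lines up in every channel, so its summed peak outweighs the centrality bias and the sound is placed on the right.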

Resolution of phase ambiguity

Mistuning & localisation
[Figure: pointer IID (dB) against percent mistuning (0, 1, 3, 6%); 6 subjects.]