On the manifolds of spatial hearing
Vikas C. Raykar and Ramani Duraiswami
University of Maryland, College Park
NIPS 2006 Workshop on Novel Applications of Dimensionality Reduction
December 9, 2006

Human spatial hearing
• How are humans able to judge the direction of a sound source?
• Why do we have two ears?
• Why is the pinna shaped the way it is?

Plan of the talk
• Human spatial hearing
• Perceptual manifolds
• Exploratory studies
• Applications

How do humans localize a sound source?
• Primary cues
  – Interaural Time Difference (ITD)
  – Interaural Level Difference (ILD)
  – These explain localization only in the horizontal plane.
  – All points on one half of a hyperboloid of revolution have the same ITD and ILD (the cone of confusion).
• Other cues
  – Pinna shape gives elevation cues at higher frequencies.
  – Torso and head give elevation cues at lower frequencies.
• An intricate system to model completely.
[Figure labels: Source, Head, Left ear, Right ear]
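
As a rough illustration of the two primary cues, the sketch below estimates an ITD from the lag that maximizes the cross-correlation of the two ear signals, and an ILD from the ratio of their energies. This is one common way to compute such cues, not necessarily what the talk used; the function name `estimate_itd_ild`, the 44.1 kHz sampling rate, and the toy pulse signals are all assumptions made for the example.

```python
import math

def estimate_itd_ild(left, right, fs):
    """Estimate ITD (seconds) and ILD (dB) from two equal-length ear signals.

    ITD: lag (in samples) that maximizes the cross-correlation, divided by fs.
    ILD: ratio of left to right signal energy, in decibels.
    A positive ITD means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-(n - 1), n):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        corr = sum(left[i] * right[i + lag]
                   for i in range(max(0, -lag), min(n, n - lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    ild_db = 10.0 * math.log10(energy_left / energy_right)
    return best_lag / fs, ild_db

# Toy signals (assumed, not measured): the right ear receives the same pulse
# 3 samples later and at half the amplitude.
fs = 44100  # assumed sampling rate in Hz
left = [0.0] * 32
right = [0.0] * 32
left[5] = 1.0
right[8] = 0.5
itd, ild = estimate_itd_ild(left, right, fs)
```

Both values depend only on the difference between the two ears, which is why they cannot separate directions that share the same ITD and ILD.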

It’s head, torso, and pinna

Head Related Transfer Function (HRTF)
• Spectral filtering caused by the head, torso, and the pinna.
• HRIR: head-related impulse response.
• The HRIR can be measured experimentally for all elevations and azimuths.
• Convolve the source signal with the measured HRIR to create virtual audio.
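
The convolution step can be sketched in a few lines. The example below is a minimal direct-form convolution applied once per ear; the helper names `convolve` and `render_binaural` and the short toy impulse responses are hypothetical stand-ins, since real HRIRs would come from a measured set such as a public database.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with the left-ear and right-ear HRIRs."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs (assumed values, not measurements): the right ear hears the
# source later and quieter than the left ear.
hrir_left = [0.0, 1.0, 0.3]
hrir_right = [0.0, 0.0, 0.0, 0.5, 0.1]
mono = [1.0, 0.0, -1.0]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

For long signals an FFT-based convolution would be the usual choice; the direct form is used here only to keep the sketch self-contained.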

Sample HRIR and HRTF
Source directly in front of your right ear.

CIPIC Database
• Public-domain HRIR database
• HRIRs sampled at 1250 points around the head
• 45 subjects
• Anthropometry measurements
V. Ralph Algazi, Richard O. Duda, Dennis M. Thompson, Carlos Avendano, "The CIPIC HRTF database," in WASSAP '01 (2001 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, Oct. 8 2001).

Interaural polar coordinate system
[Figure labels: Azimuth, Elevation]

Plan of the talk
• Human spatial hearing
• Perceptual manifolds
• Exploratory studies
• Applications

Manifold representation
If we can unfold this low-dimensional manifold, we have a good perceptual representation of the signal.

Our data matrix
Elevation manifold: 50 points in a d-dimensional space (d = 257 for HRTF, d = 200 for HRIR).
• HRIR data matrix: 200 x 50
• HRTF data matrix: 257 x 50
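
One way such matrices could be assembled is sketched below. It assumes that the 257 HRTF dimensions are the one-sided magnitude bins of a 512-point DFT of the zero-padded 200-sample HRIR; the slide does not state this, so the FFT length and the choice of plain magnitude are assumptions. The names `hrtf_magnitude` and `data_matrix` are hypothetical, and three synthetic pulses stand in for the 50 measured directions.

```python
import cmath
import math

def hrtf_magnitude(hrir, n_fft=512):
    """Magnitude of the one-sided DFT of a zero-padded HRIR (n_fft // 2 + 1 bins)."""
    # Precompute the complex exponentials once and index them modulo n_fft.
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    bins = []
    for k in range(n_fft // 2 + 1):
        # Zero padding adds nothing to the sum, so only measured samples are visited.
        acc = sum(x * twiddle[(k * n) % n_fft] for n, x in enumerate(hrir))
        bins.append(abs(acc))
    return bins

def data_matrix(vectors):
    """Stack one vector per direction as the columns of a d x N matrix."""
    return [list(row) for row in zip(*vectors)]

# Synthetic stand-in for measured data: three 200-sample HRIRs,
# each a unit pulse at a different delay.
hrirs = []
for delay in (10, 20, 30):
    h = [0.0] * 200
    h[delay] = 1.0
    hrirs.append(h)

hrir_matrix = data_matrix(hrirs)                               # 200 x 3
hrtf_matrix = data_matrix([hrtf_magnitude(h) for h in hrirs])  # 257 x 3
```

With 50 measured elevations in place of the three pulses, the same two calls would give the 200 x 50 and 257 x 50 shapes quoted on the slide.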

Dimensionality reduction methods
• We used the following four methods
  – Principal Component Analysis (PCA)
  – Locally Linear Embedding (LLE)
  – Isomap
  – Maximum Variance Unfolding (MVU)
• We expect
  – The manifold to have an intrinsic dimensionality of 1.
  – The first embedded component to be monotonic with elevation.
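
The monotonicity expectation can be checked mechanically. The sketch below covers only the linear case (PCA), computed by power iteration on the covariance matrix, and runs on a synthetic, gently curved one-parameter curve rather than on HRTF data. The function names and the test curve are assumptions for illustration; the nonlinear methods (LLE, Isomap, MVU) are not reproduced here.

```python
import math

def first_principal_component(points, n_iter=200):
    """Project points (equal-length vectors) onto their first principal axis."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Power iteration on the covariance matrix, applied implicitly as X^T (X v).
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(d)) for row in centered]

def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(x > 0 for x in diffs) or all(x < 0 for x in diffs)

# Synthetic 1-D manifold in 3-D: 50 points ordered by a single parameter t,
# which plays the role of elevation in this toy example.
ts = [-1.0 + 2.0 * i / 49 for i in range(50)]
curve = [[t, 0.3 * t * t, 0.1 * t ** 3] for t in ts]
scores = first_principal_component(curve)
monotonic = is_monotonic(scores)
```

The sign of a principal axis is arbitrary, so the check accepts either a strictly increasing or a strictly decreasing first component.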

HRTF elevation manifold: PCA

HRTF manifold: Isomap (K=3)

HRTF manifold: Isomap (K=2)

HRTF manifold: LLE (K=3)

HRTF manifold: LLE (K=2)

HRTF manifold: MVU

HRTF manifold: MVU (continued)

HRIR elevation manifold: PCA, Isomap, LLE, MVU

Complete manifold
• Azimuth: -45:5:45
• Elevation: -45:5:230
• We expect
  – The manifold to have an intrinsic dimensionality of 2.
  – The first two embedded components to show a grid-like structure.

Complete manifold: PCA

Complete manifold: LLE (K=4)

Complete manifold: LMVU (K=4)

Isomap (K=4)

HRIR manifold
• PCA, Isomap
• Data representation -- manifold properties
• LLE, MVU: numerical problems

Plan of the talk
• Human spatial hearing
• Perceptual manifolds
• Exploratory studies
• Applications

Problem 1: Interpolation
• HRTFs are generally measured on a finite sampling grid of elevation and azimuth.
• A smooth virtual audio system needs to interpolate HRTFs between grid points.
• HRTF measurement is a tedious and time-consuming process.
  – It normally takes an hour.
  – The subject must remain immobile.
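
For reference, the simplest thing one could do between grid points is linear interpolation between the two nearest measured elevations. The sketch below shows only that naive baseline; it is not the manifold-based approach the talk explores, and the function name `interpolate_hrtf` and the two-bin toy spectra are assumptions.

```python
import bisect

def interpolate_hrtf(elevations, hrtfs, query):
    """Linearly interpolate between the two measured HRTFs that bracket `query`.

    `elevations` must be sorted in increasing order, with one HRTF vector
    in `hrtfs` per elevation.
    """
    if not elevations[0] <= query <= elevations[-1]:
        raise ValueError("query elevation lies outside the measured grid")
    hi = bisect.bisect_left(elevations, query)
    if elevations[hi] == query:
        return list(hrtfs[hi])  # exact grid point, nothing to interpolate
    lo = hi - 1
    w = (query - elevations[lo]) / (elevations[hi] - elevations[lo])
    return [(1 - w) * a + w * b for a, b in zip(hrtfs[lo], hrtfs[hi])]

# Toy grid (assumed values): two measured elevations, two frequency bins each.
elevations = [0.0, 10.0]
hrtfs = [[1.0, 2.0], [3.0, 6.0]]
estimate = interpolate_hrtf(elevations, hrtfs, 2.5)
```

A baseline like this treats the space between measurements as flat, which is the assumption a manifold representation is meant to relax.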

Some preliminary results

Problem 2: Distance metric
• How to compare any two given HRTFs?
• Perceptually inspired metric
• Psychoacoustical tests
• Squared log-magnitude error
• It is tough to decide what aspects of a given signal are perceptually relevant.
• Use geodesic distance.
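
Two of the options above can be written down directly. The sketch below computes a mean squared log-magnitude error between two magnitude spectra, and approximates geodesic distance the way Isomap does, as the shortest path over a k-nearest-neighbor graph. Both function names, the dB scaling, and the semicircle test data are assumptions for illustration; the talk's exact definitions are not given on the slide.

```python
import math

def squared_log_magnitude_error(mag_a, mag_b):
    """Mean squared difference, in dB, between two magnitude spectra."""
    return sum((20.0 * math.log10(a / b)) ** 2
               for a, b in zip(mag_a, mag_b)) / len(mag_a)

def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths over a symmetric
    k-nearest-neighbor graph. Assumes the points are distinct."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Skip index 0 of the sorted list: it is the point itself.
        neighbors = sorted(range(n), key=lambda j: euclid[i][j])[1:k + 1]
        for j in neighbors:
            graph[i][j] = graph[j][i] = euclid[i][j]
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph

# Toy manifold: 21 points on a unit semicircle. The endpoints are 2 apart
# in a straight line but about pi apart along the curve.
arc = [(math.cos(math.pi * i / 20), math.sin(math.pi * i / 20)) for i in range(21)]
geo = geodesic_distances(arc, k=2)
straight = math.dist(arc[0], arc[20])
along_curve = geo[0][20]
error = squared_log_magnitude_error([1.0, 10.0], [1.0, 1.0])
```

The gap between `straight` and `along_curve` is the reason a distance measured along the manifold can rank pairs differently from a plain vector-space error.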

Distance on the manifold

Problem 3: Customization
• An HRTF measured for one person gives very poor elevation perception when used for a different person.
• Each person’s ear shape and anatomy are unique.
• Each person’s localizing capabilities are tuned to the shape of their ear and anatomy.
• A big bottleneck for commercialization of spatial audio.

Style vs Content

Anthropometric measurements

Problem 4: Microphone calibration

Thank you! Questions?