MultiShift Principal Component Analysis based Primary Component Extraction
Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction
Jianjun HE, Woon-Seng Gan
jhe007@e.ntu.edu.sg, ewsgan@ntu.edu.sg
22nd April 2015
Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
WHY
Goal: to obtain a new representation of sound scenes in digital media that is both flexible and efficient in spatial audio reproduction for any playback system.
Existing sound scene representations:
• Channel-based
  - Conventional, for a specific playback system;
  - Lacks the flexibility to support different playback configurations.
• Object-based
  - Emerging, for any playback system;
  - Lacks efficiency: large storage and high transmission bandwidth.
Primary-ambient based representation:
  - Inspired by the human auditory system;
  - Facilitates flexible and efficient rendering.
Primary-ambient extraction (PAE) from channel-based audio (e.g., stereo):
  - Existing approaches: mainly for one dominant source in the primary components;
  - Subband techniques: problematic for overlapping spectra;
  - PAE with multiple sources (from different directions) is not well studied.
Stereo Signal Model
Signal = Primary + Ambient
Assumptions:
• Primary components are highly correlated;
• Ambient components are uncorrelated;
• Primary and ambient components are uncorrelated;
• Ambient power is balanced.
k: primary panning factor
J. He, E. L. Tan, and W. S. Gan, “Linear estimation based primary-ambient extraction for stereo audio signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
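The equations on this slide were not recovered. As a hedged reconstruction, the assumptions listed above can be written as follows for the idealized amplitude-panned case. The symbols x, p, a (signal, primary, ambient), the channel indices 0 and 1, and the power notation P are assumptions chosen for illustration; only the panning factor k is named on the slide.

```latex
\begin{aligned}
x_0[n] &= p_0[n] + a_0[n], & x_1[n] &= p_1[n] + a_1[n], \\
p_1[n] &= k\,p_0[n], & & \text{(primary components fully correlated)} \\
\textstyle\sum_n a_0[n]\,a_1[n] &\approx 0, & \textstyle\sum_n p_i[n]\,a_j[n] &\approx 0, \quad i, j \in \{0, 1\}, \\
P_{a_0} &= P_{a_1}. & &
\end{aligned}
```

Under these assumptions the primary components differ between channels only by the gain k, which is what makes a single principal direction a reasonable primary estimate.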
PCA for primary extraction
[Equation and figure not recovered. Labels: Objective; Ambient basis; Primary basis]
M. M. Goodwin and J. M. Jot, “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement,” in Proc. ICASSP, Hawaii, 2007, pp. 9-12.
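A minimal sketch of PCA-based primary extraction for one stereo frame, assuming the primary basis is the dominant eigenvector of the 2x2 channel covariance matrix and the primary components are the projection of the signal onto it. The function name `pca_primary` and the synthetic test signal are hypothetical and not taken from the slides.

```python
import math
import random


def pca_primary(x0, x1):
    """Project a stereo frame onto its principal direction.

    Returns (p0_hat, p1_hat, k_hat), where k_hat is the panning-factor
    estimate implied by the principal direction.
    """
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    # Angle of the dominant eigenvector of [[r00, r01], [r01, r11]].
    theta = 0.5 * math.atan2(2.0 * r01, r00 - r11)
    c, s = math.cos(theta), math.sin(theta)
    # Principal component, then its contribution to each channel.
    u = [c * a + s * b for a, b in zip(x0, x1)]
    p0_hat = [c * v for v in u]
    p1_hat = [s * v for v in u]
    return p0_hat, p1_hat, math.tan(theta)


# Synthetic frame (assumed): amplitude-panned primary plus balanced ambience.
random.seed(0)
n = 20000
k_true = 2.0
p0 = [random.gauss(0.0, 1.0) for _ in range(n)]
x0 = [p + random.gauss(0.0, 0.5) for p in p0]
x1 = [k_true * p + random.gauss(0.0, 0.5) for p in p0]
p0_hat, p1_hat, k_hat = pca_primary(x0, x1)
```

With balanced, uncorrelated ambience the dominant eigenvector is proportional to [1, k], so `k_hat` should land close to `k_true` in this synthetic case.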
Shifted PCA for primary extraction
Shifted PCA accounts for the partial primary correlation (at zero lag) caused by the inter-channel time difference (ICTD).
[Figure not recovered. Labels: Shifted signal; Shifted primary]
J. He, E. L. Tan, and W. S. Gan, “Time-shifted principal component analysis based cue extraction for stereo audio signals,” in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
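The shifting idea can be sketched as: estimate the ICTD as the lag that maximizes the inter-channel cross-correlation, align one channel by that lag, apply PCA, and shift the extracted primary back. This is a simplified illustration, not the authors' implementation; circular shifts are used for brevity, and the helper names (`icc`, `shift`, `pca_primary`, `spca_primary`) are hypothetical.

```python
import math
import random


def icc(x0, x1, lag):
    """Normalized cross-correlation of x0[n] with x1[n + lag] (circular)."""
    n = len(x0)
    num = sum(x0[i] * x1[(i + lag) % n] for i in range(n))
    den = math.sqrt(sum(v * v for v in x0) * sum(v * v for v in x1))
    return num / den


def shift(x, lag):
    """Circularly advance x by lag samples: y[n] = x[n + lag]."""
    n = len(x)
    return [x[(i + lag) % n] for i in range(n)]


def pca_primary(x0, x1):
    """Project the stereo frame onto the dominant eigenvector direction."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    theta = 0.5 * math.atan2(2.0 * r01, r00 - r11)
    c, s = math.cos(theta), math.sin(theta)
    u = [c * a + s * b for a, b in zip(x0, x1)]
    return [c * v for v in u], [s * v for v in u]


def spca_primary(x0, x1, max_lag):
    """Shifted PCA: align by the estimated ICTD, run PCA, undo the shift."""
    lag_hat = max(range(-max_lag, max_lag + 1), key=lambda t: icc(x0, x1, t))
    p0_hat, p1_aligned = pca_primary(x0, shift(x1, lag_hat))
    return p0_hat, shift(p1_aligned, -lag_hat), lag_hat


# Synthetic frame (assumed): channel 1 carries the primary delayed by 5 samples.
random.seed(1)
n = 2000
true_lag = 5
p = [random.gauss(0.0, 1.0) for _ in range(n)]
x0 = [p[i] + random.gauss(0.0, 0.3) for i in range(n)]
x1 = [0.8 * p[(i - true_lag) % n] + random.gauss(0.0, 0.3) for i in range(n)]
p0_hat, p1_hat, lag_hat = spca_primary(x0, x1, max_lag=10)
```

A practical implementation would use linear rather than circular shifts and frame-wise processing; those details are not specified on the slide.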
Multi-Shift PCA for primary extraction
To account for concurrent directional sound sources (from different directions) in the primary components, we consider a few selected shifts.
[Figure: typical structure of MSPCA (MSPCA-T)]
Multi-Shift PCA: consecutive structure
Shift consecutively, lag by lag, and apply different weights to the different shifted versions. The weights are derived from the inter-channel cross-correlation coefficient (ICC).
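A rough sketch of the consecutive structure: run shifted PCA at every lag in the search range and sum the shifted-back outputs with ICC-derived weights. The exact weighting formula is not recoverable from the slides; the version here (positive part of the ICC raised to an exponent a, normalized to sum to one) is an assumption, as are the circular shifts and all function names.

```python
import math
import random


def icc(x0, x1, lag):
    """Normalized cross-correlation of x0[n] with x1[n + lag] (circular)."""
    n = len(x0)
    num = sum(x0[i] * x1[(i + lag) % n] for i in range(n))
    den = math.sqrt(sum(v * v for v in x0) * sum(v * v for v in x1))
    return num / den


def shift(x, lag):
    """Circularly advance x by lag samples: y[n] = x[n + lag]."""
    n = len(x)
    return [x[(i + lag) % n] for i in range(n)]


def pca_primary(x0, x1):
    """Project the stereo frame onto the dominant eigenvector direction."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    theta = 0.5 * math.atan2(2.0 * r01, r00 - r11)
    c, s = math.cos(theta), math.sin(theta)
    u = [c * a + s * b for a, b in zip(x0, x1)]
    return [c * v for v in u], [s * v for v in u]


def mspca_primary(x0, x1, max_lag, a=2.0):
    """Consecutive multi-shift PCA with assumed ICC**a weights."""
    lags = list(range(-max_lag, max_lag + 1))
    raw = [max(icc(x0, x1, t), 0.0) ** a for t in lags]
    total = sum(raw)
    # Fall back to uniform weights if no lag has positive correlation.
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(lags)] * len(lags)
    n = len(x0)
    p0_hat, p1_hat = [0.0] * n, [0.0] * n
    for t, w in zip(lags, weights):
        q0, q1_aligned = pca_primary(x0, shift(x1, t))
        q1 = shift(q1_aligned, -t)  # undo the alignment shift
        for i in range(n):
            p0_hat[i] += w * q0[i]
            p1_hat[i] += w * q1[i]
    return p0_hat, p1_hat, dict(zip(lags, weights))


# Synthetic frame (assumed): two primary sources with ICTDs of +3 and -4 lags.
random.seed(2)
n = 1000
s = [random.gauss(0.0, 1.0) for _ in range(n)]
m = [random.gauss(0.0, 1.0) for _ in range(n)]
x0 = [s[i] + m[i] + random.gauss(0.0, 0.5) for i in range(n)]
x1 = [s[(i - 3) % n] + m[(i + 4) % n] + random.gauss(0.0, 0.5) for i in range(n)]
p0_hat, p1_hat, weights = mspca_primary(x0, x1, max_lag=8, a=2.0)
top_two = sorted(sorted(weights, key=weights.get, reverse=True)[:2])
```

In this framing, the typical structure (MSPCA-T) corresponds to keeping nonzero weights only at a few selected lags, while the consecutive structure keeps a weight at every lag in the range.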
Experiment setup
• Primary components:
  - speech:
  - music:
• Ambient components: uncorrelated white Gaussian noise;
• The overall powers of speech, music, and ambience are set equal;
• Approaches evaluated: PCA, SPCA, MSPCA-T, MSPCA (a = 2), MSPCA (a = 10);
• ICTD search range: ±50 lags (~2 ms for fs = 44.1 kHz).
Comparison of weighting methods
• PCA and SPCA: only one nonzero weight each, at different lags;
• MSPCA-T: two weights at two lags, though the positive ICTD for the music is not as accurate;
• Consecutive MSPCA: nonzero weights at all lags, with higher weights given to the lags that are closer to the directions of the primary components;
• As the exponent a increases, the differences among the weights become more significant;
• When a is high (e.g., a = 10), the weighting in consecutive MSPCA becomes similar to that of SPCA.
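The effect of the exponent can be illustrated with a toy example. The ICC values below are made up, and the normalization (positive ICC raised to a, divided by the sum) is an assumed form of the weighting rather than the exact formula behind the plotted results.

```python
def icc_weights(icc_values, a):
    """Normalize max(ICC, 0)**a so that the weights sum to one."""
    raw = [max(c, 0.0) ** a for c in icc_values]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no lag has a positive ICC")
    return [r / total for r in raw]


# Hypothetical ICC values at five consecutive lags, with two local peaks.
toy_icc = [0.1, 0.5, 0.2, 0.4, 0.1]
w2 = icc_weights(toy_icc, 2)
w10 = icc_weights(toy_icc, 10)
```

For these toy values, a = 10 concentrates most of the weight on the single strongest lag, which mirrors the observation that a high exponent makes consecutive MSPCA behave like SPCA.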
Objective performance: extraction accuracy
Error-to-signal ratio (ESR)
[Figure not recovered]
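A common way to compute an error-to-signal ratio is the energy of the extraction error divided by the energy of the true primary component, expressed in dB. The sketch below assumes that single-channel form; whether the reported results average over both channels is not recoverable from the slide, and the function name `esr_db` is hypothetical.

```python
import math


def esr_db(p_true, p_est):
    """Error-to-signal ratio in dB (lower is better).

    Undefined when the estimate is exact (zero error energy).
    """
    err = sum((e - t) ** 2 for t, e in zip(p_true, p_est))
    sig = sum(t * t for t in p_true)
    return 10.0 * math.log10(err / sig)


# Toy check: a 10% amplitude error gives an error energy of 1% of the signal.
p_true = [1.0, -2.0, 3.0, 0.5]
p_est = [1.1 * v for v in p_true]
esr = esr_db(p_true, p_est)
```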
Subjective performance: localization accuracy
12 participants, scores from 0 to 10:
• 0-2: the two directions are almost reversed;
• 2-4: neither direction is close;
• 4-6: the directions are neither close nor too far;
• 6-8: at least one direction is close;
• 8-10: both directions are close to the reference.
Conclusions
1. Investigated primary extraction from stereo signals when the sources in the primary components come from multiple concurrent, distinct directions.
2. Proposed multi-shift PCA to handle multiple directions:
   a) MSPCA with the typical structure involves a limited number of selected shifts, but its performance degrades when ICTD estimation is inaccurate;
   b) MSPCA with the consecutive structure is more robust, since it applies weights to every shifted version;
   c) The weighting method for the different shifts is critical;
   d) In general, applying a proper exponent to the ICC yields good (objective and subjective) performance.
3. Future work: determine the best exponent value for ICC-based weighting, explore other weighting methods, and relate multi-shifting to optimal filtering in PAE.
References
[1] M. M. Goodwin and J. M. Jot, “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement,” in Proc. ICASSP, Hawaii, 2007, pp. 9-12.
[7] C. Faller and F. Baumgarte, “Binaural cue coding-part II: schemes and applications,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 520-531, Nov. 2003.
[8] M. M. Goodwin and J. M. Jot, “Binaural 3-D audio rendering based on spatial audio scene coding,” in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
[12] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 100-113, Mar. 2015.
[13] C. Avendano and J. M. Jot, “A frequency-domain approach to multichannel upmix,” J. Audio Eng. Soc., vol. 52, no. 7/8, pp. 740-749, Jul./Aug. 2004.
[14] C. Faller, “Multiple-loudspeaker playback of stereo signals,” J. Audio Eng. Soc., vol. 54, no. 11, pp. 1051-1064, Nov. 2006.
[17] J. He, W. S. Gan, and E. L. Tan, “Primary-ambient extraction using ambient phase estimation with a sparsity constraint,” IEEE Signal Process. Letters, vol. 22, no. 8, pp. 1127-1131, Aug. 2015.
[18] J. Merimaa, M. M. Goodwin, and J. M. Jot, “Correlation-based ambience extraction from stereo recordings,” in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
[21] J. He, E. L. Tan, and W. S. Gan, “Time-shifted principal component analysis based cue extraction for stereo audio signals,” in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
[22] J. He, E. L. Tan, and W. S. Gan, “Linear estimation based primary-ambient extraction for stereo audio signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
[24] J. He, W. S. Gan, and E. L. Tan, “A study on the frequency-domain primary-ambient extraction for stereo audio signals,” in Proc. ICASSP, Florence, Italy, 2014, pp. 2868-2872.
Acknowledgement
This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T2-2-040.