
AES 144 Convention Paper 10018

Acoustic and Subjective Evaluation of 22.2- and 2-Channel Reproduced Sound Fields in Three Studios

Madhu Ashok (1), Richard King (2), Toru Kamekawa (3), Sungyoung Kim (4)

(1) Electrical and Computer Engineering, University of Rochester, 500 Joseph C. Wilson Blvd., Rochester NY, 14627, USA.
(2) Sound Recording Area, Schulich School of Music, McGill University, Montreal, Canada.
(3) Music Creativity and the Environment, Tokyo University of the Arts, Tokyo, Japan.
(4) Electrical, Telecommunication, and Computer Engineering Technology, RIT, 1 Lomb Memorial Dr., Rochester NY, 14623, USA.

GOAL

Will 22.2-channel playback reduce the listeners' ability to infer information about the physical acoustic properties or dimensions of the room?

Theoretical Foundation

Identical acoustic models were generated for each combination of playback format and studio to verify whether the same perceptual patterns were present in the computer-generated impulse responses. The low-frequency characteristics of the rooms were compared using MATLAB simulations, since CATT-Acoustic mostly ignores the wave effects of interference and diffraction. The resonant modes and mode densities were calculated [1] for each studio from its outer dimensions.

Studios

Room                                                         Width (m)  Length (m)  Height (m)  Volume (m^3)  RT60 at 1 kHz (ms)
Canada (McGill University) - CIRMMT A820                     6          7.8         3.2         149.8         168
USA (Rochester Institute of Tech. - RIT) - Conference Room   7          5.9         3.4         140.4         640
Japan (Tokyo Univ. of the Arts - Geidai) - Studio B          6.8        6.8         4.5         208.1         340

Results

The figures below show the perceptual and subject spaces of the six simulated binaural stimuli. In the perceptual space, the distances between points were calculated from the input ratings of inter-stimulus dissimilarity. The first character of each symbol represents one of the three rooms, and the following number indicates the reproduction format.
For example, G22 indicates the binaural simulation of the 22.2-channel reproduced audio in the Geidai studio. The simulated results reveal that the listeners perceived the differences between stimuli through two factors: reproduction format (dimension 1) and room influence (dimension 2).

Figures: Subjective Evaluation of Simulations; Subjective Evaluation of Recordings.

Figure: Geidai studio with Brüel & Kjær 4100 D dummy head (left) and corresponding CATT-Acoustic simulation with 22 Genelec speakers (right).

Three studios of similar outer-shell dimensions, with varying acoustic treatments and absorptivity, were evaluated via both recorded and simulated binaural stimuli for 22.2- and 2-channel playback. A series of analyses, including acoustic modelling in CATT-Acoustic and subjective evaluation, was conducted to test whether 22.2-channel playback preserved common perceptual impressions regardless of room-dependent physical characteristics. Results from multidimensional scaling (MDS) indicated that listeners used one perceptual dimension to differentiate between reproduction formats and the others to differentiate physical room characteristics. Clarity and early decay time measured in the three studios showed a similar pattern when scaled from 2- to 22.2-channel reproduced sound fields. Subjective evaluation revealed a tendency to preserve the inherent perceptual characteristics of 22.2-channel playback in spite of different playback conditions.

Clarity, or C80, is a metric commonly used for characterizing the ability to render speech. Clarity is found by taking the ratio of the energy in the first 80 milliseconds of the impulse response h(t) to the energy in the remainder of the response, expressed in decibels:

\[ C_{80} = 10 \log_{10} \left( \frac{\int_{0}^{80\,\mathrm{ms}} h^{2}(t)\,dt}{\int_{80\,\mathrm{ms}}^{\infty} h^{2}(t)\,dt} \right) \ \mathrm{dB} \]

Clarity (C80) Calculated from CATT-Acoustic Simulations

C80 at 1 kHz (dB)   McGill   RIT     Geidai
2 Channel           38.99    9.8     24.39
22.2 Channel        46.06    18.79   29.84

C80 at 2 kHz (dB)   McGill   RIT     Geidai
2 Channel           37.83    12.39   24.14
22.2 Channel        48.02    21.01   33.30

C80 at 4 kHz (dB)   McGill   RIT     Geidai
2 Channel           40.59    14.89   27.69
22.2 Channel        51.19    24.23   36.38

Subjective Evaluation

In the evaluation, each listener compared two randomly selected binaural stimuli and judged how different the two stimuli were. The listener then quantified the judged difference as a number ranging from 0 (most similar) to 100 (most dissimilar). These ratings were the input for multidimensional scaling (MDS) analysis. When MDS is performed for multiple subjects, it is important to account for the perceptual differences associated with each subject. To form a judgment of global dissimilarity for a group of subjects, INdividual Differences SCALing (INDSCAL) is commonly used. INDSCAL analysis generates two outputs: a dissimilarity-based perceptual space for the stimuli and a subject space. The subject space illustrates how much weight each subject placed on each dimension of the extracted perceptual space.

Figure: McGill studio, RIT studio, and Geidai studio, each set up for 22.2-channel playback; labels (1) to (6).

A total of 11 subjects participated in the subjective evaluation, and the collected dissimilarity ratings were submitted to INDSCAL in the SPSS software package [2] for analysis.

Inter-stimuli Distance   2 Channel   22.2 Channel
CATT-A Simulation        5.2862      5.0486
In-situ Recordings       5.7361      4.5523

The sum of inter-stimuli distances among the three studios was smaller for 22.2-channel than for 2-channel playback in both the simulated and the recorded evaluations, indicating a reduction in room differentiation for 22.2-channel content.

References

[1] Walker, R. "Room modes and low-frequency responses in small enclosures." Audio Engineering Society Convention 100. Audio Engineering Society, 1996.
[2] IBM. SPSS 24 User's Guide. Web. 30 May 2017. <https://www.ibm.com/support/knowledgecenter/en/SSLVMB_24.0.0/spss/base/syn_alscal_overview.html>.
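
Illustration: resonant modes from outer dimensions

The resonant-mode calculation mentioned under Theoretical Foundation can be sketched in Python for an idealized rectangular, rigid-walled room, using the standard formula f = (c/2) sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). This is not the authors' MATLAB code. The speed of sound (343 m/s), the 200 Hz upper limit, and the function names room_modes and mode_density are assumptions made for illustration; the dimensions are the McGill and RIT values from the Studios table.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value (not stated in the poster)


def room_modes(width, length, height, f_max=200.0, c=SPEED_OF_SOUND):
    """Return sorted (frequency_hz, (nx, ny, nz)) pairs up to f_max.

    Assumes an idealized rectangular room with rigid walls.
    """
    dims = (width, length, height)
    # An axial mode of order n along a dimension L lies at n * c / (2 L),
    # so this bounds the highest index worth testing on each axis.
    limits = [int(2.0 * f_max * d / c) + 1 for d in dims]
    modes = []
    for n in itertools.product(*(range(limit + 1) for limit in limits)):
        if n == (0, 0, 0):
            continue  # skip the trivial zero-frequency solution
        f = 0.5 * c * math.sqrt(sum((ni / d) ** 2 for ni, d in zip(n, dims)))
        if f <= f_max:
            modes.append((f, n))
    return sorted(modes)


def mode_density(modes, f_lo, f_hi):
    """Average number of modes per Hz in the band [f_lo, f_hi)."""
    count = sum(1 for f, _ in modes if f_lo <= f < f_hi)
    return count / (f_hi - f_lo)


# Width, length, height in metres, taken from the Studios table.
rooms = {
    "McGill": (6.0, 7.8, 3.2),
    "RIT": (7.0, 5.9, 3.4),
}

for name, (w, l, h) in rooms.items():
    modes = room_modes(w, l, h)
    lowest_f, lowest_n = modes[0]
    print(f"{name}: lowest mode {lowest_f:.1f} Hz {lowest_n}, "
          f"{len(modes)} modes below 200 Hz, "
          f"{mode_density(modes, 100.0, 200.0):.2f} modes/Hz in 100-200 Hz")
```

Sorting by frequency makes it easy to compare how closely the modes of two rooms cluster in a given band, which is the kind of low-frequency comparison a ray-based model does not capture.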

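Illustration: C80 from an impulse response

The clarity definition given in the text (early energy in the first 80 ms divided by the remaining energy, in dB) can be sketched in Python as follows. This is only an illustration: the function name clarity_c80, the 8 kHz sample rate, and the synthetic exponentially decaying response are assumptions, and the synthetic decay is a diffuse-field idealization that is not expected to reproduce the CATT-Acoustic values in the clarity tables.

```python
import math


def clarity_c80(impulse_response, sample_rate, split_ms=80.0):
    """Clarity in dB: energy before split_ms over energy after it."""
    split = int(round(sample_rate * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if late == 0.0:
        raise ValueError("no energy after the split point")
    return 10.0 * math.log10(early / late)


def synthetic_decay(rt60_s, sample_rate, duration_s=3.0):
    """Idealized response whose energy falls by 60 dB over rt60_s seconds."""
    k = 6.0 * math.log(10.0) / rt60_s  # energy decay constant in 1/s
    n_samples = int(duration_s * sample_rate)
    # Amplitude decays at half the energy rate, so h^2 decays at rate k.
    return [math.exp(-0.5 * k * n / sample_rate) for n in range(n_samples)]


SAMPLE_RATE = 8000  # Hz, assumed for this sketch

# RT60 values at 1 kHz from the Studios table, used only to shape the toy decay.
for name, rt60 in (("McGill", 0.168), ("RIT", 0.640)):
    h = synthetic_decay(rt60, SAMPLE_RATE)
    print(f"{name}: C80 of idealized decay = {clarity_c80(h, SAMPLE_RATE):.2f} dB")
```

With a measured or simulated impulse response in place of the synthetic one, the same function applies unchanged; octave-band values such as those tabulated at 1, 2, and 4 kHz would additionally require band-pass filtering the response first.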
