Subjective Sound Quality Assessment of Mobile Phones for Production Support
Thorsten Drascher, Martin Schultes
Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 8th and 9th June 2004, Mainz, Germany

Introduction

The goal of the tests presented in this talk is to ensure customer acceptance of audio quality by means of statistically validated data.

- Customers rate the combined effect of
  - echo cancellation, noise reduction, automatic gain control, …
- The goal conflicts with the ancillary conditions of
  - short time (no waste of production capacities)
  - low cost
- Objective measurements correlate only to a limited extent with subjective sound perception.
- Subjective audio quality tests are executed before the release for unrestricted serial production.
- Former results were often not reliable because of friendly users and too few tests to guarantee statistical validity.

Subjective Audio Quality Assessment, 5/25/2021, © Siemens, 2004

Presentation Outline

- Test Design
  - Laboratory or in-situ tests?
  - Laboratory test design
  - Conversational task
  - Statistical reliability
- First Test Presentation
  - Overall Quality
  - Most Annoying Properties
- Discussion & Outlook

Test Design

Typical conversation situations for a mobile phone
- Single talk
- Double talk

Two different test subject groups
- Naive users
- Expert users

Different recommended test methods
- Absolute category rating
- Comparative category rating
- Degradation category rating
- Threshold method
- Quantal-response detectability tests

Test Design (ctd.)

[Flowchart, in order:]
- Naive user tests: absolute category rating of overall quality and collection of the most annoying properties
- Evaluation
- Satisfying results?
  - yes: unrestricted serial production
  - no: trained user tests, i.e. comparative category rating of different parameter sets on the most annoying properties (in parallel, further parameter alteration)

Naive user tests will be carried out as single talk and double talk.

Laboratory or in-situ tests?

Laboratory
+ Good controlling
+ Small effort
+ Reproducible conditions
+ Easy control of environmental conditions
- Some effects have to be neglected
- Psychological influence of laboratory environment on test results

In-situ
+ More interesting for test persons
+ Nothing is more real than reality
- Large effort
- Difficult controlling
- Time intensive

- Laboratory tests are much more cost-effective than in-situ tests.
- But: how closely can reality be rebuilt in laboratories?
- There should be at least one comparison between laboratory and in-situ tests.

Laboratory test design

- Terminal A: fixed network, hand-held, specified, silent office environment (e.g. according to ITU-T P.800)
- Terminal B: mobile or car kit under test

[Figure: test setup with the background conditions car noise, babble noise, and silence]

- Reproducible playback of previously recorded environmental noises as a diffuse sound field
- Single and double talk tests are carried out using different noise levels
- Roles within the tests are interchanged
- Rating interview with both test subjects

Conversational Tasks

Properties of short conversation test scenarios (SCTs)
- Typical conversation tasks
  - Ordering pizza
  - Booking a flight
- SCTs are judged as natural by test subjects
- Conversation lasts about 2 ½ min
  - Extended to about 4 min by the following interview
- Formal structure [S. Möller, 2000]

[Figure: formal structure of an SCT between caller and called person, with the stages Greeting, Enquiry, Question, Precision, Offer, Order, Information, Treating of order, Discussion of open question, Farewell]

Statistical Reliability

- Moments of interest are the mean and the error of the mean
- The error of the mean is a function of the standard deviation
- Worst-case approximation:
  - The error of the mean is maximised if the highest and the lowest rating are each given with a relative frequency of 50 %
  - An error of the mean of less than 10 % of the rating interval width is guaranteed after 30 tests of 4 min each, resulting in an overall test duration of 2 hours
- Tests with 3 different background noises at 3 different levels and in a silent environment can be carried out in 40 h (1 week) over 2 different networks
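
The worst-case figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that "error of the mean" means the standard error (standard deviation divided by the square root of the number of tests) and that, with half of the ratings at each end of the scale, the standard deviation equals half the rating interval width. The function and variable names are illustrative and do not come from the talk.

```python
import math

def worst_case_sem_fraction(n_tests: int) -> float:
    """Worst-case standard error of the mean, as a fraction of the interval width.

    Assumption: half of the ratings sit at the lowest and half at the highest
    scale value, so the standard deviation is 0.5 interval widths.
    """
    worst_case_std = 0.5  # in units of the rating interval width
    return worst_case_std / math.sqrt(n_tests)

def min_tests_for(target_fraction: float) -> int:
    """Smallest number of tests whose worst-case error is below the target."""
    n = 1
    while worst_case_sem_fraction(n) >= target_fraction:
        n += 1
    return n

# Figures taken from the slide
N_TESTS = 30
MINUTES_PER_TEST = 4
hours_per_condition = N_TESTS * MINUTES_PER_TEST / 60

# 3 background noises x 3 levels, plus the silent environment, over 2 networks
conditions = 3 * 3 + 1
networks = 2
total_hours = hours_per_condition * conditions * networks

print(f"worst-case error after {N_TESTS} tests: "
      f"{100 * worst_case_sem_fraction(N_TESTS):.1f} % of interval width")
print(f"duration per condition: {hours_per_condition:.0f} h, total: {total_hours:.0f} h")
```

Under these assumptions 30 tests stay below the 10 % bound with some margin, and the 40 h total follows from 10 conditions of 2 h each on 2 networks.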

First Test Presentation

- Internal fair at the beginning of May
- Not representative, just "testing the test"
- Background: babble noise, ~70 dB(A)
- Terminal under test:
  - Known to be too quiet (not known to test subjects and experimenter)
  - Development concluded
- Interview only for the mobile terminal user (19 subjects)
- Naive user tests with two questions
  - What is your opinion of the overall quality of the connection you have just been using?
  - What were the most annoying properties of the connection you have just been using?
- Results given as
  - Numbers on a scale from 0 to 120
  - Predefined answers without technical terms (adding new ones was possible)

Overall Quality

[Figure: rating scale from 0 to 120 with the labels Bad, Poor, Fair, Good, Excellent]

- Numbers invisible for test subjects
- Average overall rating: 74 ± 4
  - (62 ± 3) % of rating interval width
- Start value 60 with highest relative frequency
- To compare the internal scale with standard MOS ratings, a normalisation is required

TS       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Rating  38 103  95  60  60  82  81  60  67  72  90  74 103  73  93  38  60  82  78
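
The average and its error can be recomputed from the 19 ratings listed on this slide. The sketch below assumes that the quoted error is the standard error of the mean based on the sample standard deviation; the slides do not state which estimator was used, and the variable names are illustrative.

```python
import math
import statistics

# Ratings of test subjects 1..19, as listed on the slide (scale 0 to 120)
ratings = [38, 103, 95, 60, 60, 82, 81, 60, 67, 72,
           90, 74, 103, 73, 93, 38, 60, 82, 78]
SCALE_WIDTH = 120

mean = statistics.mean(ratings)
# Assumption: standard error of the mean from the sample standard deviation
sem = statistics.stdev(ratings) / math.sqrt(len(ratings))

print(f"mean rating: {mean:.1f} +/- {sem:.1f}")
print(f"share of interval width: {100 * mean / SCALE_WIDTH:.0f} % "
      f"+/- {100 * sem / SCALE_WIDTH:.1f} %")
```

Rounded to whole scale units, this reproduces the 74 ± 4 stated on the slide.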

Overall Quality

[Figure: rating scale from 0 to 120 with the labels Bad, Poor, Fair, Good, Excellent mapped to MOS values 1 to 5]

- MOSc: MOS rating intervals with scale labels in the center
  - Extreme value 5 rated 5 times (>25 %)
  - Extreme value 1 never assigned
- Average overall rating: 3.8 ± 0.2
  - (70 ± 5) % of rating interval width

TS       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Rating  38 103  95  60  60  82  81  60  67  72  90  74 103  73  93  38  60  82  78
MOSc     2   5   5   3   3   4   4   3   3   4   5   4   5   4   5   2   3   4   4

Overall Quality

[Figure: rating scale from 0 to 120 with the labels Bad, Poor, Fair, Good, Excellent mapped to MOS values 1 to 5]

- MOSl: MOS rating intervals with scale labels at the lower end
  - Complete range is used
  - Extreme value 5 rated twice
- Average overall rating: 3.3 ± 0.2
  - (58 ± 5) % of rating interval width

TS       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Rating  38 103  95  60  60  82  81  60  67  72  90  74 103  73  93  38  60  82  78
MOSl     1   5   4   3   3   4   4   3   3   3   4   3   5   3   4   1   3   4   3
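
The two normalisations (MOSc and MOSl) can be sketched as simple interval mappings. The slides do not give the interval boundaries, so the code below assumes that the five labels sit 20 scale units apart, at 20, 40, 60, 80, and 100. This assumption was chosen because it reproduces the MOSl values listed above; the function names are hypothetical.

```python
# Ratings of test subjects 1..19 on the internal 0-120 scale (from the slides)
ratings = [38, 103, 95, 60, 60, 82, 81, 60, 67, 72,
           90, 74, 103, 73, 93, 38, 60, 82, 78]
# MOSl values as listed on the slide, used here only as a cross-check
mosl_from_slide = [1, 5, 4, 3, 3, 4, 4, 3, 3, 3, 4, 3, 5, 3, 4, 1, 3, 4, 3]

LABEL_STEP = 20  # assumed spacing of the scale labels (not stated in the slides)

def clamp_to_mos(value: int) -> int:
    """Restrict a value to the MOS range 1..5."""
    return max(1, min(5, value))

def mos_label_at_lower_end(rating: int) -> int:
    """MOSl: each label marks the lower end of its rating interval."""
    return clamp_to_mos(rating // LABEL_STEP)

def mos_label_in_center(rating: int) -> int:
    """MOSc: each label marks the center of its rating interval."""
    return clamp_to_mos((rating + LABEL_STEP // 2) // LABEL_STEP)

def share_of_interval(mos_mean: float) -> float:
    """Position of a MOS mean within the 1..5 interval, in percent."""
    return 100 * (mos_mean - 1) / 4

mosl = [mos_label_at_lower_end(r) for r in ratings]
mosc = [mos_label_in_center(r) for r in ratings]
mosl_mean = sum(mosl) / len(mosl)
mosc_mean = sum(mosc) / len(mosc)

print(f"MOSl mean: {mosl_mean:.1f} ({share_of_interval(mosl_mean):.0f} % of interval width)")
print(f"MOSc mean: {mosc_mean:.1f} ({share_of_interval(mosc_mean):.0f} % of interval width)")
```

The half-point difference between the two means comes only from where the interval boundaries are placed, which illustrates why the choice of normalisation matters when comparing against standard MOS ratings.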

Most Annoying Properties

Property                                   Mentions
My partner's voice was too silent          9
Loud noise during the call                 8
I heard my own voice as echo               1
My partner's voice was reverberant         1
My partner's voice sounded robotic         1
I heard artificial sounds                  1
*My partner's voice sounded modulated      1
*My partner's voice was too deep           1
I heard my partner's voice as echo         (no count shown)
My partner's voice was too loud            (no count shown)

*) Properties added during test

- About 50 % of test subjects regarded the partner's voice as too silent (known before, but not by the subjects and the experimenter)
- 7 of 8 test subjects regarded the environmental noise as an annoying property

Discussion & Outlook

A short-time, intensive subjective test method and a first test were presented.

- After ratings of 19 test subjects
  - the error of the mean overall quality was assessed at about 3 % of the rating interval width
  - the terminal was statistically confirmed as being too quiet
- Questions and predefined answers have to be chosen very carefully
- Normalising scale ratings to MOS is a non-trivial problem
- Next steps:
  - Comparison of laboratory and in-situ tests
  - Tests of terminals and car kits currently in development