13th Speech in Noise Workshop, 20-21 January 2022, Virtual Conference

T02 Text-to-speech and back — new ways in speech audiometry

Inga Holube, Saskia Ibelings
Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, DE

Jasper Ooster
Communication Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, DE

Theresa Nuesse
Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, DE

Speech audiometry relies on speech recognition tests. In an open-set format, sentences are presented via headphones or loudspeakers, and the examiner notes the listener’s correct repetitions. Developing and administering such tests is complex and time-consuming: speech material must be recorded, adjusted, optimized, and evaluated with large numbers of normal-hearing listeners. New technologies such as text-to-speech synthesis (TTS) and automatic speech recognition (ASR) can support this process. The quality of TTS is now sufficient to replace recordings of natural speakers. For normal-hearing listeners, the results of speech tests using synthesized matrix sentences or everyday sentences are comparable to the results obtained with the natural-speech versions. Advances in TTS will make it easier to develop larger speech corpora for existing speech tests as well as new speech material. Further, the current pandemic has increased the need for online testing and for greater distances between listeners and examiners in the testing room. While matrix tests can be administered in a closed-set format with all response alternatives displayed on a screen, everyday sentences still require an examiner. An alternative is to record the responses and have them evaluated by ASR. This technique additionally allows the verbal response time, i.e., the time between the end of the presentation and the beginning of the listener’s response, to be calculated as a measure of listening effort. This presentation will give examples of the application of both technologies.
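
To make the two ASR-based quantities concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the ASR transcript is already available as text, that the response recording starts together with the stimulus, and that a simple energy threshold is good enough to find the response onset. The function names, the threshold of 0.05, and the 10 ms frame length are hypothetical choices made for illustration only.

```python
from typing import Optional, Sequence


def word_score(target: str, transcript: str) -> float:
    """Fraction of target words found in the ASR transcript (word order ignored)."""
    target_words = target.lower().split()
    remaining = transcript.lower().split()
    hits = 0
    for word in target_words:
        if word in remaining:
            remaining.remove(word)  # each transcript word may be counted only once
            hits += 1
    return hits / len(target_words) if target_words else 0.0


def response_onset(samples: Sequence[float], sample_rate: int,
                   threshold: float = 0.05, frame_ms: float = 10.0) -> Optional[float]:
    """Start time (s) of the first frame whose RMS exceeds the threshold.

    Assumption: a fixed energy threshold stands in for a proper
    voice activity detector. Returns None if no frame exceeds it.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(x * x for x in frame) / frame_len) ** 0.5
        if rms > threshold:
            return start / sample_rate
    return None


def verbal_response_time(stimulus_end_s: float, onset_s: float) -> float:
    """Time between the end of the presentation and the start of the response."""
    return onset_s - stimulus_end_s
```

For example, if a sentence ends 1.2 s into the recording and the detected response onset is at 1.5 s, `verbal_response_time(1.2, 1.5)` gives a verbal response time of about 0.3 s. A real system would need a more robust onset detector and a scoring rule that tolerates ASR errors.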

Last modified 2022-01-24 16:11:02