13th Speech in Noise Workshop, 20-21 January 2022, Virtual Conference

P24 Deep neural networks for speech enhancement in noise

Lars Bramsløw
Eriksholm Research Centre, Oticon A/S, Denmark

Gaurav Naithani, Tuomas Virtanen
Computing Sciences, Tampere University, Finland

(a) Presenting

Deep neural networks (DNNs) have demonstrated substantial benefits for hearing-impaired listeners in speech-in-noise enhancement and voice-on-voice enhancement. In a previous study [Bramsløw et al., 2018, JASA 144(1):172-185, doi:10.1121/1.5045322], two known talkers were separated using different types of low-latency DNN algorithms. Word recognition in 15 hearing-impaired listeners improved by 37 percentage points when one voice was selected, and by 13 percentage points in the same listeners when the two separated talkers were presented dichotically to the two ears. The present work applies similar DNN architectures to enhancing speech in more common noise types: a party noise and a shopping-centre ambient noise. Speech from a 12-talker Danish HINT corpus was used for training; two male and two female talkers from the same corpus were used in the listening tests, in which word-recognition scores were recorded from 21 hearing-impaired listeners. Five low-latency DNN architectures with time-frequency masking were tested, employing both talker-dependent and talker-independent training. In party noise, a statistically significant word-recognition improvement of 16 percentage points was found for one talker-dependent and one talker-independent DNN type, while two other DNN types showed a statistically significant improvement of 12 percentage points. In the more stationary shopping-centre noise, no improvements were found. It is hypothesized that the more strongly modulated party noise provided more glimpsing opportunities for the DNN algorithms than the shopping-centre noise did.
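For readers unfamiliar with the approach, the sketch below illustrates the general idea of DNN-based time-frequency masking named in the abstract: estimate a per-bin gain mask from the noisy magnitude spectrogram, apply it to the noisy STFT, and resynthesize. It is not any of the five architectures tested in the study; the frame-wise feed-forward network, the frame sizes, and the sampling rate are illustrative assumptions chosen to keep algorithmic latency low.

    # Minimal sketch of time-frequency masking for speech enhancement.
    # Illustrative only: the network and all parameter values are assumptions,
    # not the architectures evaluated in the study.
    import torch
    import torch.nn as nn

    N_FFT, HOP = 256, 128          # short frames for low latency (assumed)
    N_BINS = N_FFT // 2 + 1

    class MaskEstimator(nn.Module):
        """Frame-wise mask estimator: magnitude spectrum in, ratio mask out."""
        def __init__(self, n_bins=N_BINS, hidden=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_bins, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_bins), nn.Sigmoid(),  # mask in [0, 1]
            )

        def forward(self, mag):            # mag: (frames, n_bins)
            return self.net(mag)

    def enhance(noisy, model, window):
        """Mask the noisy STFT with the DNN's estimate and resynthesize."""
        spec = torch.stft(noisy, N_FFT, hop_length=HOP, window=window,
                          return_complex=True)           # (n_bins, frames)
        mag = spec.abs().T                               # (frames, n_bins)
        mask = model(mag).T                              # (n_bins, frames)
        return torch.istft(spec * mask, N_FFT, hop_length=HOP,
                           window=window, length=noisy.shape[-1])

    # Usage, with random samples standing in for a noisy sentence:
    model = MaskEstimator()
    window = torch.hann_window(N_FFT)
    noisy = torch.randn(16000)             # 1 s at 16 kHz (assumed rate)
    enhanced = enhance(noisy, model, window)

In practice the mask estimator would be trained on pairs of noisy and clean spectra (here, HINT speech mixed with the party or shopping-centre noise); a talker-dependent model is trained on one target talker's speech, a talker-independent model on many talkers.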
