P48 Impact of masks on automatic speech recognition algorithm performance in noise
Wearing face coverings to protect against COVID-19 has become mandatory or recommended in many locations. In certain cases, recommendations have included layering a face mask with a second mask or with a face shield as an extra protective measure. Previous work has shown that certain types of masks affect the acoustic signal and speech intelligibility. These effects may be even more detrimental when face coverings are worn in a noisy environment. This project investigated the effect of various types of masks, with and without a face shield, in quiet and in background noise. To this end, pre-recorded Harvard sentences spoken by male and female speakers from the TSP corpus were played back through a Bruel & Kjaer head and torso simulator (HATS) and recorded in a sound-treated room. Six mask conditions were considered, in quiet and in background noise at different signal-to-noise ratios (SNRs). The mask conditions were no mask, four types of face mask (cotton, surgical, N95, and KN95), and one double-layered mask (cotton and surgical); each was assessed with and without a face shield over the top. Amazon Web Services (AWS) was selected as the automatic speech recognition system to transcribe the speech into text. Each transcription was scored against the keywords of the corresponding original sentence in the list, and the score for each sentence was normalized to a value between 0 and 1. Normalized scores from the male and female speech lists were averaged. To evaluate the impact of mask type at different levels and types of noise, two linear mixed-effects models were fitted independently for the quiet and the background noise conditions. In the quiet condition, none of the face coverings differed significantly from the no-mask condition. In the noise condition, with noise type and level held constant, all mask types except for the bare face shield were associated with significantly lower intelligibility.
The layered presence of the shield was associated with significantly higher intelligibility, and this was observed to be true for most mask types. Significant two-way interactions with certain masks and noise variables suggested a complex relationship between the type of noise and type of mask. Thicker masks, such as the N95, cotton mask, and double-layered mask, may be more negatively impacted by noise levels and noise type, whereas the presence of a shield may actually create a resonance effect that facilitates intelligibility in some cases.
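As an illustration of the keyword scoring described above, the following is a minimal sketch and not the scoring code used in the study. It assumes that a keyword counts as correct only when it appears as an exact word in the transcription (case and punctuation ignored); the abstract does not specify the matching rule. The function names `keyword_score` and `average_scores`, the keyword list, and the example transcriptions are all hypothetical.

```python
import re


def keyword_score(reference_keywords, transcript):
    """Return the fraction of reference keywords found in the transcript (0 to 1).

    Assumption: exact word match after lowercasing and stripping punctuation.
    """
    if not reference_keywords:
        raise ValueError("at least one keyword is required")
    # Tokenize the ASR output into lowercase words so "Planks." matches "planks".
    transcript_words = set(re.findall(r"[a-z']+", transcript.lower()))
    hits = sum(1 for kw in reference_keywords if kw.lower() in transcript_words)
    return hits / len(reference_keywords)


def average_scores(male_scores, female_scores):
    """Average paired male and female normalized scores, one pair per condition."""
    if len(male_scores) != len(female_scores):
        raise ValueError("score lists must have the same length")
    return [(m + f) / 2 for m, f in zip(male_scores, female_scores)]


# Illustrative keywords for one sentence (hypothetical choice of keywords).
keywords = ["birch", "canoe", "slid", "smooth", "planks"]

clean = keyword_score(keywords, "The birch canoe slid on the smooth planks.")
noisy = keyword_score(keywords, "the birch can you slid on smooth plans")
```

Under this exact-match assumption, the first transcription recovers all five keywords and the second recovers three of five. A stricter or more lenient rule (for example, accepting plural or tense variants) would change the normalized scores.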