P52 Effect of audiovisual lag on speech-in-noise comprehension in cochlear implant users and typical hearing controls
When a person is seen speaking, a listener’s ability to understand that speech is supported by both the auditory signal of the voice and complementary visual cues from mouth movements. The relative contributions of these auditory and visual cues to understanding multisensory speech signals vary depending on the individual’s specific sensory abilities and the characteristics of the listening environment. A great deal of foundational and contemporary literature has shown that visual speech cues facilitate speech-in-noise intelligibility in listeners with typical hearing. For individuals with hearing loss, as well as users of assistive hearing devices such as hearing aids and cochlear implants (CIs), understanding target speech in the presence of background noise is especially challenging. Because the auditory signal produced by a CI is fundamentally degraded compared to that conveyed by the typically developed inner ear, CI users likely rely on visual speech cues more than individuals with typical hearing. While the role of visual cues in overcoming the challenge of background noise has been previously established, less is known about how CI users use visual cues when audiovisual speech material is compromised by temporal lag, as can occur with online video calling platforms. Here we show that CI users benefit more from visual speech cues in an online speech-in-noise listening task but do so over a smaller range of asynchronies than typical hearing controls. For sentences presented in multi-talker babble, both groups benefitted from visual speech cues, though the CI group showed greater multisensory gain. Additionally, both groups’ ability to understand target speech decreased as the temporal offset between the auditory and visual streams increased, with CI users showing a sharper decline.
Generally, these findings are in keeping with established principles of multisensory integration, including inverse effectiveness (that integration confers greater benefit when unisensory signals are ambiguous) and the existence of a temporal binding window (a range of asynchronies over which audiovisual signals are commonly bound perceptually). Taken together, these results are a first step towards characterizing the interaction between these principles and extending this framework to populations with unique sensory experience. Given the rapid expansion of remote and hybrid schooling, work, and even socializing in recent years, a thorough understanding of the unique listening challenges of such virtual environments, and of their efficacy for various users, is necessary and timely.