15th Speech in Noise Workshop, 11-12 January 2024, Potsdam, Germany

P49, Session 1 (Thursday 11 January 2024, 15:35-18:00)
Speech-in-noise pupillometry data collected via a virtual reality headset

Tim Green
Speech Hearing & Phonetic Sciences, UCL, London, UK

Lorenzo Picinali, Tim Murray-Browne, Isaac Engel, Craig Henry
Dyson School of Design Engineering, Imperial College, London, UK

In an initial exploration of virtual reality (VR) for assessing speech recognition and listening effort in realistic environments, pupillometry data collected via a consumer Vive Pro Eye headset were compared with data obtained via an Eyelink 1000 system from the same normally-hearing participants, presented with similar non-spatialized auditory stimuli. Vive testing used a custom platform based on Unity for video playback and MaxMSP for headphone-based audio rendering and overall control. For Eyelink measurements, head movements were minimized by a chin rest and participants fixated on a cross presented on a grey screen. No such constraints were present for Vive measurements, which were conducted both with a 360° café video and with a uniformly grey visual field. Participants identified IEEE sentences in babble at two signal-to-noise ratios (SNRs). The lower SNR (-3 dB) produced around 65% word recognition, while performance at the higher SNR (+6 dB) was at ceiling. As expected, pupil dilation was greater for the lower SNR, indicating increased listening effort. Averaged across participants, pupil data were very similar across systems and, for the Vive, across the different visual backgrounds, supporting the potential of VR systems to provide useful pupillometric measures of listening effort in realistic environments.
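As a rough illustration of the kind of analysis such recordings support (not the study's own pipeline), the Python sketch below baseline-corrects pupil traces and averages them across trials; the sampling rate, baseline window, and array layout are assumptions chosen for illustration.

import numpy as np

def baseline_corrected_dilation(trials, fs=120, baseline_s=1.0):
    """Baseline-correct pupil traces and average them across trials.

    trials : array of shape (n_trials, n_samples) of pupil diameter (mm),
             each trial starting `baseline_s` seconds before sentence onset.
    fs     : sampling rate in Hz (assumed value, for illustration only).
    """
    trials = np.asarray(trials, dtype=float)
    n_base = int(round(baseline_s * fs))
    # Subtract each trial's pre-stimulus mean so dilation is expressed
    # relative to baseline pupil size.
    baseline = np.nanmean(trials[:, :n_base], axis=1, keepdims=True)
    corrected = trials - baseline
    # Average across trials; peak dilation after sentence onset is a
    # commonly used index of listening effort.
    mean_trace = np.nanmean(corrected, axis=0)
    peak_dilation = np.nanmax(mean_trace[n_base:])
    return mean_trace, peak_dilation

# Usage with hypothetical trial arrays for the two SNR conditions:
# mean_low, peak_low = baseline_corrected_dilation(trials_minus3dB)
# mean_high, peak_high = baseline_corrected_dilation(trials_plus6dB)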

Ongoing experiments, conducted solely in VR and using spatialized sound, examine the effects of incorporating different aspects of realistic situations. In the aforementioned café video, three computer monitors, each associated with a particular talker, are visible on tables. In one condition, as above, target sentences from a single talker are presented audio-only. In a second single-talker condition, a video of the talker reading the target sentence is also presented. In a third condition, the target talker varies quasi-randomly across trials, and a visual cue presented 2 s before the target video allows the participant to orient to the correct screen. In each visual condition, target sentences are presented against a background of café noise plus additional babble at two SNRs differing by 9 dB. Comparisons across conditions will indicate the extent to which pupillometry data are affected by factors such as dynamic changes in the visual background and the head movements typically involved in multi-talker conversations.
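For concreteness, a minimal sketch of how a target sentence might be mixed with babble at a nominal SNR is given below; it assumes monaural signals at a common sampling rate, and the function name and parameters are illustrative rather than taken from the experimental software.

import numpy as np

def mix_at_snr(target, babble, snr_db):
    """Add babble to a target sentence at a specified SNR (in dB).

    target, babble : 1-D arrays at the same sampling rate; the babble is
                     truncated to the length of the target.
    snr_db         : desired signal-to-noise ratio, e.g. -3 or +6.
    """
    target = np.asarray(target, dtype=float)
    babble = np.asarray(babble, dtype=float)[: len(target)]
    t_rms = np.sqrt(np.mean(target ** 2))
    b_rms = np.sqrt(np.mean(babble ** 2))
    # Scale the babble so the target-to-babble level difference equals snr_db.
    gain = t_rms / (b_rms * 10 ** (snr_db / 20))
    return target + gain * babble

# The two SNRs in the ongoing experiment differ by 9 dB, e.g.
# mix_at_snr(sentence, babble, snr_db) and mix_at_snr(sentence, babble, snr_db + 9).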

Funding: Supported by the William Demant Foundation.
