Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation

Bibliographic Details
Main Authors: Federico Cioffi, Massimiliano Masullo, Aniello Pascale, Luigi Maffei
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Acoustics
Subjects: virtual reality; avatar; facial animation; Unreal Engine; MetaHuman; speech intelligibility
Online Access: https://www.mdpi.com/2624-599X/7/2/30
Collection: DOAJ
Description: Speech intelligibility (SI) is critical to effective communication across various settings, although it is often compromised by adverse acoustic conditions. In noisy environments, visual cues such as lip movements and facial expressions, when congruent with auditory information, can significantly enhance speech perception and reduce cognitive effort. With the ever-growing diffusion of virtual environments, communicating through virtual avatars is becoming increasingly prevalent, thus requiring a comprehensive understanding of these dynamics to ensure effective interactions. The present study used Unreal Engine’s MetaHuman technology to compare four methodologies for creating facial animation: MetaHuman Animator (MHA), MetaHuman LiveLink (MHLL), Audio-Driven MetaHuman (ADMH), and Synthetized Audio-Driven MetaHuman (SADMH). Thirty-six word pairs from the Diagnostic Rhyme Test (DRT) were used as input stimuli to create the animations and to compare them in terms of intelligibility. Moreover, to simulate challenging background noise, the animations were mixed with babble noise at a signal-to-noise ratio of −13 dB (A). Participants assessed a total of 144 facial animations. Results showed the ADMH condition to be the most intelligible among the methodologies used, probably due to enhanced clarity and consistency in the generated facial animations, while eliminating distractions such as micro-expressions and natural variations in human articulation.
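The abstract describes mixing speech stimuli with babble noise at a signal-to-noise ratio of −13 dB (A). As a rough illustration of the underlying mixing step (a broadband, unweighted SNR; the A-weighting in the paper applies to the level measurement and is omitted here), the noise can be scaled so the speech-to-noise power ratio hits the target before summing. The function name and signal shapes below are hypothetical, not taken from the paper:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`, then mix."""
    # Tile/truncate the noise to match the speech duration.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target: p_speech / (scale^2 * p_noise) = 10 ** (snr_db / 10)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

A negative SNR such as −13 dB means the babble noise carries roughly twenty times the power of the speech, which is what makes the visual cues from the facial animation informative.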
ISSN: 2624-599X
DOI: 10.3390/acoustics7020030
Author Affiliations:
Federico Cioffi: Department of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, Italy
Massimiliano Masullo: Department of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, Italy
Aniello Pascale: Immensive s.r.l.s., 81030 Parete, CE, Italy
Luigi Maffei: Department of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, Italy