Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation
Speech intelligibility (SI) is critical in effective communication across various settings, although it is often compromised by adverse acoustic conditions. In noisy environments, visual cues such as lip movements and facial expressions, when congruent with auditory information, can significantly enhance speech perception and reduce cognitive effort. In an ever-growing diffusion of virtual environments, communicating through virtual avatars is becoming increasingly prevalent, thus requiring a comprehensive understanding of these dynamics to ensure effective interactions. The present study used Unreal Engine’s MetaHuman technology to compare four methodologies used to create facial animation: MetaHuman Animator (MHA), MetaHuman LiveLink (MHLL), Audio-Driven MetaHuman (ADMH), and Synthetized Audio-Driven MetaHuman (SADMH). Thirty-six word pairs from the Diagnostic Rhyme Test (DRT) were used as input stimuli to create the animations and to compare them in terms of intelligibility. Moreover, to simulate a challenging background noise, the animations were mixed with a babble noise at a signal-to-noise ratio of −13 dB (A). Participants assessed a total of 144 facial animations. Results showed the ADMH condition to be the most intelligible among the methodologies used, probably due to enhanced clarity and consistency in the generated facial animations, while eliminating distractions like micro-expressions and natural variations in human articulation.
Saved in:
| Main Authors: | Federico Cioffi, Massimiliano Masullo, Aniello Pascale, Luigi Maffei |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Acoustics |
| Subjects: | virtual reality; avatar; facial animation; unreal engine; MetaHuman; speech intelligibility |
| Online Access: | https://www.mdpi.com/2624-599X/7/2/30 |
| _version_ | 1850157847430561792 |
|---|---|
| author | Federico Cioffi, Massimiliano Masullo, Aniello Pascale, Luigi Maffei |
| author_facet | Federico Cioffi, Massimiliano Masullo, Aniello Pascale, Luigi Maffei |
| author_sort | Federico Cioffi |
| collection | DOAJ |
| description | Speech intelligibility (SI) is critical in effective communication across various settings, although it is often compromised by adverse acoustic conditions. In noisy environments, visual cues such as lip movements and facial expressions, when congruent with auditory information, can significantly enhance speech perception and reduce cognitive effort. In an ever-growing diffusion of virtual environments, communicating through virtual avatars is becoming increasingly prevalent, thus requiring a comprehensive understanding of these dynamics to ensure effective interactions. The present study used Unreal Engine’s MetaHuman technology to compare four methodologies used to create facial animation: MetaHuman Animator (MHA), MetaHuman LiveLink (MHLL), Audio-Driven MetaHuman (ADMH), and Synthetized Audio-Driven MetaHuman (SADMH). Thirty-six word pairs from the Diagnostic Rhyme Test (DRT) were used as input stimuli to create the animations and to compare them in terms of intelligibility. Moreover, to simulate a challenging background noise, the animations were mixed with a babble noise at a signal-to-noise ratio of −13 dB (A). Participants assessed a total of 144 facial animations. Results showed the ADMH condition to be the most intelligible among the methodologies used, probably due to enhanced clarity and consistency in the generated facial animations, while eliminating distractions like micro-expressions and natural variations in human articulation. |
| format | Article |
| id | doaj-art-e0c350596d0a4d87bed3567eccc5b6cb |
| institution | OA Journals |
| issn | 2624-599X |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Acoustics |
| spelling | doaj-art-e0c350596d0a4d87bed3567eccc5b6cb2025-08-20T02:24:03ZengMDPI AGAcoustics2624-599X2025-05-01723010.3390/acoustics7020030Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial AnimationFederico Cioffi0Massimiliano Masullo1Aniello Pascale2Luigi Maffei3Department of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, ItalyDepartment of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, ItalyImmensive s.r.l.s., 81030 Parete, CE, ItalyDepartment of Architecture and Industrial Design, Università degli Studi della Campania “Luigi Vanvitelli”, 81031 Aversa, CE, ItalySpeech intelligibility (SI) is critical in effective communication across various settings, although it is often compromised by adverse acoustic conditions. In noisy environments, visual cues such as lip movements and facial expressions, when congruent with auditory information, can significantly enhance speech perception and reduce cognitive effort. In an ever-growing diffusion of virtual environments, communicating through virtual avatars is becoming increasingly prevalent, thus requiring a comprehensive understanding of these dynamics to ensure effective interactions. The present study used Unreal Engine’s MetaHuman technology to compare four methodologies used to create facial animation: MetaHuman Animator (MHA), MetaHuman LiveLink (MHLL), Audio-Driven MetaHuman (ADMH), and Synthetized Audio-Driven MetaHuman (SADMH). Thirty-six word pairs from the Diagnostic Rhyme Test (DRT) were used as input stimuli to create the animations and to compare them in terms of intelligibility. Moreover, to simulate a challenging background noise, the animations were mixed with a babble noise at a signal-to-noise ratio of −13 dB (A). Participants assessed a total of 144 facial animations. Results showed the ADMH condition to be the most intelligible among the methodologies used, probably due to enhanced clarity and consistency in the generated facial animations, while eliminating distractions like micro-expressions and natural variations in human articulation.https://www.mdpi.com/2624-599X/7/2/30virtual realityavatarfacial animationunreal engineMetaHumanspeech intelligibility |
| spellingShingle | Federico Cioffi Massimiliano Masullo Aniello Pascale Luigi Maffei Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation Acoustics virtual reality avatar facial animation unreal engine MetaHuman speech intelligibility |
| title | Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation |
| title_full | Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation |
| title_fullStr | Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation |
| title_full_unstemmed | Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation |
| title_short | Speech Intelligibility in Virtual Avatars: Comparison Between Audio and Audio–Visual-Driven Facial Animation |
| title_sort | speech intelligibility in virtual avatars comparison between audio and audio visual driven facial animation |
| topic | virtual reality avatar facial animation unreal engine MetaHuman speech intelligibility |
| url | https://www.mdpi.com/2624-599X/7/2/30 |
| work_keys_str_mv | AT federicocioffi speechintelligibilityinvirtualavatarscomparisonbetweenaudioandaudiovisualdrivenfacialanimation AT massimilianomasullo speechintelligibilityinvirtualavatarscomparisonbetweenaudioandaudiovisualdrivenfacialanimation AT aniellopascale speechintelligibilityinvirtualavatarscomparisonbetweenaudioandaudiovisualdrivenfacialanimation AT luigimaffei speechintelligibilityinvirtualavatarscomparisonbetweenaudioandaudiovisualdrivenfacialanimation |