Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans

In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two key hypotheses. The first hypothesis posits that increasing the number of gestures enhances the user experience across criteria such as naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second hypothesis suggests that language translation does not degrade the user experience across these criteria. To explore these hypotheses, we investigated three conditions using a digital human: voice only with no gestures, a limited set of co-speech gestures (56 gestures), and full system functionality with over 2,000 unique gestures. For the second hypothesis, we used language translation to provide multilingual support, retrieving gestures from an English rule base: non-English input was translated into English for text matching. We obtained text and pose from English videos and matched the pose with gesture units derived from Korean speakers' motion-capture sequences, enhancing a comprehensive rule base that we then used to retrieve gestures for a given text input. Our novel method utilizes an improved pipeline to extract text, 2D pose data, and 3D gesture units. Incorporating a cutting-edge gesture-pose matching model trained with deep contrastive learning, we retrieved gestures from a rule base containing 210,000 rules. This approach optimizes alignment and generates realistic, semantically consistent co-speech gestures adaptable to various languages. A comprehensive user study evaluated our hypotheses. The results underscored the positive impact of diverse gestures, supporting the first hypothesis. Additionally, multilingual capabilities did not degrade the user experience, confirming the second hypothesis. Highlighting the scalability and flexibility of our method, this study provides valuable insights into cross-lingual data and expert systems for gesture generation, contributing significantly to more engaging and immersive digital human interactions and the broader field of human-computer interaction.
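
The abstract describes a translate-then-retrieve pipeline: non-English input is translated to English, the text is matched against an English rule base of phrase-to-gesture rules, and the best-matching gesture unit drives the digital human. The authors' implementation is not part of this record, so the sketch below is a minimal, hypothetical Python illustration: the names (translate_to_english, RuleBase, embed_text) are placeholders, and a toy bag-of-words cosine similarity stands in for the paper's deep contrastive gesture-pose matching model.

```python
# Hypothetical sketch of translate-then-retrieve gesture lookup, as
# described in the abstract. All names here are illustrative, not the
# authors' API; the bag-of-words "embedding" is a stand-in for the
# paper's learned contrastive text/gesture encoder.
import math
from collections import Counter
from typing import List, Tuple


def embed_text(text: str) -> Counter:
    """Toy bag-of-words vector; the real system uses a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RuleBase:
    """English phrase -> gesture-unit rules, retrieved by similarity."""

    def __init__(self, rules: List[Tuple[str, str]]):
        # Each rule pairs an English phrase with a gesture-unit id.
        # The paper's rule base holds ~210,000 such rules.
        self.rules = [(embed_text(phrase), gesture) for phrase, gesture in rules]

    def retrieve(self, english_text: str) -> str:
        # Return the gesture unit whose phrase best matches the input.
        query = embed_text(english_text)
        _, gesture = max(self.rules, key=lambda r: cosine(query, r[0]))
        return gesture


def translate_to_english(text: str, lang: str) -> str:
    """Placeholder: the system routes non-English input through MT."""
    if lang == "en":
        return text
    raise NotImplementedError("plug in a machine-translation service here")


rules = RuleBase([
    ("hello nice to meet you", "wave_right_hand"),
    ("i do not know", "shrug_shoulders"),
    ("over there", "point_forward"),
])
print(rules.retrieve(translate_to_english("nice to meet you", "en")))
# -> wave_right_hand
```

The actual system scales this pattern to roughly 210,000 rules over more than 2,000 unique gesture units, with the toy similarity replaced by embeddings learned through deep contrastive gesture-pose matching.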

Bibliographic Details
Main Authors: Ghazanfar Ali, Woojoo Kim, Muhammad Shahid Anwar, Jae-In Hwang, Ahyoung Choi
Author Affiliations:
Ghazanfar Ali (https://orcid.org/0000-0002-7741-1938): Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea
Woojoo Kim (https://orcid.org/0000-0001-6203-7309): Division of Liberal Studies, Kangwon National University, Chuncheon-si, Gangwon-do, Republic of Korea
Muhammad Shahid Anwar (https://orcid.org/0000-0001-8093-6690): Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea
Jae-In Hwang: Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea
Ahyoung Choi (https://orcid.org/0000-0001-7676-9869): Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, Vol. 13, pp. 145144-145157
DOI: 10.1109/ACCESS.2025.3596328
ISSN: 2169-3536
Subjects: Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities
Online Access: https://ieeexplore.ieee.org/document/11115060/