Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans
In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two key hypotheses. The first hypothesis posits that increasing the number of gestures enhances the user experience across criteria such as naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second hypothesis suggests that language translation does not degrade the user experience across these criteria.
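The full description below outlines a translate-then-retrieve pipeline: non-English input is first translated to English, and the English text is then matched against a rule base that maps phrases to gesture units. The sketch that follows is a minimal illustration of that idea, not the authors' implementation; the rule base contents, the greedy phrase matcher, and all names (`RULE_BASE`, `GestureUnit`, `translate_to_english`) are assumptions for demonstration only. A second sketch of the contrastive gesture-pose matching step follows the record fields below.

```python
# Minimal sketch of a translate-then-retrieve text-to-gesture pipeline,
# assuming a phrase-keyed rule base. Not the authors' actual API.
from dataclasses import dataclass

@dataclass
class GestureUnit:
    unit_id: int   # hypothetical index into a motion-capture gesture library
    phrase: str    # English phrase the rule fires on

# Toy stand-in for a rule base; the paper's contains roughly 210,000 rules.
RULE_BASE = {
    "hello": GestureUnit(12, "hello"),
    "very big": GestureUnit(845, "very big"),
    "over there": GestureUnit(1301, "over there"),
}

def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for any machine-translation backend; identity for English."""
    if source_lang == "en":
        return text
    raise NotImplementedError("plug a translation service in here")

def retrieve_gestures(text: str, source_lang: str = "en") -> list[GestureUnit]:
    """Translate, then greedily match the longest rule phrase at each position."""
    words = translate_to_english(text, source_lang).lower().split()
    gestures, i = [], 0
    while i < len(words):
        # Prefer longer phrase matches ("very big" before "big").
        for span in range(min(4, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + span])
            if phrase in RULE_BASE:
                gestures.append(RULE_BASE[phrase])
                i += span
                break
        else:
            i += 1  # no rule fired; a real system might emit a beat gesture here
    return gestures

print(retrieve_gestures("hello that is a very big dog over there"))
```

The greedy longest-match loop here is just one plausible retrieval strategy; the paper instead ranks candidates with a learned gesture-pose matching model.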
| Main Authors: | Ghazanfar Ali, Woojoo Kim, Muhammad Shahid Anwar, Jae-In Hwang, Ahyoung Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities |
| Online Access: | https://ieeexplore.ieee.org/document/11115060/ |
| author | Ghazanfar Ali; Woojoo Kim; Muhammad Shahid Anwar; Jae-In Hwang; Ahyoung Choi |
|---|---|
| collection | DOAJ |
| description | In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two key hypotheses. The first hypothesis posits that increasing the number of gestures enhances the user experience across criteria such as naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second hypothesis suggests that language translation does not degrade the user experience across these criteria. To test these hypotheses, we investigated three conditions using a digital human: voice only with no gestures, limited (56 gestures) co-speech gestures, and full system functionality with over 2,000 unique gestures. For the second hypothesis, we used language translation to provide multilingual support, retrieving gestures from an English rule base. We obtained text and pose from English videos and matched the pose with gesture units derived from Korean speakers’ motion-capture sequences, enhancing a comprehensive rule base used to retrieve gestures for a given text input. Non-English input was translated to English for text matching. Our method uses an improved pipeline to extract text, 2D pose data, and 3D gesture units. Incorporating a gesture-pose matching model trained with deep contrastive learning, we retrieved gestures from a comprehensive rule base containing 210,000 rules. This approach optimizes alignment and generates realistic, semantically consistent co-speech gestures adaptable to various languages. A comprehensive user study evaluated our hypotheses. The results underscored the positive impact of diverse gestures, supporting the first hypothesis. Additionally, multilingual capabilities did not degrade the user experience, confirming the second hypothesis. Highlighting the scalability and flexibility of our method, this study provides valuable insights into cross-lingual data and expert systems for gesture generation, contributing to more engaging and immersive digital human interactions and the broader field of human-computer interaction. |
| format | Article |
| id | doaj-art-9b7faf194345448caf5cbce71bf6ffd2 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | IEEE Access, vol. 13, pp. 145144–145157, 2025-01-01. DOI: 10.1109/ACCESS.2025.3596328; IEEE article no. 11115060; ISSN 2169-3536; publisher: IEEE; language: English. Record doaj-art-9b7faf194345448caf5cbce71bf6ffd2, indexed 2025-08-25T23:11:57Z. Authors: Ghazanfar Ali (ORCID 0000-0002-7741-1938), Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea; Woojoo Kim (ORCID 0000-0001-6203-7309), Division of Liberal Studies, Kangwon National University, Chuncheon-si, Gangwon-do, Republic of Korea; Muhammad Shahid Anwar (ORCID 0000-0001-8093-6690), Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea; Jae-In Hwang, Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea; Ahyoung Choi (ORCID 0000-0001-7676-9869), Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea. |
| title | Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans |
| topic | Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities |
| url | https://ieeexplore.ieee.org/document/11115060/ |
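As a companion to the retrieval sketch above, the toy below illustrates the other component the description names: a gesture-pose matching model that embeds 2D pose sequences and 3D gesture units into a shared space, so the closest gesture unit can be retrieved by similarity. Everything here (the linear stand-in encoders, the dimensions, the random library) is a placeholder assumption; the paper trains real encoders with deep contrastive learning.

```python
# Hedged sketch of shared-embedding gesture-pose matching. The encoders
# are random linear maps standing in for learned contrastive encoders.
import numpy as np

rng = np.random.default_rng(0)

def embed(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Stand-in encoder: linear projection followed by L2 normalization."""
    z = x.flatten() @ W
    return z / np.linalg.norm(z)

POSE_DIM, UNIT_DIM, EMB_DIM = 34, 96, 64   # toy sizes, not from the paper
W_pose = rng.normal(size=(POSE_DIM, EMB_DIM))
W_unit = rng.normal(size=(UNIT_DIM, EMB_DIM))

# A toy library of 3D gesture units (one row per unit), embedded once offline.
unit_library = rng.normal(size=(2000, UNIT_DIM))
unit_embs = np.stack([embed(u, W_unit) for u in unit_library])

def match_gesture(pose_2d: np.ndarray) -> int:
    """Return the index of the gesture unit closest to the 2D pose query."""
    q = embed(pose_2d, W_pose)
    return int(np.argmax(unit_embs @ q))   # cosine similarity (unit norms)

print(match_gesture(rng.normal(size=POSE_DIM)))
```

With trained encoders in place of the random projections, the same nearest-neighbor lookup would return semantically matched gesture units rather than arbitrary ones.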