Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans
In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two key hypotheses. The first hypothesis posits that increasing the number of gestures enhances the user experience across criteria such as naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second hypothesis suggests that language translation does not degrade the user experience across these criteria.
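The full description below outlines a translate-then-retrieve pipeline: non-English input is first translated to English, and the English text is then matched against a rule base that maps phrases to gesture units. The sketch that follows is a minimal illustration of that idea, not the authors' implementation; the rule base contents, the greedy phrase matcher, and all names (`RULE_BASE`, `GestureUnit`, `translate_to_english`) are assumptions for demonstration only. A second sketch of the contrastive gesture-pose matching step follows the record fields below.

```python
# Minimal sketch of a translate-then-retrieve text-to-gesture pipeline,
# assuming a phrase-keyed rule base. Not the authors' actual API.
from dataclasses import dataclass

@dataclass
class GestureUnit:
    unit_id: int   # hypothetical index into a motion-capture gesture library
    phrase: str    # English phrase the rule fires on

# Toy stand-in for a rule base; the paper's contains roughly 210,000 rules.
RULE_BASE = {
    "hello": GestureUnit(12, "hello"),
    "very big": GestureUnit(845, "very big"),
    "over there": GestureUnit(1301, "over there"),
}

def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for any machine-translation backend; identity for English."""
    if source_lang == "en":
        return text
    raise NotImplementedError("plug a translation service in here")

def retrieve_gestures(text: str, source_lang: str = "en") -> list[GestureUnit]:
    """Translate, then greedily match the longest rule phrase at each position."""
    words = translate_to_english(text, source_lang).lower().split()
    gestures, i = [], 0
    while i < len(words):
        # Prefer longer phrase matches ("very big" before "big").
        for span in range(min(4, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + span])
            if phrase in RULE_BASE:
                gestures.append(RULE_BASE[phrase])
                i += span
                break
        else:
            i += 1  # no rule fired; a real system might emit a beat gesture here
    return gestures

print(retrieve_gestures("hello that is a very big dog over there"))
```

The greedy longest-match loop here is just one plausible retrieval strategy; the paper instead ranks candidates with a learned gesture-pose matching model.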
| Main Authors: | Ghazanfar Ali, Woojoo Kim, Muhammad Shahid Anwar, Jae-In Hwang, Ahyoung Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities |
| Online Access: | https://ieeexplore.ieee.org/document/11115060/ |
| author | Ghazanfar Ali; Woojoo Kim; Muhammad Shahid Anwar; Jae-In Hwang; Ahyoung Choi |
|---|---|
| collection | DOAJ |
| description | In this study, we explore the effects of co-speech gesture generation on user experience in 3D digital human interaction by testing two key hypotheses. The first hypothesis posits that increasing the number of gestures enhances the user experience across criteria such as naturalness, human-likeness, temporal consistency, semantic consistency, and social presence. The second hypothesis suggests that language translation does not degrade the user experience across these criteria. To test these hypotheses, we investigated three conditions using a digital human: voice only with no gestures, limited (56 gestures) co-speech gestures, and full system functionality with over 2,000 unique gestures. For the second hypothesis, we used language translation to provide multilingual support, retrieving gestures from an English rule base. We obtained text and pose from English videos and matched the pose with gesture units derived from Korean speakers’ motion-capture sequences, enhancing a comprehensive rule base used to retrieve gestures for a given text input. Non-English input was translated to English for text matching. Our method uses an improved pipeline to extract text, 2D pose data, and 3D gesture units. Incorporating a gesture-pose matching model trained with deep contrastive learning, we retrieved gestures from a comprehensive rule base containing 210,000 rules. This approach optimizes alignment and generates realistic, semantically consistent co-speech gestures adaptable to various languages. A comprehensive user study evaluated our hypotheses. The results underscored the positive impact of diverse gestures, supporting the first hypothesis. Additionally, multilingual capabilities did not degrade the user experience, confirming the second hypothesis. Highlighting the scalability and flexibility of our method, this study provides valuable insights into cross-lingual data and expert systems for gesture generation, contributing to more engaging and immersive digital human interactions and the broader field of human-computer interaction. |
| format | Article |
| id | doaj-art-9b7faf194345448caf5cbce71bf6ffd2 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | IEEE Access, vol. 13, pp. 145144–145157, 2025-01-01. DOI: 10.1109/ACCESS.2025.3596328; IEEE article no. 11115060; ISSN 2169-3536; publisher: IEEE; language: English. Record doaj-art-9b7faf194345448caf5cbce71bf6ffd2, indexed 2025-08-25T23:11:57Z. Authors: Ghazanfar Ali (ORCID 0000-0002-7741-1938), Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea; Woojoo Kim (ORCID 0000-0001-6203-7309), Division of Liberal Studies, Kangwon National University, Chuncheon-si, Gangwon-do, Republic of Korea; Muhammad Shahid Anwar (ORCID 0000-0001-8093-6690), Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea; Jae-In Hwang, Intelligence and Interaction Research Center (I2RC), Korea Institute of Science and Technology (KIST), Seongbuk-gu, Seoul, Republic of Korea; Ahyoung Choi (ORCID 0000-0001-7676-9869), Department of AI and Software, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea. |
| title | Expanding Multilingual Co-Speech Interaction: The Impact of Enhanced Gesture Units in Text-to-Gesture Synthesis for Digital Humans |
| topic | Co-speech gestures; gesture generation; HCI; machine learning; augmented/virtual/mixed realities |
| url | https://ieeexplore.ieee.org/document/11115060/ |
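As a companion to the retrieval sketch above, the toy below illustrates the other component the description names: a gesture-pose matching model that embeds 2D pose sequences and 3D gesture units into a shared space, so the closest gesture unit can be retrieved by similarity. Everything here (the linear stand-in encoders, the dimensions, the random library) is a placeholder assumption; the paper trains real encoders with deep contrastive learning.

```python
# Hedged sketch of shared-embedding gesture-pose matching. The encoders
# are random linear maps standing in for learned contrastive encoders.
import numpy as np

rng = np.random.default_rng(0)

def embed(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Stand-in encoder: linear projection followed by L2 normalization."""
    z = x.flatten() @ W
    return z / np.linalg.norm(z)

POSE_DIM, UNIT_DIM, EMB_DIM = 34, 96, 64   # toy sizes, not from the paper
W_pose = rng.normal(size=(POSE_DIM, EMB_DIM))
W_unit = rng.normal(size=(UNIT_DIM, EMB_DIM))

# A toy library of 3D gesture units (one row per unit), embedded once offline.
unit_library = rng.normal(size=(2000, UNIT_DIM))
unit_embs = np.stack([embed(u, W_unit) for u in unit_library])

def match_gesture(pose_2d: np.ndarray) -> int:
    """Return the index of the gesture unit closest to the 2D pose query."""
    q = embed(pose_2d, W_pose)
    return int(np.argmax(unit_embs @ q))   # cosine similarity (unit norms)

print(match_gesture(rng.normal(size=POSE_DIM)))
```

With trained encoders in place of the random projections, the same nearest-neighbor lookup would return semantically matched gesture units rather than arbitrary ones.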