An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
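| Main Authors: | Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva, Massimo Villari |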
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | virtual reality; automated speech recognition; convolutional neural networks; ONNX |
| Online Access: | https://www.mdpi.com/2504-2289/9/7/188 |

| author | Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva, Massimo Villari |
|---|---|
| collection | DOAJ |
| description | Nowadays, the Metaverse faces many challenges. In this context, Virtual Reality (VR) applications that allow voice-based human–3D object interaction are limited by current hardware and software constraints. In particular, adopting Automated Speech Recognition (ASR) systems to let users interact with 3D objects in VR applications through voice commands is challenging because of the hardware and software limitations of headset devices. This paper aims to bridge this gap by proposing a methodology to address these issues. Specifically, a Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm captures the distinctive characteristics of the user’s voice, and we feed the extracted features as input to a Convolutional Neural Network (CNN) model. Then, to integrate the CNN model with a VR application running on a standalone headset such as the Oculus Quest, we converted it into the Open Neural Network Exchange (ONNX) format, an open standard for Machine Learning (ML) model interoperability. The proposed system demonstrates good performance and represents a foundation for the development of user-centric, effective computing systems, enhancing accessibility to VR environments through voice-based commands. Experiments show that a native CNN model developed with TensorFlow performs comparably to the corresponding CNN model converted into the ONNX format, paving the way towards VR applications running on headsets and controlled through the user’s voice. |
| format | Article |
| id | doaj-art-1e9bc1e8dfcf42d2b88b911bd4ab0493 |
| institution | DOAJ |
| issn | 2504-2289 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Big Data and Cognitive Computing |
| citation | Big Data and Cognitive Computing, vol. 9, no. 7, article 188 (2025-07-01) |
| doi | 10.3390/bdcc9070188 |
| affiliations | Alessio Catalfamo, Antonio Celesti, Maria Fazio, Massimo Villari: MIFT Department, University of Messina, 98166 Messina, Italy; A. F. M. Saifuddin Saif: Department of Computing, Information and Mathematical Sciences, and Technology (CIMST), Chicago State University, Chicago, IL 60628, USA; Yu-Sheng Lin: Department of Mechanical Engineering, Southern Taiwan University of Science and Technology, Tainan 71005, Taiwan; Edelberto Franco Silva: Department of Computer Science, Federal University of Juiz de Fora (UFJF), Juiz de Fora 36036-330, MG, Brazil |
| title | An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment |
| topic | virtual reality automated speech recognition convolutional neural networks ONNX |
| url | https://www.mdpi.com/2504-2289/9/7/188 |
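
The abstract describes feeding MFCC features extracted from the user's voice into a CNN. The record does not give the paper's extraction settings, so the following is only a minimal NumPy sketch of a standard MFCC pipeline (pre-emphasis, Hamming-windowed framing, power spectrum, mel filterbank, log, DCT-II). The function names and every parameter value (16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 40 mel filters, 13 coefficients) are illustrative assumptions, not the authors' configuration.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale, mapped to FFT bins
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=40, n_coeffs=13):
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Zero-pad short inputs so at least one full frame exists
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    # Slice overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame, then mel-band energies
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Log compression (small epsilon avoids log(0)) and decorrelation via DCT
    return dct_ii(np.log(energies + 1e-10), n_coeffs)
```

Under these assumptions, one second of 16 kHz audio yields a (98, 13) matrix, one row of 13 coefficients per 10 ms hop; a 2D array of this kind can be treated as a single-channel image by a CNN.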