An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment

Nowadays, the Metaverse is facing many challenges. In this context, Virtual Reality (VR) applications allowing voice-based human–3D object interactions are limited due to the current hardware/software limitations. In fact, adopting Automated Speech Recognition (ASR) systems to interact with 3D objec...

Bibliographic Details
Main Authors: Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva, Massimo Villari
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Big Data and Cognitive Computing
Subjects: virtual reality; automated speech recognition; convolutional neural networks; ONNX
Online Access: https://www.mdpi.com/2504-2289/9/7/188
collection DOAJ
description The Metaverse still faces many open challenges. Among them, Virtual Reality (VR) applications that support voice-based human–3D object interaction remain rare, because adopting Automated Speech Recognition (ASR) to control 3D objects through users’ voice commands is constrained by the hardware and software limitations of headset devices. This paper proposes a methodology to address this gap. A Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm first captures the distinctive characteristics of the user’s voice, and the resulting coefficients are passed as input to a Convolutional Neural Network (CNN) model. To integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, the model is then converted into the Open Neural Network Exchange (ONNX) format, an open standard for Machine Learning (ML) interoperability. Experiments show that the native CNN model developed with TensorFlow and the corresponding model converted into the ONNX format achieve comparable performance. The proposed system thus provides a foundation for user-centric, effective computing systems that improve accessibility to VR environments through voice-based commands, and it paves the way towards VR applications running on headsets controlled through the user’s voice.
format Article
id doaj-art-1e9bc1e8dfcf42d2b88b911bd4ab0493
institution DOAJ
issn 2504-2289
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Big Data and Cognitive Computing
spelling doaj-art-1e9bc1e8dfcf42d2b88b911bd4ab0493
2025-08-20T03:13:42Z
eng
MDPI AG
Big Data and Cognitive Computing
2504-2289
2025-07-01
Volume 9, Issue 7, Article 188
DOI 10.3390/bdcc9070188
An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
Alessio Catalfamo (0): MIFT Department, University of Messina, 98166 Messina, Italy
Antonio Celesti (1): MIFT Department, University of Messina, 98166 Messina, Italy
Maria Fazio (2): MIFT Department, University of Messina, 98166 Messina, Italy
A. F. M. Saifuddin Saif (3): Department of Computing, Information and Mathematical Sciences, and Technology (CIMST), Chicago State University, Chicago, IL 60628, USA
Yu-Sheng Lin (4): Department of Mechanical Engineering, Southern Taiwan University of Science and Technology, Tainan 71005, Taiwan
Edelberto Franco Silva (5): Department of Computer Science, Federal University of Juiz de Fora (UFJF), Juiz de Fora 36036-330, MG, Brazil
Massimo Villari (6): MIFT Department, University of Messina, 98166 Messina, Italy
https://www.mdpi.com/2504-2289/9/7/188
virtual reality
automated speech recognition
convolutional neural networks
ONNX
title An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
topic virtual reality
automated speech recognition
convolutional neural networks
ONNX
url https://www.mdpi.com/2504-2289/9/7/188
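
The description above names MFCC extraction as the first stage of the pipeline, feeding a CNN that is later converted to ONNX. As a rough illustration of that first stage only, the following is a minimal sketch of a textbook MFCC computation (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) using just the Python standard library. It is not the authors' implementation: the record gives no parameters, so the sample rate, frame length, hop size, filter count, coefficient count, and all function names here are assumptions chosen for illustration. The CNN and the ONNX conversion are not shown.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are assumed values)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Log mel energies, floored to avoid log(0) on silent frames.
        log_energies = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                        for row in bank]
        # DCT-II decorrelates the log energies; keep the first n_coeffs terms.
        coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                      for j, e in enumerate(log_energies))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features


# Hypothetical input: a 440 Hz test tone standing in for a recorded voice command.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
features = mfcc(tone, sample_rate)
print(len(features), len(features[0]))  # frames x coefficients: 7 13
```

The resulting frames-by-coefficients matrix is the kind of 2D input a CNN classifier can consume. The naive DFT keeps the sketch dependency-free but is slow; a practical pipeline would use an FFT from a numerical or audio library.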