Context-aware data augmentation for enhanced speech command recognition in industrial environments

Abstract In Human-Robot Interaction, speech is one of the most intuitive and effective communication channels. In Industry 4.0, speech-based communication can significantly enhance productivity and efficiency on production lines. However, deploying a Speech Command Recognition Module in real-world in...

Bibliographic Details
Main Authors: Giuseppe De Simone, Antonio Greco, Francesco Rosa, Alessia Saggese, Mario Vento
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-01886-3
collection DOAJ
description Abstract In Human-Robot Interaction, speech is one of the most intuitive and effective communication channels. In Industry 4.0, speech-based communication can significantly enhance productivity and efficiency on production lines. However, deploying a Speech Command Recognition Module in real-world industrial settings poses challenges, as the system must balance two conflicting objectives: accurately recognizing commands while rejecting noise and irrelevant speech. To address this, we propose a modular framework designed to optimize recognition accuracy and rejection robustness while minimizing the need for extensive industrial dataset collection. The framework features an efficient Command Recognition module trained on laboratory-collected data augmented with synthetic samples. Advanced context-aware data augmentation techniques and dynamic noise injection further enhance the model’s robustness. To improve reliability in noisy environments, a Keyword Spotting module is introduced, activating the recognition system only when a predefined keyword is detected. The proposed system was evaluated using real-world samples collected in a noisy industrial setting. The results demonstrated a high recall rate for both command recognition and noise rejection, confirming the system’s effectiveness in meeting the demands of industrial applications.
format Article
id doaj-art-52602797dbac482bb7f68c881a151e51
institution Kabale University
issn 2045-2322