Context-aware data augmentation for enhanced speech command recognition in industrial environments
Abstract: In Human-Robot Interaction, speech is one of the most intuitive and effective communication channels. In Industry 4.0, speech-based communication can significantly enhance productivity and efficiency on production lines. However, deploying a Speech Command Recognition Module in real-world in...
| Main Authors: | Giuseppe De Simone, Antonio Greco, Francesco Rosa, Alessia Saggese, Mario Vento |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-01886-3 |
| _version_ | 1849325708072976384 |
|---|---|
| author | Giuseppe De Simone, Antonio Greco, Francesco Rosa, Alessia Saggese, Mario Vento |
| author_facet | Giuseppe De Simone, Antonio Greco, Francesco Rosa, Alessia Saggese, Mario Vento |
| author_sort | Giuseppe De Simone |
| collection | DOAJ |
| description | Abstract: In Human-Robot Interaction, speech is one of the most intuitive and effective communication channels. In Industry 4.0, speech-based communication can significantly enhance productivity and efficiency on production lines. However, deploying a Speech Command Recognition Module in real-world industrial settings poses challenges, as the system must balance two conflicting objectives: accurately recognizing commands while rejecting noise and irrelevant speech. To address this, we propose a modular framework designed to optimize recognition accuracy and rejection robustness while minimizing the need for extensive industrial dataset collection. The framework features an efficient Command Recognition module trained on laboratory-collected data augmented with synthetic samples. Advanced context-aware data augmentation techniques and dynamic noise injection further enhance the model’s robustness. To improve reliability in noisy environments, a Keyword Spotting module is introduced, activating the recognition system only when a predefined keyword is detected. The proposed system was evaluated using real-world samples collected in a noisy industrial setting. The results demonstrated a high recall rate for both command recognition and noise rejection, confirming the system’s effectiveness in meeting the demands of industrial applications. |
| format | Article |
| id | doaj-art-52602797dbac482bb7f68c881a151e51 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-52602797dbac482bb7f68c881a151e51; 2025-08-20T03:48:19Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-05-01; 151116; 10.1038/s41598-025-01886-3; Context-aware data augmentation for enhanced speech command recognition in industrial environments; Giuseppe De Simone, Antonio Greco, Francesco Rosa, Alessia Saggese, Mario Vento (all University of Salerno); https://doi.org/10.1038/s41598-025-01886-3 |
| title | Context-aware data augmentation for enhanced speech command recognition in industrial environments |
| title_sort | context aware data augmentation for enhanced speech command recognition in industrial environments |
| url | https://doi.org/10.1038/s41598-025-01886-3 |
| work_keys_str_mv | AT giuseppedesimone contextawaredataaugmentationforenhancedspeechcommandrecognitioninindustrialenvironments AT antoniogreco contextawaredataaugmentationforenhancedspeechcommandrecognitioninindustrialenvironments AT francescorosa contextawaredataaugmentationforenhancedspeechcommandrecognitioninindustrialenvironments AT alessiasaggese contextawaredataaugmentationforenhancedspeechcommandrecognitioninindustrialenvironments AT mariovento contextawaredataaugmentationforenhancedspeechcommandrecognitioninindustrialenvironments |
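
The abstract mentions dynamic noise injection but this record does not describe how the authors implement it. As a rough illustration only, the sketch below shows one common way to do it: mix a randomly chosen noise clip into a clean recording at a randomly drawn signal-to-noise ratio each time a sample is loaded. The function names, the SNR range, and the list-of-floats audio representation are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


def inject_noise(speech, noise_bank, snr_range_db=(0.0, 20.0), rng=random):
    """Hypothetical helper: draw a new noise clip and SNR on every call."""
    noise = rng.choice(noise_bank)
    snr_db = rng.uniform(*snr_range_db)
    return mix_at_snr(speech, noise, snr_db)
```

Calling `inject_noise` inside the training data loader, rather than precomputing noisy files, means the model would see a different noise condition for the same utterance in each epoch, which is the usual motivation for making the injection "dynamic".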