DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System

Bibliographic Details
Main Authors: Zong-En Wu, Shao-Jung Chan, Yeshanew Ale Wubet, Kuang-Yow Lian
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Dilated-LiGRU; deep learning; GRU; keyword spotting; X-vector
Online Access: https://ieeexplore.ieee.org/document/10858118/
collection DOAJ
description Deploying deep learning-based speech processing models in real-world applications on devices with limited processing capacity and memory poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated recurrent unit in X-vector (DLiGRU-X), to address these challenges specifically for small-footprint keyword spotting (KWS) tasks. The DLiGRU-X model enhances temporal feature extraction with a light-gated recurrent unit (LiGRU) and reduces computational complexity through dilated convolution. The proposed model efficiently learns speech signal characteristics, which makes it suitable for scenarios with limited hardware resources, and it can handle an expanded keyword vocabulary. The proposed model is validated on the public Google Speech Commands dataset, and its performance is compared with that of other recently proposed deep learning models for KWS. The proposed model achieves an excellent trade-off between recognition accuracy and computational complexity, outperforming various advanced keyword spotting models. Notably, despite its reduced parameter count, DLiGRU-X maintains an accuracy of 97% without significant decline. This model offers greater flexibility than previous models, allowing users to adjust and expand the targeted vocabulary according to their needs and to deploy the model in resource-constrained environments.
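The description names three generic building blocks: dilated (time-delay) convolutions over speech frames, a light gated recurrent unit (LiGRU, which drops the GRU reset gate and uses a ReLU candidate activation), and the X-vector idea of pooling frame-level features into one fixed-length embedding through mean and standard-deviation statistics. The NumPy sketch below illustrates only these generic techniques and is not the authors' DLiGRU-X. The function names, the layer order, the single dilated layer (kernel 3, dilation 2), the omission of batch normalization and biases in the LiGRU, the random weights, and all sizes are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dilated_conv1d(x, w, b, dilation):
    """'Valid' 1-D convolution over time with a dilated kernel, followed by ReLU.

    x: (T, F_in) frames, w: (K, F_in, F_out) taps, b: (F_out,).
    Returns (T - (K - 1) * dilation, F_out).
    """
    k = w.shape[0]
    t_out = x.shape[0] - (k - 1) * dilation
    if t_out <= 0:
        raise ValueError("input is shorter than the dilated receptive field")
    out = np.zeros((t_out, w.shape[2]))
    for j in range(k):
        # Tap j reads the frame j * dilation steps later, widening the
        # temporal context without adding weights.
        out += x[j * dilation : j * dilation + t_out] @ w[j]
    return np.maximum(out + b, 0.0)


def ligru(x, wz, uz, wh, uh):
    """Light GRU: update gate only, ReLU candidate (batch norm and biases omitted).

    x: (T, F), wz/wh: (F, H), uz/uh: (H, H). Returns hidden states (T, H).
    """
    h = np.zeros(uz.shape[0])
    states = []
    for x_t in x:
        z = sigmoid(x_t @ wz + h @ uz)               # update gate
        h_cand = np.maximum(x_t @ wh + h @ uh, 0.0)  # ReLU candidate state
        h = z * h + (1.0 - z) * h_cand               # blend old and candidate state
        states.append(h)
    return np.stack(states)


def stats_pool(h):
    """X-vector style pooling: per-dimension mean and std over time, concatenated."""
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])


def embed(frames, p):
    """Hypothetical pipeline: dilated conv -> LiGRU -> statistics pooling -> linear."""
    h = dilated_conv1d(frames, p["conv_w"], p["conv_b"], dilation=2)
    h = ligru(h, p["wz"], p["uz"], p["wh"], p["uh"])
    return stats_pool(h) @ p["emb_w"]


# Usage with random weights; 98 frames x 40 features is an assumed front end
# (roughly a 1 s clip of log-mel features at a 10 ms hop).
rng = np.random.default_rng(0)
T, F, C, H, E = 98, 40, 64, 32, 16
p = {
    "conv_w": rng.normal(0.0, 0.1, (3, F, C)),
    "conv_b": np.zeros(C),
    "wz": rng.normal(0.0, 0.1, (C, H)),
    "uz": rng.normal(0.0, 0.1, (H, H)),
    "wh": rng.normal(0.0, 0.1, (C, H)),
    "uh": rng.normal(0.0, 0.1, (H, H)),
    "emb_w": rng.normal(0.0, 0.1, (2 * H, E)),
}
frames = rng.normal(size=(T, F))
e = embed(frames, p)
print(e.shape)  # (16,)
```

Statistics pooling is the step that turns a variable number of frames into a single vector, so the embedding size does not depend on clip length; in this sketch a keyword classifier (not shown) would be trained on top of that embedding.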
institution Kabale University
issn 2169-3536
volume 13
pages 23498-23507
doi 10.1109/ACCESS.2025.3536470
affiliation Department of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan (all four authors)
orcid Shao-Jung Chan: https://orcid.org/0009-0000-9203-9305
Yeshanew Ale Wubet: https://orcid.org/0000-0002-1411-715X
Kuang-Yow Lian: https://orcid.org/0000-0002-5692-9279
topic Dilated-LiGRU
deep learning
GRU
keyword spotting
X-vector