DLiGRU-X: Efficient X-Vector-Based Embeddings for Small-Footprint Keyword Spotting System
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10858118/
Summary: Deployment of deep learning-based speech processing models for real-world applications on devices with limited processing capacity and memory poses significant challenges. This paper introduces an enhanced deep learning model based on the X-vector architecture, named dilated light-gated recurrent unit in X-vector (DLiGRU-X), to address these challenges specifically for small-footprint keyword spotting (KWS) tasks. The DLiGRU-X model enhances temporal feature extraction with a LiGRU and reduces computational complexity through dilated convolution techniques. The proposed model efficiently learns speech-signal characteristics, making it suitable for scenarios with limited hardware resources and enabling it to handle an expanded keyword vocabulary. The model is validated on the Google Speech Commands public dataset, and its performance is compared with other recently proposed deep learning models for KWS. It achieves an excellent trade-off between recognition accuracy and computational complexity, outperforming various advanced keyword-spotting models. Notably, despite a reduction in model parameters, DLiGRU-X maintains an accuracy of 97% without significant decline. The model also offers greater flexibility than previous models, allowing users to adjust and expand the set of target keywords according to their needs and to deploy the model in resource-constrained environments.
ISSN: 2169-3536
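To make the two building blocks named in the abstract concrete, the following is a minimal NumPy sketch of a single LiGRU step (a GRU variant without a reset gate, using a ReLU candidate state) and a causal dilated 1-D convolution. This is an illustrative sketch only, not the authors' DLiGRU-X implementation; the weight names (`Wz`, `Uz`, `Wh`, `Uh`) and the toy dimensions are assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ligru_step(x_t, h_prev, Wz, Uz, Wh, Uh):
    """One LiGRU step: update gate only (no reset gate),
    ReLU nonlinearity for the candidate hidden state.
    Weight names are illustrative, not from the paper."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)                # update gate
    h_cand = np.maximum(0.0, Wh @ x_t + Uh @ h_prev)   # ReLU candidate
    return z * h_prev + (1.0 - z) * h_cand             # interpolated state

def dilated_conv1d(x, kernel, dilation=2):
    """Causal 1-D dilated convolution over a sequence: each output
    taps inputs spaced `dilation` steps apart, widening the receptive
    field without adding parameters."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])            # left-pad (causal)
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])
```

With zero recurrent weights the update gate sits at 0.5, so the new state is simply half the candidate; the dilated convolution with kernel `[1, 1]` and dilation 2 sums each sample with the sample two steps earlier.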