Filamentary Convolution for SLI: A Brain-Inspired Approach with High Efficiency

Bibliographic Details
Main Authors: Boyuan Zhang, Xibang Yang, Tong Xie, Shuyuan Zhu, Bing Zeng
Format: Article
Language: English
Published: MDPI AG 2025-05-01
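Series: Sensors
Subjects: spoken language identification; deep learning network (DLN); filamentary convolution; frequency-level feature extraction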
Online Access:https://www.mdpi.com/1424-8220/25/10/3085
collection DOAJ
description Spoken language identification (SLI) relies on detecting key frequency characteristics such as pitch, tone, and rhythm. While the short-time Fourier transform (STFT) generates time–frequency acoustic features (TFAF) for deep learning networks (DLNs), rectangular convolution kernels cause frequency mixing and aliasing, degrading feature extraction. We propose filamentary convolution to replace rectangular kernels, reducing the number of parameters while preserving inter-frame features by focusing solely on frequency patterns. Visualization confirms its enhanced sensitivity to critical frequency variations (e.g., intonation, rhythm) for language recognition. Evaluated on self-built datasets and cross-validated on public corpora, filamentary convolution improves the efficiency of low-level feature extraction and synergizes with temporal models (LSTM/TDNN) to boost recognition. This method addresses aliasing limitations while maintaining computational efficiency in SLI systems.
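The abstract says filamentary convolution replaces rectangular kernels and focuses solely on frequency patterns while preserving inter-frame features, but the record does not give the exact kernel geometry. The NumPy sketch below therefore assumes that a filamentary kernel spans k bins along the frequency axis and a single frame along time. The helper names (`stft_magnitude`, `conv2d_valid`, `filamentary_conv`, `conv_params`) are hypothetical, and the STFT settings (256-point Hann window, hop 128, 16 kHz test signal) are illustrative choices, not the authors' configuration. It compares output shapes, how far one frame's energy spreads, and weight counts against a 3×3 rectangular kernel.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude STFT with shape (n_fft // 2 + 1 frequency bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each column is one Hann-windowed frame of the signal.
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def conv2d_valid(spec: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation, as computed by DLN convolution layers."""
    kf, kt = kernel.shape
    n_freq, n_time = spec.shape
    out = np.empty((n_freq - kf + 1, n_time - kt + 1))
    for f in range(out.shape[0]):
        for t in range(out.shape[1]):
            out[f, t] = np.sum(kernel * spec[f:f + kf, t:t + kt])
    return out


def filamentary_conv(spec: np.ndarray, freq_kernel: np.ndarray) -> np.ndarray:
    """ASSUMED filamentary kernel: k taps along frequency, 1 frame along time,
    so each output value depends on a single STFT frame only."""
    return conv2d_valid(spec, freq_kernel.reshape(-1, 1))


def conv_params(kf: int, kt: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Trainable parameters of a conv layer with a kf x kt kernel."""
    return kf * kt * c_in * c_out + (c_out if bias else 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)  # 1 s of noise at an assumed 16 kHz rate
    spec = stft_magnitude(audio)
    fil = filamentary_conv(spec, np.ones(3))
    rect = conv2d_valid(spec, np.ones((3, 3)))
    print(spec.shape, fil.shape, rect.shape)  # (129, 124) (127, 124) (127, 122)

    # Put energy in frame 2 only and see which output frames it reaches.
    probe = np.zeros((8, 6))
    probe[:, 2] = 1.0
    print(np.nonzero(filamentary_conv(probe, np.ones(3)).any(axis=0))[0])  # [2]
    print(np.nonzero(conv2d_valid(probe, np.ones((3, 3))).any(axis=0))[0])  # [0 1 2]

    print(conv_params(3, 3, 1, 16), conv_params(3, 1, 1, 16))  # 160 64
```

Under this assumption the frequency-only kernel keeps all 124 frames separate, leaving temporal modeling to a downstream LSTM or TDNN, while the 3×3 kernel blends each frame into three neighboring outputs. The 3×1 kernel also needs 3 weights per channel pair instead of 9.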
id doaj-art-fd88d296246542b887dc1e95d3ee24b7
institution DOAJ
issn 1424-8220
doi 10.3390/s25103085
volume 25
issue 10
article_number 3085
affiliation Boyuan Zhang, Xibang Yang, Shuyuan Zhu, Bing Zeng: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Tong Xie: Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
topic spoken language identification
deep learning network (DLN)
filamentary convolution
frequency-level feature extraction