Filamentary Convolution for SLI: A Brain-Inspired Approach with High Efficiency
Spoken language identification (SLI) relies on detecting key frequency characteristics like pitch, tone, and rhythm. While the short-time Fourier transform (STFT) generates time–frequency acoustic features (TFAF) for deep learning networks (DLNs), rectangular convolution kernels cause frequency mixi...
| Main Authors: | Boyuan Zhang, Xibang Yang, Tong Xie, Shuyuan Zhu, Bing Zeng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Sensors |
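| Subjects: | spoken language identification; deep learning network (DLN); filamentary convolution; frequency-level feature extraction |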
| Online Access: | https://www.mdpi.com/1424-8220/25/10/3085 |
| _version_ | 1849719801744719872 |
|---|---|
| author | Boyuan Zhang; Xibang Yang; Tong Xie; Shuyuan Zhu; Bing Zeng |
| author_sort | Boyuan Zhang |
| collection | DOAJ |
| description | Spoken language identification (SLI) relies on detecting key frequency characteristics like pitch, tone, and rhythm. While the short-time Fourier transform (STFT) generates time–frequency acoustic features (TFAF) for deep learning networks (DLNs), rectangular convolution kernels cause frequency mixing and aliasing, degrading feature extraction. We propose filamentary convolution to replace rectangular kernels, reducing the parameters while preserving inter-frame features by focusing solely on frequency patterns. Visualization confirms its enhanced sensitivity to critical frequency variations (e.g., intonation, rhythm) for language recognition. Evaluated via self-built datasets and cross-validated with public corpora, filamentary convolution improves the low-level feature extraction efficiency and synergizes with temporal models (LSTM/TDNN) to boost recognition. This method addresses aliasing limitations while maintaining computational efficiency in SLI systems. |
| format | Article |
| id | doaj-art-fd88d296246542b887dc1e95d3ee24b7 |
| institution | DOAJ |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-fd88d296246542b887dc1e95d3ee24b7; 2025-08-20T03:12:04Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-05-01; vol. 25, no. 10, art. 3085; DOI 10.3390/s25103085; Filamentary Convolution for SLI: A Brain-Inspired Approach with High Efficiency; Boyuan Zhang (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Xibang Yang (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Tong Xie (Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore); Shuyuan Zhu (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Bing Zeng (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China); Spoken language identification (SLI) relies on detecting key frequency characteristics like pitch, tone, and rhythm. While the short-time Fourier transform (STFT) generates time–frequency acoustic features (TFAF) for deep learning networks (DLNs), rectangular convolution kernels cause frequency mixing and aliasing, degrading feature extraction. We propose filamentary convolution to replace rectangular kernels, reducing the parameters while preserving inter-frame features by focusing solely on frequency patterns. Visualization confirms its enhanced sensitivity to critical frequency variations (e.g., intonation, rhythm) for language recognition. Evaluated via self-built datasets and cross-validated with public corpora, filamentary convolution improves the low-level feature extraction efficiency and synergizes with temporal models (LSTM/TDNN) to boost recognition. This method addresses aliasing limitations while maintaining computational efficiency in SLI systems.; https://www.mdpi.com/1424-8220/25/10/3085; spoken language identification; deep learning network (DLN); filamentary convolution; frequency-level feature extraction |
| title | Filamentary Convolution for SLI: A Brain-Inspired Approach with High Efficiency |
| topic | spoken language identification; deep learning network (DLN); filamentary convolution; frequency-level feature extraction |
| url | https://www.mdpi.com/1424-8220/25/10/3085 |
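
The description field says that filamentary convolution replaces rectangular kernels, reduces the parameter count, and preserves inter-frame features by focusing solely on frequency patterns. The record does not give the kernel shape, so the following is only a minimal sketch under the assumption that a filamentary kernel spans several frequency bins within a single STFT frame (shape `k × 1`). The names `conv2d_valid` and `changed_frames`, the 64 × 20 spectrogram, and the averaging kernels are all hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """Plain 'valid' 2-D cross-correlation of x (freq bins x frames) with kernel."""
    kf, kt = kernel.shape
    n_freq, n_time = x.shape
    out = np.zeros((n_freq - kf + 1, n_time - kt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kf, j:j + kt] * kernel)
    return out


def changed_frames(x, kernel, frame):
    """Output columns that change when one input frame is perturbed."""
    bumped = x.copy()
    bumped[:, frame] += 1.0
    diff = conv2d_valid(bumped, kernel) - conv2d_valid(x, kernel)
    return np.where(np.any(np.abs(diff) > 1e-9, axis=0))[0].tolist()


rng = np.random.default_rng(0)
spec = rng.random((64, 20))  # hypothetical STFT magnitudes: 64 bins x 20 frames

rect_kernel = np.full((5, 5), 1 / 25)  # rectangular: 5 bins x 5 frames
fila_kernel = np.full((5, 1), 1 / 5)   # assumed filamentary shape: 5 bins x 1 frame

rect_out = conv2d_valid(spec, rect_kernel)
fila_out = conv2d_valid(spec, fila_kernel)

print(rect_kernel.size, fila_kernel.size)    # 25 5
print(rect_out.shape, fila_out.shape)        # (60, 16) (60, 20)
print(changed_frames(spec, rect_kernel, 7))  # [3, 4, 5, 6, 7]
print(changed_frames(spec, fila_kernel, 7))  # [7]
```

Under this assumption the single-frame kernel uses 5 weights instead of 25, keeps all 20 frames in its output, and confines a change in frame 7 to output frame 7, whereas the rectangular kernel spreads it across five output frames. A temporal model such as the LSTM or TDNN mentioned in the description would then operate on the per-frame outputs.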