DuSAFNet: A Multi-Path Feature Fusion and Spectral–Temporal Attention-Based Model for Bird Audio Classification
This research presents DuSAFNet, a lightweight deep neural network for fine-grained bird audio classification. DuSAFNet combines dual-path feature fusion, spectral–temporal attention, and a multi-band ArcMarginProduct classifier to enhance inter-class separability and capture both local and global spectro–temporal cues. Unlike single-feature approaches, DuSAFNet captures both local spectral textures and long-range temporal dependencies in Mel-spectrogram inputs and explicitly enhances inter-class separability across low, mid, and high frequency bands. On a curated dataset of 17,653 three-second recordings spanning 18 species, DuSAFNet achieves 96.88% accuracy and a 96.83% F1 score using only 6.77 M parameters and 2.275 GFLOPs. Cross-dataset evaluation on Birdsdata yields 93.74% accuracy, demonstrating robust generalization to new recording conditions. Its lightweight design and high performance make DuSAFNet well-suited for edge-device deployment and real-time alerts for rare or threatened species. This work lays the foundation for scalable, automated acoustic monitoring to inform biodiversity assessments and conservation planning.
| Main Authors: | Zhengyang Lu, Huan Li, Min Liu, Yibin Lin, Yao Qin, Xuanyu Wu, Nanbo Xu, Haibo Pu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Animals, Vol. 15, Iss. 15, Article 2228 |
| ISSN: | 2076-2615 |
| DOI: | 10.3390/ani15152228 |
| Subjects: | bird audio classification; spectral–temporal attention; multi-path feature fusion; ArcMarginProduct; passive acoustic monitoring; real-time conservation |
| Online Access: | https://www.mdpi.com/2076-2615/15/15/2228 |
| Author Affiliations: | |
|---|---|
| Zhengyang Lu, Huan Li, Min Liu, Yao Qin, Nanbo Xu, Haibo Pu | College of Information Engineering, Sichuan Agriculture University, Ya’an 625014, China |
| Yibin Lin | College of Fisheries, Jimei University, Xiamen 361021, China |
| Xuanyu Wu | College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China |
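For readers unfamiliar with the ArcMarginProduct classifier named in the abstract, below is a minimal PyTorch sketch of an ArcFace-style additive angular margin head, the technique that ArcMarginProduct layers are based on. It is an illustrative assumption of how such a head is commonly implemented, not the authors' multi-band variant: the class name `ArcMarginHead`, the scale and margin values, and the embedding size are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcMarginHead(nn.Module):
    """ArcFace-style additive angular margin head (hypothetical sketch).

    Maps an embedding to class logits whose target-class angle is
    penalised by a fixed margin before scaling, which is the core idea
    behind ArcMarginProduct-style classifiers.
    """

    def __init__(self, embedding_dim: int, num_classes: int,
                 scale: float = 30.0, margin: float = 0.5):
        super().__init__()
        self.scale = scale      # radius of the hypersphere the logits live on
        self.margin = margin    # additive angular margin, in radians
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalised embeddings and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Recover the angle and add the margin only for the ground-truth class.
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(one_hot, torch.cos(theta + self.margin), cosine)
        return self.scale * logits  # pass to nn.CrossEntropyLoss


# Usage sketch: 18 bird species, 256-dimensional embeddings (illustrative sizes).
head = ArcMarginHead(embedding_dim=256, num_classes=18)
labels = torch.tensor([0, 3, 7, 17])
logits = head(torch.randn(4, 256), labels)
loss = nn.CrossEntropyLoss()(logits, labels)
```

The abstract states that DuSAFNet applies this kind of margin-based classification separately across low, mid, and high frequency bands; a faithful reproduction of that multi-band behaviour would require the band-splitting details from the full paper, which are not part of this record.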