DuSAFNet: A Multi-Path Feature Fusion and Spectral–Temporal Attention-Based Model for Bird Audio Classification

This research presents DuSAFNet, a lightweight deep neural network for fine-grained bird audio classification. DuSAFNet combines dual-path feature fusion, spectral–temporal attention, and a multi-band ArcMarginProduct classifier to enhance inter-class separability and capture both local and global spectro–temporal cues. Unlike single-feature approaches, DuSAFNet captures both local spectral textures and long-range temporal dependencies in Mel-spectrogram inputs and explicitly enhances inter-class separability across low-, mid-, and high-frequency bands. On a curated dataset of 17,653 three-second recordings spanning 18 species, DuSAFNet achieves 96.88% accuracy and a 96.83% F1 score using only 6.77 M parameters and 2.275 GFLOPs. Cross-dataset evaluation on Birdsdata yields 93.74% accuracy, demonstrating robust generalization to new recording conditions. Its lightweight design and high performance make DuSAFNet well-suited for edge-device deployment and real-time alerts for rare or threatened species. This work lays the foundation for scalable, automated acoustic monitoring to inform biodiversity assessments and conservation planning.
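
The abstract states that the model operates on Mel-spectrograms computed from three-second recordings, but the record does not give the signal-processing parameters. The sketch below is only a minimal illustration of that kind of front end; the sample rate, FFT size, hop length, and number of Mel bands are assumptions, not values from the paper.

```python
# Illustrative front end: fixed three-second clip -> log-Mel spectrogram.
# sr, n_fft, hop_length, and n_mels are assumed defaults, not paper values.
import librosa
import numpy as np

def clip_to_logmel(path, sr=32000, clip_seconds=3.0,
                   n_fft=1024, hop_length=320, n_mels=128):
    y, _ = librosa.load(path, sr=sr, mono=True)
    target = int(sr * clip_seconds)
    # Pad with zeros or truncate so every clip covers exactly three seconds.
    y = np.pad(y, (0, max(0, target - len(y))))[:target]
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    # Log compression; result has shape (n_mels, frames).
    return librosa.power_to_db(mel, ref=np.max)
```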

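The classifier is described as a multi-band ArcMarginProduct head. ArcMarginProduct is the additive-angular-margin (ArcFace-style) layer: embeddings and class weights are L2-normalized, an angular margin m is added to the target-class angle, and the resulting cosines are scaled by s before the softmax loss. The paper's per-band variant is not reproduced here; the following is a minimal single-head sketch in PyTorch, with the scale and margin set to common defaults rather than the paper's values.

```python
# Standard ArcMarginProduct (additive angular margin) classification head.
# The paper applies such a head per frequency band; this shows one head only.
# s (scale) and m (margin) are common defaults, not values from the paper.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcMarginProduct(nn.Module):
    def __init__(self, in_features, num_classes, s=30.0, m=0.50):
        super().__init__()
        self.s, self.m = s, m
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, embeddings, labels):
        # cos(theta) between L2-normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        sine = torch.sqrt((1.0 - cosine.pow(2)).clamp(0, 1))
        # cos(theta + m) applied to the ground-truth class only.
        phi = cosine * math.cos(self.m) - sine * math.sin(self.m)
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
        logits = self.s * (one_hot * phi + (1.0 - one_hot) * cosine)
        return logits  # feed into nn.CrossEntropyLoss

# Usage sketch: 18 species as in the paper; 256-dim embedding is an assumption.
# head = ArcMarginProduct(256, 18)
# logits = head(torch.randn(4, 256), torch.tensor([0, 3, 7, 17]))
```
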
Bibliographic Details
Main Authors: Zhengyang Lu, Huan Li, Min Liu, Yibin Lin, Yao Qin, Xuanyu Wu, Nanbo Xu, Haibo Pu
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Animals
Subjects: bird audio classification; spectral–temporal attention; multi-path feature fusion; ArcMarginProduct; passive acoustic monitoring; real-time conservation
Online Access: https://www.mdpi.com/2076-2615/15/15/2228
collection DOAJ
id doaj-art-4248c40a74ec4a6fb04d92f03f765c2a
institution DOAJ
issn 2076-2615
doi 10.3390/ani15152228
citation Animals, vol. 15, no. 15, article 2228 (2025-07-01)
affiliations Zhengyang Lu, Huan Li, Min Liu, Yao Qin, Nanbo Xu, Haibo Pu: College of Information Engineering, Sichuan Agriculture University, Ya’an 625014, China; Yibin Lin: College of Fisheries, Jimei University, Xiamen 361021, China; Xuanyu Wu: College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China