DuSAFNet: A Multi-Path Feature Fusion and Spectral–Temporal Attention-Based Model for Bird Audio Classification


Bibliographic Details
Main Authors: Zhengyang Lu, Huan Li, Min Liu, Yibin Lin, Yao Qin, Xuanyu Wu, Nanbo Xu, Haibo Pu
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Animals
Online Access: https://www.mdpi.com/2076-2615/15/15/2228
Description
Summary: This research presents DuSAFNet, a lightweight deep neural network for fine-grained bird audio classification. DuSAFNet combines dual-path feature fusion, spectral–temporal attention, and a multi-band ArcMarginProduct classifier to enhance inter-class separability and capture both local and global spectro–temporal cues. Unlike single-feature approaches, DuSAFNet captures both local spectral textures and long-range temporal dependencies in Mel-spectrogram inputs and explicitly enhances inter-class separability across low-, mid-, and high-frequency bands. On a curated dataset of 17,653 three-second recordings spanning 18 species, DuSAFNet achieves 96.88% accuracy and a 96.83% F1 score using only 6.77 M parameters and 2.275 GFLOPs. Cross-dataset evaluation on Birdsdata yields 93.74% accuracy, demonstrating robust generalization to new recording conditions. Its lightweight design and high performance make DuSAFNet well-suited for edge-device deployment and real-time alerts for rare or threatened species. This work lays the foundation for scalable, automated acoustic monitoring to inform biodiversity assessments and conservation planning.
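The ArcMarginProduct classifier named in the summary follows the ArcFace idea: compare L2-normalized embeddings against L2-normalized class weights, add an angular margin to the ground-truth class angle, and scale the resulting cosines. A minimal NumPy sketch of that single-band mechanism is below; the function name and the scale `s`/margin `m` values are illustrative, and the paper's multi-band variant (separate margins per frequency band) is not reproduced here.

```python
import numpy as np

def arc_margin_logits(features, weights, labels, s=30.0, m=0.50):
    """ArcFace-style margin logits (illustrative sketch, not the paper's exact layer).

    features: (batch, dim) embeddings
    weights:  (classes, dim) class-weight vectors
    labels:   (batch,) ground-truth class indices
    """
    # L2-normalize embeddings and class weights so dot products are cosines
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)   # cosine similarity to every class
    theta = np.arccos(cos)              # corresponding angles

    # one-hot mask: the margin m is added only to the target-class angle
    target = np.zeros_like(cos)
    target[np.arange(len(labels)), labels] = 1.0
    cos_with_margin = np.cos(theta + m * target)

    # scaling by s restores a usable logit range for softmax training
    return s * cos_with_margin
```

Because the margin shrinks only the target-class cosine, training must push embeddings closer to their class weight to recover the same softmax probability, which is what widens the inter-class gaps the abstract refers to.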
ISSN:2076-2615