Integrating Stride Attention and Cross-Modality Fusion for UAV-Based Detection of Drought, Pest, and Disease Stress in Croplands

Bibliographic Details
Main Authors: Yan Li, Yaze Wu, Wuxiong Wang, Huiyu Jin, Xiaohan Wu, Jinyuan Liu, Chen Hu, Chunli Lv
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Agronomy
Online Access: https://www.mdpi.com/2073-4395/15/5/1199
Description
Summary: Timely and accurate detection of agricultural disasters is crucial for ensuring food security and enhancing post-disaster response efficiency. This paper proposes a deployable UAV-based multimodal agricultural disaster detection framework that integrates multispectral and RGB imagery to simultaneously capture the spectral responses and spatial structural features of affected crop regions. To this end, we design an innovative stride–cross-attention mechanism, in which stride attention is utilized for efficient spatial feature extraction, while cross-attention facilitates semantic fusion between heterogeneous modalities. The experimental data were collected from representative wheat and maize fields in Inner Mongolia, using UAVs equipped with synchronized multispectral (red, green, blue, red edge, near-infrared) and high-resolution RGB sensors. Through a combination of image preprocessing, geometric correction, and various augmentation strategies (e.g., MixUp, CutMix, GridMask, RandAugment), the quality and diversity of the training samples were significantly enhanced. The model trained on the constructed dataset achieved an accuracy of 93.2%, an F1 score of 92.7%, a precision of 93.5%, and a recall of 92.4%, substantially outperforming mainstream models such as ResNet50, EfficientNet-B0, and ViT across multiple evaluation metrics. Ablation studies further validated the critical role of the stride attention and cross-attention modules in performance improvement. This study demonstrates that the integration of lightweight attention mechanisms with multimodal UAV remote sensing imagery enables efficient, accurate, and scalable agricultural disaster detection under complex field conditions.
ISSN: 2073-4395
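
Illustrative sketch: The summary describes a stride attention step for spatial feature extraction followed by cross-attention that fuses RGB and multispectral features, but it does not give the formulation. The NumPy sketch below shows one plausible reading and is not the authors' implementation. It assumes that "stride attention" means self-attention whose keys and values are subsampled every `stride` tokens, and that RGB tokens act as queries over multispectral tokens in the fusion step. The function names, token counts, and feature width are hypothetical, and the learned projection matrices a real model would use are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: q is (Nq, d), k and v are (Nk, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def stride_attention(tokens, stride=2):
    """Self-attention with strided keys/values (an assumed reading of the term).

    Every token still issues a query, but only every `stride`-th token is
    attended to, which shrinks the score matrix by a factor of `stride`.
    """
    kv = tokens[::stride]
    return attention(tokens, kv, kv)


def cross_attention(rgb_tokens, ms_tokens):
    """RGB tokens query multispectral tokens (assumed fusion direction)."""
    return attention(rgb_tokens, ms_tokens, ms_tokens)


rng = np.random.default_rng(0)
rgb = rng.normal(size=(16, 8))  # 16 hypothetical RGB patch tokens, width 8
ms = rng.normal(size=(16, 8))   # 16 hypothetical multispectral patch tokens

rgb_feat = stride_attention(rgb, stride=2)
ms_feat = stride_attention(ms, stride=2)
fused = cross_attention(rgb_feat, ms_feat)
print(fused.shape)  # (16, 8)
```

With `stride=2`, each self-attention score matrix is 16 x 8 rather than 16 x 16, which is the kind of saving a lightweight attention design aims for. The fused output keeps one row per RGB token, so a classification head could be attached to it directly.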