Multistage Training and Fusion Method for Imbalanced Multimodal UAV Remote Sensing Classification

In remote sensing applications, autonomous aerial vehicles (AAVs) overcome the limitations of single-sensor approaches by integrating multiple sensors and fusing cross-modal data, significantly improving target classification accuracy. However, during the process of multimodal learning, the effectiv...

Bibliographic Details
Main Authors: Shihao Wang, Zhengwei Xu, Yun Lin
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11071999/
_version_ 1849706363776663552
author Shihao Wang
Zhengwei Xu
Yun Lin
collection DOAJ
description In remote sensing applications, autonomous aerial vehicles (AAVs) overcome the limitations of single-sensor approaches by integrating multiple sensors and fusing cross-modal data, significantly improving target classification accuracy. However, during multimodal learning, fusion effectiveness is severely degraded by modality imbalance, which arises from inconsistent gradient directions when heterogeneous information is integrated. Existing methods predominantly focus on parameter tuning and gradient modulation and fail to resolve the inherent conflicts between divergent modality optimization trajectories. To address these limitations, we propose a gradient-criterion multistage training (GCMT) framework, which systematically resolves gradient conflicts through an alternating freezing strategy, optimizing unimodal branches by evaluating the consistency between unimodal and multimodal gradient directions. Building on GCMT, we further introduce an information entropy measurement fusion (IEMF) module, which dynamically adjusts cross-modal feature fusion weights using entropy-based metrics to mitigate overreliance on dominant modalities while preserving synergistic interactions. We build a multimodal dataset of signals and images based on a UAV platform and conduct extensive experiments on both this self-constructed dataset and public datasets. The results show that GCMT significantly outperforms state-of-the-art methods, and they validate the efficacy of GCMT in harmonizing gradient alignment and of IEMF in enabling balanced multimodal fusion.
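The two ideas named in the description, checking whether unimodal and multimodal gradient directions agree and weighting modalities by an entropy measure, can be illustrated with a small sketch. This is not the authors' GCMT or IEMF implementation: the record gives no formulas, so the cosine-similarity criterion, the threshold, the choice to give lower-entropy predictions larger weights, and all function names below are assumptions made for illustration only.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare
    return dot / (norm_a * norm_b)


def should_train_branch(g_uni, g_multi, threshold=0.0):
    """Hypothetical criterion (assumption): keep a unimodal branch trainable
    only when its gradient points roughly the same way as the multimodal
    gradient; otherwise it would be frozen for this stage."""
    return cosine_similarity(g_uni, g_multi) > threshold


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_fusion_weights(logits_per_modality):
    """Hypothetical weighting (assumption): modalities whose class
    predictions have lower entropy receive larger fusion weights.
    Weights are a softmax over negative entropies, so they sum to 1."""
    entropies = [entropy(softmax(logits)) for logits in logits_per_modality]
    return softmax([-h for h in entropies])


if __name__ == "__main__":
    # Aligned gradients: the branch stays trainable.
    print(should_train_branch([1.0, 2.0], [2.0, 4.0]))
    # Opposed gradients: the branch would be frozen.
    print(should_train_branch([1.0, 0.0], [-1.0, 0.0]))
    # A peaked prediction versus a uniform one.
    print(entropy_fusion_weights([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

In a real training loop the gradient vectors would come from the framework's autograd, and the weighting direction could just as well be inverted if the goal is to boost the weaker modality; the record does not say which the paper uses.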
format Article
id doaj-art-4978b44bfdbd4fcaa455901aa8bb44b7
institution DOAJ
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling Record ID: doaj-art-4978b44bfdbd4fcaa455901aa8bb44b7
Timestamp: 2025-08-20T03:16:12Z
Language: eng
Publisher: IEEE
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ISSN: 1939-1404, 2151-1535
Published: 2025-01-01
Volume: 18
Pages: 18240-18250
DOI: 10.1109/JSTARS.2025.3586132
IEEE document number: 11071999
Title: Multistage Training and Fusion Method for Imbalanced Multimodal UAV Remote Sensing Classification
Shihao Wang (https://orcid.org/0009-0000-4515-0959), Key Laboratory of Advanced Ship Communication and Information Technology, Ministry of Industry and Information Technology, College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
Zhengwei Xu (https://orcid.org/0000-0002-2645-9915), Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
Yun Lin (https://orcid.org/0000-0002-4002-1282), Key Laboratory of Advanced Ship Communication and Information Technology, Ministry of Industry and Information Technology, College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
URL: https://ieeexplore.ieee.org/document/11071999/
Keywords: Gradient conflict; modal fusion; modality imbalance; multimodal learning
title Multistage Training and Fusion Method for Imbalanced Multimodal UAV Remote Sensing Classification
topic Gradient conflict
modal fusion
modality imbalance
multimodal learning
url https://ieeexplore.ieee.org/document/11071999/