TPDTNet: Two-Phase Distillation Training for Visible-to-Infrared Unsupervised Domain Adaptive Object Detection

Bibliographic Details
Main Authors: Siyu Wang, Xiaogang Yang, Ruitao Lu, Shuang Su, Bin Tang, Tao Zhang, Zhengjie Zhu
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Distillation training; object detection; remote sensing; unsupervised domain adaptation (UDA)
Online Access: https://ieeexplore.ieee.org/document/10836742/
author Siyu Wang
Xiaogang Yang
Ruitao Lu
Shuang Su
Bin Tang
Tao Zhang
Zhengjie Zhu
collection DOAJ
description Migrating detection models from the visible domain to the infrared domain poses great challenges in remote sensing target detection: cross-domain migration suffers from a lack of data annotations in the infrared domain and from interdomain feature differences. To improve the detection accuracy attained for infrared images, we propose a novel two-phase distillation training network (TPDTNet). Specifically, in the first phase, we incorporate a contrastive learning framework to maximize the mutual information between the source and target domains. In addition, we construct a generative model that learns only a unidirectional modality conversion mapping, thereby capturing the associations between the visual contents of the two domains. The source-domain images are converted to images with the style of the target domain, achieving image-level domain alignment. The generated images are combined with the source-domain images to form an enhanced domain for cross-modal training. The enhanced-domain data are fed into the teacher network to initialize its weights and produce pseudolabels. Next, to address small remote sensing target detection, we construct a multidimensional progressive feature fusion detection framework, which first fuses two adjacent low-level feature maps and then progressively incorporates high-level features to improve the quality of fusing nonadjacent layer features. Subsequently, a spatial-dimension convolution is integrated into the backbone network; this convolution is embedded after the standard convolution to mitigate the loss of detailed features. Finally, a distillation training strategy utilizes the pseudodetection labels to compute the target information. The channel activations are transformed into probability distributions, and knowledge distillation is achieved by minimizing the Kullback–Leibler divergence between the probability maps of the teacher and student networks. The training weights are transferred from the teacher network to the student network to maximize the detection accuracy. Extensive experiments are conducted on three optical-to-infrared datasets, and the results show that TPDTNet achieves state-of-the-art performance relative to the baseline model.
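Note: the distillation step described above (channel activations converted into probability distributions, with the Kullback–Leibler divergence between the teacher and student probability maps minimized) can be sketched roughly as follows. This is an illustrative example rather than the authors' released code: the softmax over spatial positions within each channel, the temperature value, and all function and tensor names are assumptions made for the sketch.

import torch
import torch.nn.functional as F

def channel_kd_loss(teacher_feat: torch.Tensor,
                    student_feat: torch.Tensor,
                    temperature: float = 4.0) -> torch.Tensor:
    # Both feature maps are assumed to have shape (N, C, H, W). Each channel's
    # spatial activations are flattened and softened into a probability
    # distribution; the student matches the teacher by minimizing the
    # KL divergence between the two distributions.
    n, c, h, w = teacher_feat.shape
    t = teacher_feat.reshape(n * c, h * w) / temperature
    s = student_feat.reshape(n * c, h * w) / temperature
    p_teacher = F.softmax(t, dim=-1)          # teacher probabilities
    log_p_student = F.log_softmax(s, dim=-1)  # student log-probabilities
    # 'batchmean' divides the summed KL by n * c, i.e., it averages over batch
    # items and channels; the T^2 factor keeps gradient scales comparable
    # across temperatures, as in standard distillation practice.
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return kl * temperature ** 2

# Toy usage with random tensors standing in for teacher/student feature maps.
teacher = torch.randn(2, 256, 32, 32)
student = torch.randn(2, 256, 32, 32)
loss = channel_kd_loss(teacher, student)

In the paper's second phase this loss would presumably be computed on corresponding teacher and student feature maps, with the frozen teacher providing the targets; those wiring details are not specified in the abstract.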
format Article
id doaj-art-f7f2485f95d44ddfabeb458dbf1b6acf
institution Kabale University
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
doi 10.1109/JSTARS.2025.3528057
ieee_document 10836742
volume 18
pages 4255-4272
record_created 2025-01-31T00:00:26Z
orcid Siyu Wang: https://orcid.org/0000-0002-7003-9344
orcid Xiaogang Yang: https://orcid.org/0000-0002-4419-1334
orcid Ruitao Lu: https://orcid.org/0000-0002-7527-4298
orcid Shuang Su: https://orcid.org/0000-0002-5412-280X
orcid Bin Tang: https://orcid.org/0009-0006-0429-9793
orcid Tao Zhang: https://orcid.org/0000-0002-4053-9209
orcid Zhengjie Zhu: https://orcid.org/0000-0003-2649-9367
affiliation College of Missile Engineering, Rocket Force University of Engineering, Xi'an, China (all authors)
title TPDTNet: Two-Phase Distillation Training for Visible-to-Infrared Unsupervised Domain Adaptive Object Detection
topic Distillation training
object detection
remote sensing
unsupervised domain adaptation (UDA)
url https://ieeexplore.ieee.org/document/10836742/