Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation
In semantic segmentation tasks, the transition from convolutional neural networks (CNNs) to transformers is driven by the latter's superior ability to capture global semantic information in remote sensing images. However, most transformer methods face challenges such as slow inference spe...
Main Authors: | Kang Zheng, Yu Chen, Jingrong Wang, Zhifei Liu, Shuai Bao, Jiao Zhan, Nan Shen |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Convolutional neural network (CNN), remote sensing, semantic segmentation, transformer |
Online Access: | https://ieeexplore.ieee.org/document/10839278/ |
_version_ | 1832576743598194688 |
---|---|
author | Kang Zheng Yu Chen Jingrong Wang Zhifei Liu Shuai Bao Jiao Zhan Nan Shen |
author_facet | Kang Zheng Yu Chen Jingrong Wang Zhifei Liu Shuai Bao Jiao Zhan Nan Shen |
author_sort | Kang Zheng |
collection | DOAJ |
description | In semantic segmentation tasks, the transition from convolutional neural networks (CNNs) to transformers is driven by the latter's superior ability to capture global semantic information in remote sensing images. However, most transformer methods face challenges such as slow inference speed and limitations in capturing local features. To address these issues, this study designs a hybrid approach that integrates knowledge distillation with a combination of CNN and transformer to enhance semantic segmentation in remote sensing images. First, this article proposes the dual-path convolutional transformer network (DP-CTNet) with a dual-path structure to leverage the strengths of both CNN and transformers. It incorporates a feature refinement module to optimize the transformer's feature learning, and a feature fusion module to effectively merge CNN and transformer features, preventing the insufficient learning of local features by the transformer. Then, DP-CTNet serves as the teacher model, and pruning and knowledge distillation are employed to create efficient DP-CTNet (EDP-CTNet) with superior segmentation speed and accuracy. Angle knowledge distillation (AKD) is proposed to enhance the feature migration learning of DP-CTNet during knowledge distillation, leading to improved EDP-CTNet performance. Experimental results demonstrate that DP-CTNet thoroughly combines the respective advantages of CNN and Transformer, maintaining local detail features while learning extensive sequential semantic information. EDP-CTNet not only delivers impressive segmentation speed but also exhibits excellent segmentation accuracy following AKD training. In comparison to other models, the two models proposed in this article notably distinguish themselves in terms of accuracy and result visualization. |
format | Article |
id | doaj-art-2d34a1aeba5f4fbf83793d36e2753fcd |
institution | Kabale University |
issn | 1939-1404 2151-1535 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj-art-2d34a1aeba5f4fbf83793d36e2753fcd; 2025-01-31T00:00:24Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 1939-1404; 2151-1535; 2025-01-01; 18; 4074; 4092; 10.1109/JSTARS.2025.3525634; 10839278; Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation; Kang Zheng 0 https://orcid.org/0000-0002-4811-0001; Yu Chen 1 https://orcid.org/0000-0002-2825-6767; Jingrong Wang 2 https://orcid.org/0000-0002-4226-5803; Zhifei Liu 3 https://orcid.org/0009-0009-4707-9346; Shuai Bao 4 https://orcid.org/0000-0003-2275-0147; Jiao Zhan 5 https://orcid.org/0009-0003-8519-2207; Nan Shen 6 https://orcid.org/0000-0001-8468-5550; College of Geography and Environment Science, Henan University, Kaifeng, China; School of Geomatics and Technology, Nanjing Tech University, Nanjing, China; GNSS Research Center, Wuhan University, Wuhan, China; Department of Aerospace and Geodesy, Technical University of Munich, Munich, Germany; China and Research Center of Geospatial Big Data Application, Chinese Academy of Surveying and Mapping, Beijing, China; GNSS Research Center, Wuhan University, Wuhan, China; School of Geomatics and Technology, Nanjing Tech University, Nanjing, China; In semantic segmentation tasks, the transition from convolutional neural networks (CNNs) to transformers is driven by the latter's superior ability to capture global semantic information in remote sensing images. However, most transformer methods face challenges such as slow inference speed and limitations in capturing local features. To address these issues, this study designs a hybrid approach that integrates knowledge distillation with a combination of CNN and transformer to enhance semantic segmentation in remote sensing images. First, this article proposes the dual-path convolutional transformer network (DP-CTNet) with a dual-path structure to leverage the strengths of both CNN and transformers. It incorporates a feature refinement module to optimize the transformer's feature learning, and a feature fusion module to effectively merge CNN and transformer features, preventing the insufficient learning of local features by the transformer. Then, DP-CTNet serves as the teacher model, and pruning and knowledge distillation are employed to create efficient DP-CTNet (EDP-CTNet) with superior segmentation speed and accuracy. Angle knowledge distillation (AKD) is proposed to enhance the feature migration learning of DP-CTNet during knowledge distillation, leading to improved EDP-CTNet performance. Experimental results demonstrate that DP-CTNet thoroughly combines the respective advantages of CNN and Transformer, maintaining local detail features while learning extensive sequential semantic information. EDP-CTNet not only delivers impressive segmentation speed but also exhibits excellent segmentation accuracy following AKD training. In comparison to other models, the two models proposed in this article notably distinguish themselves in terms of accuracy and result visualization.; https://ieeexplore.ieee.org/document/10839278/; Convolutional neural network (CNN); remote sensing; semantic segmentation; transformer |
spellingShingle | Kang Zheng Yu Chen Jingrong Wang Zhifei Liu Shuai Bao Jiao Zhan Nan Shen Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Convolutional neural network (CNN) remote sensing semantic segmentation transformer |
title | Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation |
title_full | Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation |
title_fullStr | Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation |
title_full_unstemmed | Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation |
title_short | Enhancing Remote Sensing Semantic Segmentation Accuracy and Efficiency Through Transformer and Knowledge Distillation |
title_sort | enhancing remote sensing semantic segmentation accuracy and efficiency through transformer and knowledge distillation |
topic | Convolutional neural network (CNN) remote sensing semantic segmentation transformer |
url | https://ieeexplore.ieee.org/document/10839278/ |
work_keys_str_mv | AT kangzheng enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT yuchen enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT jingrongwang enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT zhifeiliu enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT shuaibao enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT jiaozhan enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation AT nanshen enhancingremotesensingsemanticsegmentationaccuracyandefficiencythroughtransformerandknowledgedistillation |
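
The description above says a teacher model (DP-CTNet) is distilled into a faster student (EDP-CTNet). As a rough illustration of teacher-student distillation in general, the sketch below computes a per-pixel soft-target loss in the classic temperature-scaled style. It is not the angle knowledge distillation (AKD) proposed in the article, whose formulation this record does not give. The function names, the `temperature` and `alpha` values, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss for one pixel (assumed form).

    alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy,
    where the KL term uses temperature-softened class distributions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    # Ordinary cross-entropy of the student against the ground-truth class.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Hypothetical logits for one pixel with three land-cover classes.
teacher = [4.0, 1.0, 0.5]
matched = distillation_loss(teacher, teacher, label=0)
mismatched = distillation_loss([0.5, 1.0, 4.0], teacher, label=0)
```

When the student reproduces the teacher's logits, the KL term vanishes and only the weighted cross-entropy remains, so `matched` is smaller than `mismatched`. In a segmentation setting this loss would be averaged over all pixels; extreme logits could underflow the student probabilities, which this sketch does not guard against.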