CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention
Recent progress in geospatial foundation models (GFMs) has demonstrated strong generalization capabilities for remote sensing downstream tasks. However, existing GFMs still struggle with fine-grained cropland classification due to ambiguous field boundaries, insufficient and inefficient temporal m...
| Main Authors: | Jian Yan, Xingfa Gu, Yuxing Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Remote Sensing |
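| Subjects: | remote sensing; cropland classification; geospatial foundation model; self-supervised learning; temporal–spatial attention; knowledge distillation |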
| Online Access: | https://www.mdpi.com/2072-4292/17/14/2481 |
| _version_ | 1849252089630294016 |
|---|---|
| author | Jian Yan; Xingfa Gu; Yuxing Chen |
| author_facet | Jian Yan; Xingfa Gu; Yuxing Chen |
| author_sort | Jian Yan |
| collection | DOAJ |
| description | Recent progress in geospatial foundation models (GFMs) has demonstrated strong generalization capabilities for remote sensing downstream tasks. However, existing GFMs still struggle with fine-grained cropland classification due to ambiguous field boundaries, insufficient and inefficient temporal modeling, and limited cross-regional adaptability. In this paper, we propose CropSTS, a remote sensing foundation model built on a decoupled temporal–spatial attention architecture tailored to the temporal dynamics of cropland remote sensing data. To pre-train the model efficiently with limited labeled data, we employ a hybrid framework that combines a joint-embedding predictive architecture with knowledge distillation from web-scale foundation models. Despite being trained on a small dataset and using a compact model, CropSTS achieves state-of-the-art performance on the PASTIS-R benchmark in terms of mIoU and F1-score. Our results indicate that structural optimization for temporal encoding and cross-modal knowledge transfer are effective strategies for advancing GFM design in agricultural remote sensing. |
| format | Article |
| id | doaj-art-70f5af9a706a43d1b45c999b09ec2678 |
| institution | Kabale University |
| issn | 2072-4292 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Remote Sensing |
| spelling | doaj-art-70f5af9a706a43d1b45c999b09ec2678; 2025-08-20T03:56:45Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-07-01; vol. 17, no. 14, art. 2481; DOI 10.3390/rs17142481; CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention; Jian Yan (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Xingfa Gu (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China); Yuxing Chen (Institute of Geodesy and Photogrammetry, ETH Zurich, 8092 Zurich, Switzerland); https://www.mdpi.com/2072-4292/17/14/2481; remote sensing; cropland classification; geospatial foundation model; self-supervised learning; temporal–spatial attention; knowledge distillation |
| spellingShingle | Jian Yan; Xingfa Gu; Yuxing Chen; CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention; Remote Sensing; remote sensing; cropland classification; geospatial foundation model; self-supervised learning; temporal–spatial attention; knowledge distillation |
| title | CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention |
| title_sort | cropsts a remote sensing foundation model for cropland classification with decoupled spatiotemporal attention |
| topic | remote sensing; cropland classification; geospatial foundation model; self-supervised learning; temporal–spatial attention; knowledge distillation |
| url | https://www.mdpi.com/2072-4292/17/14/2481 |
| work_keys_str_mv | AT jianyan cropstsaremotesensingfoundationmodelforcroplandclassificationwithdecoupledspatiotemporalattention AT xingfagu cropstsaremotesensingfoundationmodelforcroplandclassificationwithdecoupledspatiotemporalattention AT yuxingchen cropstsaremotesensingfoundationmodelforcroplandclassificationwithdecoupledspatiotemporalattention |
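
The description names a decoupled temporal–spatial attention architecture, but this record gives no implementation details. One common reading of such a design, offered here as an assumption and not as the authors' published method, is factorized attention: each spatial location first attends over its own time series, then each time step attends over its spatial locations. The pure-Python sketch below illustrates that idea with single-head attention and identity query/key/value projections. The names `self_attention`, `decoupled_attention`, and `pairwise_scores` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections (no learned weights),
    which keeps the sketch short. `tokens` is a list of vectors.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def decoupled_attention(x):
    """Factorized attention over a grid x[t][s] of feature vectors.

    t indexes acquisition dates, s indexes spatial locations (patches).
    """
    T, S = len(x), len(x[0])
    # Temporal pass: every spatial location attends over its own time series.
    temporal = [[None] * S for _ in range(T)]
    for s in range(S):
        series = self_attention([x[t][s] for t in range(T)])
        for t in range(T):
            temporal[t][s] = series[t]
    # Spatial pass: every time step attends over its spatial locations.
    return [self_attention(temporal[t]) for t in range(T)]


def pairwise_scores(T, S):
    """Number of attention scores: joint over T*S tokens vs. factorized."""
    joint = (T * S) ** 2
    decoupled = S * T * T + T * S * S
    return joint, decoupled


print(pairwise_scores(4, 9))  # (1296, 468)
```

With 4 dates and 9 patches, joint attention over all 36 tokens computes 1296 pairwise scores, while the factorized form computes 468. That gap is the usual motivation for decoupling the two axes. Whether CropSTS orders or combines the two passes this way is not stated in this record.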