CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention

Recent progress in geospatial foundation models (GFMs) has demonstrated strong generalization capabilities for remote sensing downstream tasks. However, existing GFMs still struggle with fine-grained cropland classification due to ambiguous field boundaries, insufficient and low-efficient temporal m...

Bibliographic Details
Main Authors: Jian Yan, Xingfa Gu, Yuxing Chen
Format: Article
Language: English
Published: MDPI AG 2025-07-01
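Series: Remote Sensing
Subjects: remote sensing; cropland classification; geospatial foundation model; self-supervised learning; temporal–spatial attention; knowledge distillation
Online Access: https://www.mdpi.com/2072-4292/17/14/2481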
author Jian Yan
Xingfa Gu
Yuxing Chen
collection DOAJ
description Recent progress in geospatial foundation models (GFMs) has demonstrated strong generalization capabilities for remote sensing downstream tasks. However, existing GFMs still struggle with fine-grained cropland classification due to ambiguous field boundaries, insufficient and inefficient temporal modeling, and limited cross-regional adaptability. In this paper, we propose CropSTS, a remote sensing foundation model designed with a decoupled temporal–spatial attention architecture, specifically tailored for the temporal dynamics of cropland remote sensing data. To efficiently pre-train the model under limited labeled data, we employ a hybrid framework combining joint-embedding predictive architecture with knowledge distillation from web-scale foundation models. Despite being trained on a small dataset and using a compact model, CropSTS achieves state-of-the-art performance on the PASTIS-R benchmark in terms of mIoU and F1-score. Our results validate that structural optimization for temporal encoding and cross-modal knowledge transfer constitute effective strategies for advancing GFM design in agricultural remote sensing.
format Article
id doaj-art-70f5af9a706a43d1b45c999b09ec2678
institution Kabale University
issn 2072-4292
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-70f5af9a706a43d1b45c999b09ec2678; 2025-08-20T03:56:45Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-07-01; 17(14), 2481; doi:10.3390/rs17142481
Jian Yan (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Xingfa Gu (Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China)
Yuxing Chen (Institute of Geodesy and Photogrammetry, ETH Zurich, 8092 Zurich, Switzerland)
title CropSTS: A Remote Sensing Foundation Model for Cropland Classification with Decoupled Spatiotemporal Attention
topic remote sensing
cropland classification
geospatial foundation model
self-supervised learning
temporal–spatial attention
knowledge distillation
url https://www.mdpi.com/2072-4292/17/14/2481