ViViT-Prob: A Radar Echo Extrapolation Model Based on Video Vision Transformer and Spatiotemporal Sparse Attention

Weather radar, as a crucial component of remote sensing data, plays a vital role in convective weather forecasting through radar echo extrapolation techniques. To address the limitations of existing deep learning methods in radar echo extrapolation, this paper proposes a radar echo extrapolation model based on the video vision transformer and spatiotemporal sparse attention (ViViT-Prob). The model takes historical sequences as input and first maps them into a fixed-dimensional vector space through 3D convolutional patch encoding. A multi-head spatiotemporal fusion module with sparse attention then encodes these vectors, effectively capturing the spatiotemporal relationships between different regions in the sequences. The sparse constraint enables better use of the data's structural information, a stronger focus on critical regions, and reduced computational complexity. Finally, a parallel output decoder generates predictions for all time steps simultaneously, which are mapped back to the prediction space through a deconvolution module to reconstruct high-resolution images. Experimental results on the Moving MNIST and a real radar echo dataset demonstrate that the proposed model achieves superior performance in spatiotemporal sequence prediction and improves prediction accuracy while maintaining structural consistency in radar echo extrapolation, providing an effective solution for short-term precipitation forecasting.
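
The pipeline described in the abstract (3D convolutional patch encoding, a multi-head spatiotemporal encoder with sparse attention, and a parallel decoder with deconvolutional upsampling) can be illustrated with a minimal PyTorch-style sketch. This is a hypothetical reconstruction for illustration only: module names, tensor shapes, layer counts, and the use of a standard full-attention Transformer encoder in place of the paper's sparse attention are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class ViViTProbSketch(nn.Module):
    """Hypothetical sketch of the ViViT-Prob pipeline described in the abstract."""
    def __init__(self, in_ch=1, embed_dim=128, n_heads=8, n_layers=4,
                 patch=(2, 8, 8), in_frames=10, out_frames=10, img_size=64):
        super().__init__()
        self.out_frames = out_frames
        # 3D convolutional patch encoding: maps the input sequence into a
        # fixed-dimensional token space (tubelet embedding).
        self.patch_embed = nn.Conv3d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        n_tokens = (in_frames // patch[0]) * (img_size // patch[1]) * (img_size // patch[2])
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, embed_dim))
        # Stand-in for the multi-head spatiotemporal fusion module: a standard
        # Transformer encoder; the paper's sparse attention (restricting attention
        # to dominant queries to cut complexity) would replace full attention here.
        enc_layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Parallel output decoder: learned queries emit all future steps at once.
        self.queries = nn.Parameter(torch.zeros(1, out_frames, embed_dim))
        dec_layer = nn.TransformerDecoderLayer(embed_dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        # Deconvolution module: map each predicted token back to a full-resolution frame.
        self.to_grid = nn.Sequential(
            nn.Linear(embed_dim, (img_size // 4) ** 2),
            nn.Unflatten(2, (1, img_size // 4, img_size // 4)),
        )
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):                                        # x: (B, C, T_in, H, W)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, D)
        memory = self.encoder(tokens + self.pos)
        queries = self.queries.expand(x.size(0), -1, -1)
        out = self.decoder(queries, memory)                      # (B, T_out, D)
        frames = self.to_grid(out).flatten(0, 1)                 # (B*T_out, 1, H/4, W/4)
        frames = self.upsample(frames)                           # (B*T_out, C, H, W)
        return frames.reshape(x.size(0), self.out_frames, *frames.shape[1:])

# Example: extrapolate 10 future 64x64 echo frames from 10 observed frames.
model = ViViTProbSketch()
pred = model(torch.randn(2, 1, 10, 64, 64))                      # -> (2, 10, 1, 64, 64)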

Bibliographic Details
Main Authors: Yunan Qiu, Bingjian Lu, Wenrui Xiong, Zhenyu Lu, Le Sun, Yingjie Cui
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Remote Sensing
ISSN: 2072-4292
DOI: 10.3390/rs17121966
Subjects: radar echo extrapolation; deep learning; video vision transformer; spatiotemporal fusion; short-term precipitation forecast
Online Access: https://www.mdpi.com/2072-4292/17/12/1966

Author Affiliations:
Yunan Qiu: School of Information Engineering, Jiangsu Open University, Nanjing 210044, China
Bingjian Lu: School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
Wenrui Xiong: Research and Development Department, Beijing Wenze Zhiyuan Information Technology Co., Beijing 100000, China
Zhenyu Lu: School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
Le Sun: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
Yingjie Cui: School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China