ViViT-Prob: A Radar Echo Extrapolation Model Based on Video Vision Transformer and Spatiotemporal Sparse Attention
Weather radar, as a crucial component of remote sensing data, plays a vital role in convective weather forecasting through radar echo extrapolation techniques. To address the limitations of existing deep learning methods in radar echo extrapolation, this paper proposes a radar echo extrapolation mod...
| Main Authors: | Yunan Qiu, Bingjian Lu, Wenrui Xiong, Zhenyu Lu, Le Sun, Yingjie Cui |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Remote Sensing |
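| Subjects: | radar echo extrapolation; deep learning; video vision transformer; spatiotemporal fusion; short-term precipitation forecast |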
| Online Access: | https://www.mdpi.com/2072-4292/17/12/1966 |
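
The abstract in this record says the model first maps the input sequence into a fixed-dimensional vector space through 3D convolutional patch encoding. A minimal sketch of that idea follows, under the assumption that the convolution uses a kernel equal to its stride, in which case it reduces to cutting the clip into non-overlapping space-time patches and applying one linear projection to each. The record does not give patch sizes, embedding width, or weights, so the names `extract_3d_patches` and `project` and all numbers here are hypothetical and chosen only for illustration.

```python
from typing import List

Clip = List[List[List[float]]]  # indexed as clip[time][row][col]


def extract_3d_patches(clip: Clip, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a (T, H, W) clip into non-overlapping pt x ph x pw patches.

    Each patch is flattened in (time, row, col) order into one vector.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("clip dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                tokens.append([
                    clip[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return tokens


def project(tokens: List[List[float]], weight: List[List[float]]) -> List[List[float]]:
    """Apply one shared linear map; weight has shape (embed_dim, patch_volume)."""
    return [[sum(w * v for w, v in zip(row, tok)) for row in weight] for tok in tokens]


# Toy 4 x 4 x 4 clip whose pixel value encodes its own position (assumed data).
clip = [[[float(t * 16 + y * 4 + x) for x in range(4)] for y in range(4)] for t in range(4)]
tokens = extract_3d_patches(clip, 2, 2, 2)
# Hypothetical 2-dimensional embedding: patch sum and patch mean.
embedded = project(tokens, [[1.0] * 8, [0.125] * 8])
print(len(tokens), len(tokens[0]), embedded[0])
```

With a learned weight matrix in place of the fixed one above, each row of `embedded` would play the role of one token passed on to the attention encoder.
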
| _version_ | 1849432374188703744 |
|---|---|
| author | Yunan Qiu; Bingjian Lu; Wenrui Xiong; Zhenyu Lu; Le Sun; Yingjie Cui |
| author_sort | Yunan Qiu |
| collection | DOAJ |
| description | Weather radar, as a crucial component of remote sensing data, plays a vital role in convective weather forecasting through radar echo extrapolation techniques. To address the limitations of existing deep learning methods in radar echo extrapolation, this paper proposes a radar echo extrapolation model based on video vision transformer and spatiotemporal sparse attention (ViViT-Prob). The model takes historical sequences as input and initially maps them into a fixed-dimensional vector space through 3D convolutional patch encoding. Subsequently, a multi-head spatiotemporal fusion module with sparse attention encodes these vectors, effectively capturing spatiotemporal relationships between different regions in the sequences. The sparse constraint enables better utilization of data structural information, enhanced focus on critical regions, and reduced computational complexity. Finally, a parallel output decoder generates all time step predictions simultaneously, then maps back to the prediction space through a deconvolution module to reconstruct high-resolution images. Our experimental results on the Moving MNIST and real radar echo dataset demonstrate that the proposed model achieves superior performance in spatiotemporal sequence prediction and improves the prediction accuracy while maintaining structural consistency in radar echo extrapolation tasks, providing an effective solution for short-term precipitation forecasting. |
| format | Article |
| id | doaj-art-d19c5ba062674311a96f716dfe78f8a1 |
| institution | Kabale University |
| issn | 2072-4292 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Remote Sensing |
| spelling | doaj-art-d19c5ba062674311a96f716dfe78f8a1; 2025-08-20T03:27:22Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2025-06-01; volume 17, issue 12, article 1966; 10.3390/rs17121966; ViViT-Prob: A Radar Echo Extrapolation Model Based on Video Vision Transformer and Spatiotemporal Sparse Attention; Authors: [0] Yunan Qiu, [1] Bingjian Lu, [2] Wenrui Xiong, [3] Zhenyu Lu, [4] Le Sun, [5] Yingjie Cui; Affiliations, in listed order: [0] School of Information Engineering, Jiangsu Open University, Nanjing 210044, China; [1] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; [2] Research and Development Department, Beijing Wenze Zhiyuan Information Technology Co., Beijing 100000, China; [3] School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China; [4] School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; [5] School of Electronics and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; https://www.mdpi.com/2072-4292/17/12/1966; Keywords: radar echo extrapolation; deep learning; video vision transformer; spatiotemporal fusion; short-term precipitation forecast |
| title | ViViT-Prob: A Radar Echo Extrapolation Model Based on Video Vision Transformer and Spatiotemporal Sparse Attention |
| topic | radar echo extrapolation; deep learning; video vision transformer; spatiotemporal fusion; short-term precipitation forecast |
| url | https://www.mdpi.com/2072-4292/17/12/1966 |
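
The description field in this record states that a sparse constraint on attention lets the model focus on critical regions. The record does not say which sparsity pattern the authors use, so the sketch below assumes one common choice, top-k attention, in which each query keeps only its k highest-scoring keys and the softmax is taken over those alone. The function name `topk_sparse_attention` and the toy vectors are hypothetical. Note that this sketch still scores every key before selecting, so it illustrates the sparsity of the attention weights, not the reduction in computational complexity mentioned in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]


def topk_sparse_attention(queries: Matrix, keys: Matrix, values: Matrix, k: int) -> Matrix:
    """Scaled dot-product attention where each query attends to its top-k keys only."""
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, key)) / scale for key in keys]
        # Indices of the k largest scores; all other keys get zero weight.
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        # Numerically stable softmax restricted to the kept keys.
        top = max(scores[i] for i in keep)
        exps = {i: math.exp(scores[i] - top) for i in keep}
        norm = sum(exps.values())
        outputs.append([
            sum(exps[i] / norm * values[i][c] for i in keep)
            for c in range(len(values[0]))
        ])
    return outputs


# Toy tokens (assumed data): two queries, three key/value pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

hard = topk_sparse_attention(queries, keys, values, k=1)
soft = topk_sparse_attention(queries, keys, values, k=2)
print(hard)
print(soft)
```

With `k=1` each query simply copies the value of its best-matching key; with `k=2` the third key, which scores lowest for both queries, is excluded and contributes nothing to either output.
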