Spatial-Temporal Sequence Attention Based Efficient Transformer for Video Snow Removal
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2025-05-01 |
| Series: | Big Data Mining and Analytics |
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020061 |
| Summary: | Video snow removal has tremendous potential for enhancing video quality and boosting the performance of computer vision tasks. Recently, Transformers have gained attention for their self-attention mechanism. However, the memory consumption of self-attention is considerable, limiting its application in high-resolution video restoration. In this paper, we propose an efficient video desnowing spatio-temporal Transformer, which utilizes spatio-temporal sequence attention to capture intra-frame spatial information and inter-frame temporal information in parallel, with much lower memory consumption than standard self-attention. Additionally, we mitigate the impact of snowflake occlusion on video frame alignment by leveraging an atmospheric scattering model. Furthermore, we introduce the concept of Neural Representation for Videos (NeRV) and effectively reconstruct compressed videos after multi-resolution feature extraction using the recovery NeRV module, thereby further reducing computational cost. Extensive experiments demonstrate that the model achieves superior performance in video snow removal while significantly reducing the computational resources required. |
| ISSN: | 2096-0654, 2097-406X |
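The summary credits spatio-temporal sequence attention with much lower memory use than standard self-attention but does not describe the mechanism. As a rough, hypothetical illustration of why splitting attention into spatial and temporal parts saves memory, the sketch below counts attention-map entries for full space-time self-attention versus a generic two-branch factorization: windowed spatial attention inside each frame plus temporal attention across frames at each pixel position. The window size `m`, the function names, and the factorization itself are assumptions for illustration, not the authors' design.

```python
"""Hypothetical attention-map size comparison for a clip of T frames, each H x W tokens."""


def full_attention_entries(t: int, h: int, w: int) -> int:
    """Entries in one full space-time attention map: every token attends to every token."""
    n = t * h * w  # total tokens in the clip
    return n * n


def factorized_attention_entries(t: int, h: int, w: int, m: int) -> int:
    """Entries for an assumed two-branch factorization (not the paper's stated design):
    - spatial branch: non-overlapping m x m windows inside each frame,
    - temporal branch: the T tokens at each spatial position attend to each other.
    """
    if h % m or w % m:
        raise ValueError("window size m must divide both h and w")
    windows_per_frame = (h // m) * (w // m)
    spatial = t * windows_per_frame * (m * m) ** 2  # one (m^2 x m^2) map per window per frame
    temporal = (h * w) * t * t                      # one (T x T) map per spatial position
    return spatial + temporal


if __name__ == "__main__":
    t, h, w, m = 5, 64, 64, 8  # illustrative sizes, not taken from the paper
    full = full_attention_entries(t, h, w)
    fact = factorized_attention_entries(t, h, w, m)
    # Memory estimates assume one float32 (4 bytes) per attention-map entry.
    print(f"full:       {full:,} entries ({full * 4 / 2**30:.2f} GiB in float32)")
    print(f"factorized: {fact:,} entries ({fact * 4 / 2**20:.2f} MiB in float32)")
    print(f"reduction:  {full / fact:.1f}x")
```

Full attention grows as (THW)^2, while this assumed factorization grows as THW(m^2 + T), i.e. linearly in frame resolution for fixed window size and clip length. That linear scaling is one common way to make attention affordable for high-resolution restoration; the paper's actual spatio-temporal sequence attention may achieve its savings differently.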