Multi-Scale Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec: Towards Enhanced Visual Quality and Overall Coding Performance
| Main Authors: | Woowoen Gwun, Kiho Choi, Gwang Hoon Park |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Mathematics |
| Subjects: | video compression, AV1, self-attention, CNN |
| Online Access: | https://www.mdpi.com/2227-7390/13/11/1782 |
| Field | Value |
|---|---|
| author | Woowoen Gwun, Kiho Choi, Gwang Hoon Park |
| collection | DOAJ |
| description | This paper presents MS-MTSA, a multi-scale multi-type self-attention network designed to enhance AV1-compressed video through targeted post-filtering. The objective is to address two persistent artifact issues observed in our previous MTSA model: visible seams at patch boundaries and grid-like distortions from upsampling. To this end, MS-MTSA introduces two key architectural enhancements. First, multi-scale block-wise self-attention applies sequential attention over 16 × 16 and 12 × 12 blocks to better capture local context and improve spatial continuity. Second, refined patch-wise self-attention includes a lightweight convolutional refinement layer after upsampling to suppress structured artifacts in flat regions. These targeted modifications significantly improve both perceptual and quantitative quality. The proposed network achieves BD-rate reductions of 12.44% for Y, 21.70% for Cb, and 19.90% for Cr compared to the AV1 anchor. Visual evaluations confirm improved texture fidelity and reduced seam artifacts, demonstrating the effectiveness of combining multi-scale attention and structural refinement for artifact suppression in compressed video. |
| format | Article |
| id | doaj-art-571e620b2f544f6db8593c98e418efa7 |
| institution | Kabale University |
| issn | 2227-7390 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| series | Mathematics |
| doi | 10.3390/math13111782 |
| volume | 13 |
| issue | 11 |
| article_number | 1782 |
| affiliation | Woowoen Gwun: Department of Computer Science and Engineering, College of Software, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea; Kiho Choi: Department of Electronics Engineering, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea; Gwang Hoon Park: Department of Computer Science and Engineering, College of Software, Kyung Hee University, Yongin 17104, Gyeonggi-do, Republic of Korea |
| title | Multi-Scale Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec: Towards Enhanced Visual Quality and Overall Coding Performance |
| topic | video compression, AV1, self-attention, CNN |
| url | https://www.mdpi.com/2227-7390/13/11/1782 |
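
The abstract describes multi-scale block-wise self-attention only at a high level: attention is applied sequentially over 16 × 16 and then 12 × 12 blocks. As a rough illustration, and not the authors' implementation, the pure-Python sketch below runs scaled dot-product self-attention independently inside non-overlapping windows of each size in turn. Several details are assumptions not stated in the record: identity query/key/value projections in place of learned ones, a residual connection after each stage, smaller windows at the right and bottom edges, and all helper names (`attend`, `block_self_attention`, `multi_scale_block_attention`, `block_boundaries`), which are hypothetical.

```python
import math
from typing import List, Sequence, Set

# Hypothetical layout: a feature map is H rows x W columns of C-dim pixel vectors.
FeatureMap = List[List[List[float]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention over one window of tokens.

    Assumption: identity Q/K/V projections stand in for the learned
    projections a trained network would use.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(wt * v[d] for wt, v in zip(weights, tokens)) for d in range(dim)])
    return out


def block_self_attention(x: FeatureMap, block: int) -> FeatureMap:
    """Run attention independently inside each non-overlapping block x block window.

    Windows at the right/bottom edge are smaller when H or W is not a
    multiple of `block`. The input is added back as a residual.
    """
    h, w = len(x), len(x[0])
    y = [[list(px) for px in row] for row in x]  # start from a copy (residual path)
    for top in range(0, h, block):
        for left in range(0, w, block):
            coords = [(r, c)
                      for r in range(top, min(top + block, h))
                      for c in range(left, min(left + block, w))]
            attended = attend([x[r][c] for r, c in coords])
            for (r, c), vec in zip(coords, attended):
                y[r][c] = [a + b for a, b in zip(y[r][c], vec)]
    return y


def multi_scale_block_attention(x: FeatureMap, sizes: Sequence[int] = (16, 12)) -> FeatureMap:
    """Apply block-wise attention sequentially, one stage per block size."""
    for size in sizes:
        x = block_self_attention(x, size)
    return x


def block_boundaries(length: int, block: int) -> Set[int]:
    """Interior indices where a new window starts along one axis."""
    return set(range(block, length, block))


if __name__ == "__main__":
    h = w = 24
    x = [[[math.sin(0.3 * r + 0.1 * c), math.cos(0.2 * r - 0.4 * c)]
          for c in range(w)] for r in range(h)]
    y = multi_scale_block_attention(x)
    print(len(y), len(y[0]), len(y[0][0]))  # 24 24 2
    # 16- and 12-pixel grids share interior boundaries only at multiples of 48.
    print(sorted(block_boundaries(96, 16) & block_boundaries(96, 12)))  # [48]
```

One plausible reading of why two block sizes help with seams: a 16-pixel grid and a 12-pixel grid coincide only at multiples of 48, their least common multiple, so most window edges of one stage fall inside a window of the other stage and pixels on either side of a boundary can still exchange information.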