GHMSA-Net: Gated Hierarchical Multi-Scale Self-Attention for Perceptually-Guided AV1 Post-Processing
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11114914/ |
| Summary: | The AOMedia Video 1 (AV1) codec achieves excellent compression efficiency but often introduces visually distracting artifacts at high quantization parameters (QPs), impairing perceptual quality. We propose the Gated Hierarchical Multi-Scale Attention Network (GHMSA-Net), a post-processing model that leverages multi-scale self-attention and dynamic gating to adaptively suppress compression artifacts across varying quantization levels while preserving structural fidelity. The network architecture captures both fine-grained details and global context through a hierarchical attention design, enabling robust restoration under diverse compression strengths. We also explore an efficient training scheme that combines unified pretraining on a representative QP with lightweight QP-specific fine-tuning, offering a favorable trade-off between performance and training cost. Results show that, relative to the AV1 anchor, GHMSA-Net achieves BD-rate savings of 11.79% (Y), 21.24% (Cb), and 20.11% (Cr) for BD-PSNR; 10.55% (Y), 22.49% (Cb), and 21.44% (Cr) for BD-MS-SSIM; and 15.44% for BD-VMAF across QPs 20, 32, 43, 55, and 63. Visual assessments validate the model's effectiveness in artifact removal and perceptual quality enhancement. |
| ISSN: | 2169-3536 |
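The record itself contains no implementation details beyond the summary. As a minimal illustration of the mechanism the summary names — self-attention computed at several spatial scales and fused by a gate — here is a NumPy sketch. Everything here is an assumption for illustration: the function names, the average-pool/upsample multi-scale scheme, and the energy-based softmax gate are not the paper's design (a real model would use learned projections and learned gating weights).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (tokens, dim). Toy single-head attention; learned Q/K/V
    # projections are omitted for brevity (an assumption).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

def avg_pool(x, factor):
    # downsample the token sequence by averaging non-overlapping windows
    t, d = x.shape
    return x[: t - t % factor].reshape(-1, factor, d).mean(axis=1)

def upsample(x, length):
    # nearest-neighbor upsampling back to the original token count
    reps = int(np.ceil(length / x.shape[0]))
    return np.repeat(x, reps, axis=0)[:length]

def gated_multiscale_attention(x, scales=(1, 2, 4)):
    # Attention at several resolutions, fused by a gate. Here the gate is
    # a softmax over per-scale feature energy -- a stand-in for the
    # dynamic gating the abstract describes, not the actual method.
    branches = []
    for s in scales:
        if s > 1:
            branches.append(upsample(self_attention(avg_pool(x, s)), x.shape[0]))
        else:
            branches.append(self_attention(x))
    stacked = np.stack(branches)               # (n_scales, tokens, dim)
    energy = (stacked ** 2).mean(axis=(1, 2))  # per-scale energy
    gate = softmax(energy)                     # (n_scales,) mixing weights
    return np.tensordot(gate, stacked, axes=1) + x  # gated fusion + residual

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))               # 8 tokens, 4-dim features
out = gated_multiscale_attention(tokens)
print(out.shape)                               # same shape as the input
```

The residual connection at the end mirrors the restoration setting: the network predicts a correction on top of the decoded frame rather than resynthesizing it from scratch.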