GHMSA-Net: Gated Hierarchical Multi-Scale Self-Attention for Perceptually-Guided AV1 Post-Processing


Bibliographic Details
Main Authors: Bopu Zhao, Woowoen Gwun, Kiho Choi
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11114914/
Description
Summary: The AOMedia Video 1 (AV1) codec achieves excellent compression efficiency but often introduces visually distracting artifacts at high quantization parameters (QPs), impairing perceptual quality. We propose the Gated Hierarchical Multi-Scale Self-Attention Network (GHMSA-Net), a post-processing model that leverages multi-scale self-attention and dynamic gating to adaptively suppress compression artifacts across varying quantization levels while preserving structural fidelity. The network architecture captures both fine-grained details and global context through a hierarchical attention design, enabling robust restoration under diverse compression strengths. We also explore an efficient training scheme that combines unified pretraining on a representative QP with lightweight QP-specific fine-tuning, offering a favorable trade-off between performance and training cost. Results show that, relative to the AV1 anchor, GHMSA-Net achieves BD-rate savings of 11.79% (Y), 21.24% (Cb), and 20.11% (Cr) for BD-PSNR; 10.55% (Y), 22.49% (Cb), and 21.44% (Cr) for BD-MS-SSIM; and 15.44% for BD-VMAF across QPs 20, 32, 43, 55, and 63. Visual assessments validate the model’s effectiveness in artifact removal and perceptual quality enhancement.
ISSN: 2169-3536
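
The gated multi-scale self-attention idea summarized above can be illustrated with a minimal NumPy sketch. This is NOT the authors' implementation: the identity Q/K/V projections, the average-pool downsampling, the nearest-neighbor upsampling, and the mean-based gate scores are all simplifying assumptions standing in for learned parameters, chosen only to show how attention outputs from several scales might be gated and fused.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (tokens, dim). Identity Q/K/V projections for brevity;
    # a real model would use learned linear projections.
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))
    return attn @ x

def downsample(x, factor):
    # Average-pool tokens by `factor` (token count assumed divisible).
    t, d = x.shape
    return x.reshape(t // factor, factor, d).mean(axis=1)

def upsample(x, factor):
    # Nearest-neighbor upsampling back to the original token count.
    return np.repeat(x, factor, axis=0)

def gated_multiscale_attention(x, scales=(1, 2, 4)):
    # Run self-attention at each scale, restore the original
    # resolution, then blend scale outputs with a content-dependent
    # gate (a crude per-token mean score stands in for a learned gate).
    outs = []
    for s in scales:
        y = self_attention(downsample(x, s))
        outs.append(upsample(y, s))
    stacked = np.stack(outs)                            # (scales, tokens, dim)
    gate_logits = stacked.mean(axis=-1, keepdims=True)  # (scales, tokens, 1)
    gates = softmax(gate_logits, axis=0)                # normalize over scales
    return (gates * stacked).sum(axis=0)                # (tokens, dim)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16))  # 8 tokens, 16 channels
out = gated_multiscale_attention(feat)
print(out.shape)  # (8, 16)
```

The gate makes the fusion weights depend on the features themselves, which mirrors (in spirit) how the paper's dynamic gating can emphasize coarse context at strong quantization and fine detail at mild quantization.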