GeoViT: Mixed-Scale Transformer for Perspective Correction in Print-Cam Image Watermarking

Bibliographic Details
Main Authors: Said Boujerfaoui, Anass Mancour-Billah, Hassan Douzi, Rachid Harba
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11020641/
Description
Summary: Printed identity documents, such as ID cards and passports, continue to play a vital role in identity verification despite the growing adoption of digital authentication methods. The print-cam process, in which a watermarked image is printed and then captured with a smartphone camera, offers a practical approach to mobile-based document authentication. However, this process introduces perspective distortions, compression artifacts, noise, and lighting variations, all of which make accurate watermark detection difficult. Existing distortion correction techniques often fail to fully address these issues, especially in practical scenarios where the camera is handheld and capture conditions are less controlled. In this study, we propose GeoViT, a Transformer-based framework that enhances watermark robustness against print-cam attacks. GeoViT uses a multi-head attention mechanism to capture global dependencies and spatial variations, improving feature extraction for distortion rectification. To address the limitations of the naive Transformer feed-forward network in handling multi-scale information, we introduce a mixed-scale feed-forward network that generates robust features for geometric alignment. Additionally, we incorporate a mixture of expert feature compensators, which integrates local context from CNN-based operators to refine distortion correction. Extensive experiments on a diverse set of ID images, captured under various conditions with different smartphone models, show that GeoViT significantly outperforms existing approaches in geometric accuracy, visual fidelity, and perceptual quality, and that it significantly improves watermark robustness. These results highlight GeoViT’s effectiveness as a secure and efficient solution for mobile-based identity document authentication, advancing the development of watermarking techniques for real-time, smartphone-compatible systems.
ISSN: 2169-3536
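
Illustrative note: The perspective distortion mentioned in the summary is commonly modelled as a planar homography between the captured document and its flat, rectified form. The sketch below is not GeoViT and is not taken from the article. It is a classical four-point homography estimate in plain Python, included only to make the geometric correction problem concrete. The function names, the corner coordinates, and the 856 x 540 target size are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(src, dst):
    """Return a 3x3 matrix H (with H[2][2] = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map one (x, y) point through H, including the projective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical corner positions of an ID card in a handheld photo (pixels),
# listed clockwise from the top-left corner.
captured = [(112, 80), (905, 131), (870, 640), (95, 560)]
# Hypothetical rectified target: an axis-aligned 856 x 540 rectangle.
rectified = [(0, 0), (856, 0), (856, 540), (0, 540)]

H = estimate_homography(captured, rectified)
corrected = [apply_homography(H, p) for p in captured]
```

With exactly four correspondences and no three points collinear, the eight unknowns of H are determined uniquely, so each captured corner maps onto its rectified counterpart up to floating-point error. A learned approach such as the one the summary describes targets the harder setting in which the distortion must be inferred from image content under noise, compression, and lighting changes, rather than from hand-picked corners.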