Screen shooting resistant watermarking based on cross attention

Abstract: With the development of digital imaging devices, recording sensitive information displayed on screens with mobile phones and cameras has become a prominent channel for modern data leaks. To identify the origin of such information violations, Screen-Shooting Resistant Watermarking (SSRW) has attracted considerable attention. Most existing solutions rely on Convolutional Neural Networks (CNNs) to embed watermarks; however, because of the limited receptive field of CNNs, they are proficient at extracting local features but cannot model the image as a whole. This paper presents a new screen-shooting-resistant watermarking system that uses multi-head cross-attention to embed watermarks, replacing the encoder in the end-to-end architecture. Specifically, we segment the image and watermark into smaller patches for positional embedding, compute attention scores through multi-head attention layers, and generate the encoded image through concatenation. This approach strengthens the model's ability to comprehend the entire image, thereby improving performance. In addition, we enhance the U-Net structure to replace the end-to-end decoder. Experimental results demonstrate that the proposed method not only reaches more than 95% extraction accuracy in different capture scenarios but also outperforms current state-of-the-art (SOTA) methods in robustness and invisibility. The approach also yields average PSNR and SSIM values of 41.90 dB and 0.99, showing the excellent visual quality of the watermarked images.
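The embedding step described in the abstract (patch tokens from the cover image attending over watermark tokens via multi-head cross-attention, then merged back into the image) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the patch size, token layout, random projection weights, and embedding strength are all illustrative assumptions, and positional embeddings are omitted for brevity. The PSNR computation at the end mirrors the quality metric reported in the abstract.

```python
import numpy as np

def patchify(img, patch=8):
    """Split an HxW grayscale image into flattened non-overlapping patches."""
    h, w = img.shape
    p = img.reshape(h // patch, patch, w // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def unpatchify(patches, h=32, w=32, patch=8):
    """Inverse of patchify for a 32x32 image (illustrative sizes)."""
    g = patches.reshape(h // patch, w // patch, patch, patch)
    return g.transpose(0, 2, 1, 3).reshape(h, w)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(queries, keys_values, num_heads=4):
    """Image-patch tokens (queries) attend over watermark tokens (keys/values);
    head outputs are concatenated, as in standard multi-head attention."""
    d = queries.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(0)
    # Random projections stand in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (queries @ Wq).reshape(-1, num_heads, dh)
    k = (keys_values @ Wk).reshape(-1, num_heads, dh)
    v = (keys_values @ Wv).reshape(-1, num_heads, dh)
    out = np.empty_like(q)
    for h in range(num_heads):
        scores = softmax(q[:, h] @ k[:, h].T / np.sqrt(dh))
        out[:, h] = scores @ v[:, h]
    return out.reshape(queries.shape[0], d)  # concatenate heads

# Toy data: a 32x32 cover image in [0,1] and a 64-bit watermark.
image = np.random.default_rng(1).random((32, 32))
patches = patchify(image)                          # (16, 64) patch tokens
bits = np.random.default_rng(2).integers(0, 2, 64)
wm_tokens = np.tile(bits.astype(float), (4, 1))    # (4, 64) watermark tokens

fused = multi_head_cross_attention(patches, wm_tokens)
encoded = patches + 0.05 * fused                   # residual embedding (strength assumed)
stego = unpatchify(np.clip(encoded, 0.0, 1.0))

# PSNR of the watermarked image against the cover (peak value 1.0 for [0,1] images).
mse = np.mean((image - stego) ** 2)
psnr = 10 * np.log10(1.0 / mse)
print(fused.shape)  # (16, 64)
```

Because every patch token attends over every watermark token, the embedding decision at each location is conditioned on global context, which is the advantage the paper claims over the bounded receptive field of a purely convolutional encoder.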

Bibliographic Details
Main Authors: Lianshan Liu, Peng Xu, Qianwen Xue
Format: Article
Language: English
Published: Nature Portfolio, 2025-05-01
Series: Scientific Reports
Subjects: Robust watermarking; Screen-shooting; Deep learning; Cross attention
Online Access: https://doi.org/10.1038/s41598-025-00912-8
Collection: DOAJ
ISSN: 2045-2322
Author Affiliations:
Lianshan Liu: College of Computer Science and Engineering, Shandong University of Science and Technology
Peng Xu: College of Computer Science and Engineering, Shandong University of Science and Technology
Qianwen Xue: Qingdao Maternal & Child Health and Family Planning Service Center