Screen shooting resistant watermarking based on cross attention

Abstract: With the development of digital imaging devices, recording sensitive information displayed on screens with mobile phones and cameras has become a prominent channel for modern data leaks. To identify the origin of such information violations, Screen-Shooting Resistant Watermarking (SSRW) has attracted considerable attention. Most existing solutions rely on Convolutional Neural Networks (CNNs) to embed watermarks; however, because of the limited receptive field of CNNs, they are proficient at extracting local features but cannot model the image as a whole. This paper presents a new screen-shooting-resistant watermarking system that uses multi-head cross-attention to embed watermarks, replacing the encoder in the end-to-end architecture. Specifically, we segment the image and watermark into smaller patches for positional embedding, compute attention scores through multi-head attention layers, and generate the encoded image through concatenation. This approach strengthens the model's ability to comprehend the entire image, thereby improving performance. In addition, we enhance the U-Net structure to replace the end-to-end decoder. Experimental results demonstrate that the proposed method not only reaches more than 95% extraction accuracy in different capture scenarios but also outperforms current state-of-the-art (SOTA) methods in robustness and invisibility. The approach also yields average PSNR and SSIM values of 41.90 dB and 0.99, showing the excellent visual quality of the watermarked images.
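The embedding step described in the abstract (patch tokens from the cover image attending over watermark tokens via multi-head cross-attention, then merged back into the image) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the patch size, token layout, random projection weights, and embedding strength are all illustrative assumptions, and positional embeddings are omitted for brevity. The PSNR computation at the end mirrors the quality metric reported in the abstract.

```python
import numpy as np

def patchify(img, patch=8):
    """Split an HxW grayscale image into flattened non-overlapping patches."""
    h, w = img.shape
    p = img.reshape(h // patch, patch, w // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def unpatchify(patches, h=32, w=32, patch=8):
    """Inverse of patchify for a 32x32 image (illustrative sizes)."""
    g = patches.reshape(h // patch, w // patch, patch, patch)
    return g.transpose(0, 2, 1, 3).reshape(h, w)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(queries, keys_values, num_heads=4):
    """Image-patch tokens (queries) attend over watermark tokens (keys/values);
    head outputs are concatenated, as in standard multi-head attention."""
    d = queries.shape[-1]
    assert d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(0)
    # Random projections stand in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (queries @ Wq).reshape(-1, num_heads, dh)
    k = (keys_values @ Wk).reshape(-1, num_heads, dh)
    v = (keys_values @ Wv).reshape(-1, num_heads, dh)
    out = np.empty_like(q)
    for h in range(num_heads):
        scores = softmax(q[:, h] @ k[:, h].T / np.sqrt(dh))
        out[:, h] = scores @ v[:, h]
    return out.reshape(queries.shape[0], d)  # concatenate heads

# Toy data: a 32x32 cover image in [0,1] and a 64-bit watermark.
image = np.random.default_rng(1).random((32, 32))
patches = patchify(image)                          # (16, 64) patch tokens
bits = np.random.default_rng(2).integers(0, 2, 64)
wm_tokens = np.tile(bits.astype(float), (4, 1))    # (4, 64) watermark tokens

fused = multi_head_cross_attention(patches, wm_tokens)
encoded = patches + 0.05 * fused                   # residual embedding (strength assumed)
stego = unpatchify(np.clip(encoded, 0.0, 1.0))

# PSNR of the watermarked image against the cover (peak value 1.0 for [0,1] images).
mse = np.mean((image - stego) ** 2)
psnr = 10 * np.log10(1.0 / mse)
print(fused.shape)  # (16, 64)
```

Because every patch token attends over every watermark token, the embedding decision at each location is conditioned on global context, which is the advantage the paper claims over the bounded receptive field of a purely convolutional encoder.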

Bibliographic Details
Main Authors: Lianshan Liu, Peng Xu, Qianwen Xue
Format: Article
Language: English
Published: Nature Portfolio, 2025-05-01
Series: Scientific Reports
Subjects: Robust watermarking; Screen-shooting; Deep learning; Cross attention
Online Access: https://doi.org/10.1038/s41598-025-00912-8
Collection: DOAJ
ISSN: 2045-2322
Author Affiliations:
Lianshan Liu: College of Computer Science and Engineering, Shandong University of Science and Technology
Peng Xu: College of Computer Science and Engineering, Shandong University of Science and Technology
Qianwen Xue: Qingdao Maternal & Child Health and Family Planning Service Center