Dataset Dependency in CNN-Based Copy-Move Forgery Detection: A Multi-Dataset Comparative Analysis

Bibliographic Details
Main Authors: Potito Valle Dell’Olmo, Oleksandr Kuznetsov, Emanuele Frontoni, Marco Arnesano, Christian Napoli, Cristian Randieri
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Machine Learning and Knowledge Extraction
Online Access: https://www.mdpi.com/2504-4990/7/2/54
author Potito Valle Dell’Olmo
Oleksandr Kuznetsov
Emanuele Frontoni
Marco Arnesano
Christian Napoli
Cristian Randieri
collection DOAJ
description Convolutional neural networks (CNNs) have become a fundamental tool in copy-move forgery detection because of their ability to identify and analyze manipulated images. Copy-move forgery nevertheless remains a persistent challenge in digital image forensics, which underlines the importance of ensuring the integrity of digital visual content. In this study, we present a systematic evaluation of a CNN specifically designed for copy-move manipulation detection, applied to three datasets widely used in the digital forensics literature: CoMoFoD, Coverage, and CASIA v2. Our experimental analysis showed significant variability in the results, with accuracy ranging from 95.90% on CoMoFoD to 27.50% on Coverage. We attribute this variability to structural factors of the datasets, such as the sample size, the degree of imbalance between classes, and the intrinsic complexity of the manipulations. We also investigated different regularization techniques and data augmentation strategies to understand their impact on network performance. Adopting the L2 penalty and reducing the learning rate led to an accuracy increase of up to 2.5% for CASIA v2, while on CoMoFoD the impact was much more modest (1.3%). Similarly, data augmentation improved performance on large datasets but was ineffective on smaller ones. Our results challenge the idea that CNN architectures generalize universally in copy-move forgery detection and show instead that performance depends strictly on the intrinsic characteristics of the dataset under consideration. Finally, we propose a series of operational recommendations for optimizing the training process, choosing the dataset, and defining robust evaluation protocols, aimed at guiding the development of detection systems that are more reliable and generalizable.
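The description names class imbalance as one factor behind the variation in accuracy across datasets. The sketch below is an illustration only, not the authors' evaluation protocol: it uses made-up toy labels and two hypothetical helper functions (`accuracy` and `balanced_accuracy`) to show how plain accuracy can look strong on an imbalanced set while a per-class average does not.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all samples whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (NOT taken from the paper): 1 = forged, 0 = authentic.
y_true = [1] * 90 + [0] * 10
# A degenerate detector that labels every image as forged.
y_pred = [1] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

With these toy labels the degenerate detector scores 0.90 on plain accuracy but only 0.50 on balanced accuracy, which is one reason a single accuracy figure can be hard to compare across datasets with different class ratios.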
format Article
id doaj-art-2dbd2002ebad401e848fb14df9b7a99f
institution Kabale University
issn 2504-4990
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj-art-2dbd2002ebad401e848fb14df9b7a99f
2025-08-20T03:27:28Z
eng
MDPI AG
Machine Learning and Knowledge Extraction
ISSN 2504-4990
2025-06-01
Volume 7, Issue 2, Article 54
DOI 10.3390/make7020054
Author affiliations:
Potito Valle Dell’Olmo: Department of Theoretical and Applied Sciences, eCampus University, Via Isimbardi 10, 22060 Novedrate, Italy
Oleksandr Kuznetsov: Department of Theoretical and Applied Sciences, eCampus University, Via Isimbardi 10, 22060 Novedrate, Italy
Emanuele Frontoni: Department of Political Sciences, Communication and International Relations, University of Macerata, Via Crescimbeni, 30/32, 62100 Macerata, Italy
Marco Arnesano: Department of Theoretical and Applied Sciences, eCampus University, Via Isimbardi 10, 22060 Novedrate, Italy
Christian Napoli: Department of Computer, Control, and Management Engineering “Antonio Ruberti”, Sapienza University of Rome, V. Ariosto 25, 00185 Rome, Italy
Cristian Randieri: Department of Theoretical and Applied Sciences, eCampus University, Via Isimbardi 10, 22060 Novedrate, Italy
title Dataset Dependency in CNN-Based Copy-Move Forgery Detection: A Multi-Dataset Comparative Analysis
topic copy-move forgery detection
convolutional neural networks
digital image forensics
dataset dependency
regularization techniques
data augmentation
url https://www.mdpi.com/2504-4990/7/2/54