Extended IMD2020: a large‐scale annotated dataset tailored for detecting manipulated images

Abstract Image forensic datasets need to accommodate a complex diversity of systematic noise and intrinsic image artefacts to prevent any overfitting of learning methods to a small set of camera types or manipulation techniques. Such artefacts are created during the image acquisition as well as the...

Bibliographic Details
Main Authors:	Adam Novozámský, Babak Mahdian, Stanislav Saic
Format:	Article
Language:	English
Published:	Wiley 2021-07-01
Series:	IET Biometrics
Subjects:	cameras; image classification; image coding; image colour analysis; image processing; image representation
Online Access:	https://doi.org/10.1049/bme2.12025

collection	DOAJ
description	Abstract Image forensic datasets need to accommodate a complex diversity of systematic noise and intrinsic image artefacts to prevent any overfitting of learning methods to a small set of camera types or manipulation techniques. Such artefacts are created during the image acquisition as well as the manipulating process itself (e.g., noise due to sensors, interpolation artefacts, etc.). Here, the authors introduce three datasets. First, we identified the majority of camera models on the market. Then, we collected a dataset of 35,000 real images captured by these cameras. We also created the same number of digitally manipulated images. Additionally, we also collected a dataset of 2,000 ‘real‐life’ (uncontrolled) manipulated images. They are made by unknown people and downloaded from the Internet. The real versions of these images are also provided. We also manually created binary masks localising the exact manipulated areas of these images. Moreover, we captured a set of 2,759 real images formed by 32 unique cameras (19 different camera models) in a controlled way by ourselves. Here, the processing history of all images is guaranteed. This set includes categorised images of uniform areas as well as natural images that can be used effectively for analysis of the sensor noise.
id	doaj-art-0258d30fe71348df9cdf1972c2f3d1b4
institution	Kabale University
issn	2047-4938 2047-4946
citation	IET Biometrics, vol. 10, no. 4, pp. 392–407, 2021-07-01, https://doi.org/10.1049/bme2.12025
affiliation	The Czech Academy of Sciences, Institute of Information Theory and Automation, Prague, Czechia (all three authors)