Using Multimodal Foundation Models for Detecting Fake Images on the Internet with Explanations

Bibliographic Details
Main Authors: Vishnu S. Pendyala, Ashwin Chintalapati
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Future Internet
Online Access: https://www.mdpi.com/1999-5903/16/12/432
Description: Generative AI and multimodal foundation models have fueled a proliferation of fake content on the Internet. This paper investigates if foundation models help detect and thereby contain the spread of fake images. The task of detecting fake images is a formidable challenge owing to its visual nature and intricate analysis. This paper details experiments using four multimodal foundation models, Llava, CLIP, Moondream2, and Gemini 1.5 Flash, to detect fake images. Explainable AI techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and removal-based explanations are used to gain insights into the detection process. The dataset used comprised real images and fake images generated by a generative artificial intelligence tool called MidJourney. Results show that the models can achieve up to a 69% accuracy rate in detecting fake images in an intuitively explainable way, as confirmed by multiple techniques and metrics.
ISSN: 1999-5903
Volume: 16, Issue: 12, Article: 432
DOI: 10.3390/fi16120432
Author affiliations (in author order):
Department of Applied Data Science, San Jose State University, San Jose, CA 95192, USA
Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
Subjects: misinformation containment, large multimodal models, explainable AI, image processing