Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
Information sharing on social media has become a common practice for people around the world. Since it is difficult to verify user-generated content on social media, huge amounts of rumors and misinformation are spread alongside authentic information. On the one hand, most social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post to detect image–text inconsistency. In this paper, we propose a novel approach to misinformation detection using multimodal feature fusion with transformers and credibility assessment with self-attention-based Bi-RNN networks. First, captions are derived from images using an image captioning module to obtain their semantic descriptions. These are compared with the surrounding text by fine-tuning transformers for a semantic consistency check. Then, to further aggregate sentiment features into the text representation, we fine-tune a separate transformer for text sentiment classification, whose output is concatenated to augment the text embeddings. Finally, Multi-Cell Bi-GRUs with self-attention are used to train the credibility assessment model for misinformation detection. In experiments on tweets, the best performance, with an accuracy of 0.904 and an F1-score of 0.921, is obtained when fusing the augmented embeddings with the sentiment classification results. This shows the potential of applying transformers in the proposed approach to misinformation detection. Further investigation is needed to validate the performance on various types of multimodal discrepancies.
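The abstract outlines a three-stage pipeline: an image-captioning module produces a textual description of the image, fine-tuned transformers compare that caption with the post text and classify the text's sentiment, and the resulting signals are fused into the text embeddings before a self-attention Bi-GRU scores credibility. Below is a minimal PyTorch sketch of the fusion-and-classification stage only; the module name `CredibilityClassifier`, the tensor dimensions, and the cosine-similarity consistency score are illustrative assumptions, not the authors' implementation (the paper fine-tunes transformers for the consistency check and uses Multi-Cell Bi-GRUs).

```python
# Sketch of the fusion-and-classification stage described in the abstract:
# text embeddings are augmented with sentiment features and a caption-text
# consistency signal, then scored by a bidirectional GRU with self-attention.
# Dimensions and names are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CredibilityClassifier(nn.Module):
    def __init__(self, text_dim=768, sent_dim=3, hidden=128, num_classes=2):
        super().__init__()
        # +1 input feature for the caption-text consistency score
        self.bigru = nn.GRU(text_dim + sent_dim + 1, hidden,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # additive self-attention scores
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, text_emb, caption_emb, sentiment):
        # text_emb:    (B, T, text_dim)  token embeddings of the post text
        # caption_emb: (B, text_dim)     pooled embedding of the generated image caption
        # sentiment:   (B, sent_dim)     class probabilities from a sentiment classifier
        B, T, _ = text_emb.shape
        # cosine similarity between the mean-pooled post text and its image caption
        consistency = F.cosine_similarity(text_emb.mean(dim=1), caption_emb, dim=-1)
        extra = torch.cat([sentiment, consistency.unsqueeze(-1)], dim=-1)   # (B, sent_dim + 1)
        fused = torch.cat([text_emb, extra.unsqueeze(1).expand(B, T, -1)], dim=-1)
        h, _ = self.bigru(fused)                                            # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)            # (B, T)
        pooled = (h * weights.unsqueeze(-1)).sum(dim=1)                     # (B, 2*hidden)
        return self.out(pooled)                                             # (B, num_classes)

# Toy usage with random tensors standing in for transformer outputs.
model = CredibilityClassifier()
logits = model(torch.randn(4, 32, 768), torch.randn(4, 768), torch.rand(4, 3))
print(logits.shape)  # torch.Size([4, 2])
```

In practice, `text_emb` and `caption_emb` would come from the fine-tuned transformer encoders and `sentiment` from the sentiment classifier's softmax output; here random tensors stand in so the sketch runs end to end.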
| Main Authors: | Jenq-Haur Wang, Mehdi Norouzi, Shu Ming Tsai |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-10-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | misinformation detection; transformers; multimodal feature fusion; embedding augmentation; credibility assessment |
| Online Access: | https://www.mdpi.com/2504-2289/8/10/134 |
| _version_ | 1850205673324806144 |
|---|---|
| author | Jenq-Haur Wang; Mehdi Norouzi; Shu Ming Tsai |
| author_sort | Jenq-Haur Wang |
| collection | DOAJ |
| description | Information sharing on social media has become a common practice for people around the world. Since it is difficult to verify user-generated content on social media, huge amounts of rumors and misinformation are spread alongside authentic information. On the one hand, most social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post to detect image–text inconsistency. In this paper, we propose a novel approach to misinformation detection using multimodal feature fusion with transformers and credibility assessment with self-attention-based Bi-RNN networks. First, captions are derived from images using an image captioning module to obtain their semantic descriptions. These are compared with the surrounding text by fine-tuning transformers for a semantic consistency check. Then, to further aggregate sentiment features into the text representation, we fine-tune a separate transformer for text sentiment classification, whose output is concatenated to augment the text embeddings. Finally, Multi-Cell Bi-GRUs with self-attention are used to train the credibility assessment model for misinformation detection. In experiments on tweets, the best performance, with an accuracy of 0.904 and an F1-score of 0.921, is obtained when fusing the augmented embeddings with the sentiment classification results. This shows the potential of applying transformers in the proposed approach to misinformation detection. Further investigation is needed to validate the performance on various types of multimodal discrepancies. |
| format | Article |
| id | doaj-art-e2980bd89015411e9b39a05b5a23bcb5 |
| institution | OA Journals |
| issn | 2504-2289 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Big Data and Cognitive Computing |
| spelling | Record doaj-art-e2980bd89015411e9b39a05b5a23bcb5, indexed 2025-08-20T02:11:01Z, English. Jenq-Haur Wang (Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 106, Taiwan); Mehdi Norouzi (Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA); Shu Ming Tsai (Inventory Department, Cheng Hsin General Hospital, Taipei 112, Taiwan). "Augmenting Multimodal Content Representation with Transformers for Misinformation Detection", Big Data and Cognitive Computing (MDPI AG), ISSN 2504-2289, 2024-10-01, vol. 8, no. 10, article 134, doi:10.3390/bdcc8100134, https://www.mdpi.com/2504-2289/8/10/134. Keywords: misinformation detection; transformers; multimodal feature fusion; embedding augmentation; credibility assessment. |
| title | Augmenting Multimodal Content Representation with Transformers for Misinformation Detection |
| topic | misinformation detection; transformers; multimodal feature fusion; embedding augmentation; credibility assessment |
| url | https://www.mdpi.com/2504-2289/8/10/134 |