An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning

Multimodal sentiment analysis integrates multiple modalities to analyze sentiment tendencies or emotional states. Existing approaches face two challenges: redundancy within independent modal features and a lack of correlation analysis between modalities, which lead to insufficient fusion and degraded accuracy. To address these issues, this study proposes a multi-channel multimodal joint learning method for image-text sentiment analysis. First, a multi-channel feature extraction module comprehensively captures image and text features. Second, modality-wise interaction modules achieve effective interaction between multimodal features, eliminating redundant features through cross-modal cross-attention. Last, because contextual information plays a complementary role in sentiment analysis, an adaptive multi-task fusion method merges single-modal context features with multimodal features to improve the reliability of sentiment predictions. Experiments show that the proposed method achieves accuracies of 76.98% and 75.32% on the MVSA-Single and MVSA-Multiple datasets, with F1 scores of 76.23% and 75.29%, respectively, outperforming other state-of-the-art methods. This research offers new insights and methods for multimodal feature fusion, improving the accuracy and practicality of sentiment analysis.
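
To make the fusion step concrete, the following is a minimal PyTorch sketch of the kind of cross-modal cross-attention the abstract describes, in which one modality's features query the other's. It is an illustration only, not the authors' implementation: the class name, feature dimensions, and the residual/normalization details are assumptions made for demonstration.

# Illustrative sketch of cross-modal cross-attention (PyTorch). Not the
# paper's code: names, dimensions, and residual/norm details are assumed.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """One direction of cross-modal attention: queries come from one modality,
    keys and values from the other, so each text token can attend to image
    regions (or vice versa) and down-weight redundant features."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # queries: (batch, n_q, dim) features of the querying modality
        # context: (batch, n_c, dim) features of the other modality
        attended, _ = self.attn(queries, context, context)
        # Residual + LayerNorm keeps the original signal and lets attention
        # act as a learned refinement over the cross-modal context.
        return self.norm(queries + attended)


if __name__ == "__main__":
    text = torch.randn(2, 32, 256)   # e.g. 32 token embeddings per sample
    image = torch.randn(2, 49, 256)  # e.g. a 7x7 grid of region features
    fused_text = CrossModalAttention()(text, image)   # text attends to image
    fused_image = CrossModalAttention()(image, text)  # image attends to text
    print(fused_text.shape, fused_image.shape)  # (2, 32, 256) (2, 49, 256)

Applied in both directions (text-to-image and image-to-text), such modules yield mutually refined features that a downstream head can combine with single-modal context features, in the spirit of the adaptive multi-task fusion step the abstract describes.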

Bibliographic Details
Main Authors: Lianting Gong (School of Management and Information, Zhejiang College of Construction, Hangzhou City, Zhejiang Province, China); Xingzhou He (School of Humanities, Zhejiang University of Technology, Hangzhou City, Zhejiang Province, China); Jianzhong Yang (School of Humanities, Hangzhou Normal University, Hangzhou City, Zhejiang Province, China)
Format: Article
Language: English
Published: Taylor & Francis Group, 2024-12-01
Series: Applied Artificial Intelligence, Vol. 38, No. 1
ISSN: 0883-9514, 1087-6545
DOI: 10.1080/08839514.2024.2371712
Record ID (DOAJ): doaj-art-33f05e5f2e1b467eacfdd64a802c367c
Online Access: https://www.tandfonline.com/doi/10.1080/08839514.2024.2371712