An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning
Multimodal sentiment analysis integrates multiple modalities to analyze sentiment tendencies or emotional states. Existing challenges include redundancy within independent modal features and a lack of correlation analysis between modalities, which lead to insufficient fusion and degraded accuracy.
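The cross-modal cross-attention step described in the abstract, where each modality queries the other so that redundant features can be filtered out, can be illustrated with a short sketch. This is a minimal PyTorch sketch under assumed details (single-head attention, a shared 256-dimensional feature space, shared weights for both directions); it is not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """One modality (query) attends over the other (key/value), so the
    output keeps only the context features relevant to the query modality."""

    def __init__(self, dim: int = 256):  # dim is an illustrative assumption
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query_feats: torch.Tensor, context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (batch, n_query, dim),   e.g. text token features
        # context_feats: (batch, n_context, dim), e.g. image region features
        q = self.q_proj(query_feats)
        k = self.k_proj(context_feats)
        v = self.v_proj(context_feats)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # context re-weighted by relevance to the query modality

# Usage: text attends to image regions, and image attends to text tokens.
text = torch.randn(2, 32, 256)   # 32 text tokens per sample
image = torch.randn(2, 49, 256)  # 49 image regions (e.g. a 7x7 grid)
xattn = CrossModalAttention(dim=256)
text_aware_image = xattn(text, image)  # (2, 32, 256)
image_aware_text = xattn(image, text)  # (2, 49, 256)
```

Because each attention weight is a softmax over the other modality's features, positions that carry little cross-modal signal receive near-zero weight, which is one plausible reading of how redundant features are suppressed.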
| Main Authors: | Lianting Gong, Xingzhou He, Jianzhong Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2024-12-01 |
| Series: | Applied Artificial Intelligence |
| Online Access: | https://www.tandfonline.com/doi/10.1080/08839514.2024.2371712 |
| author | Lianting Gong; Xingzhou He; Jianzhong Yang |
|---|---|
| author_sort | Lianting Gong |
| collection | DOAJ |
| description | Multimodal sentiment analysis integrates multiple modalities to analyze sentiment tendencies or emotional states. Existing challenges include redundancy within independent modal features and a lack of correlation analysis between modalities, which lead to insufficient fusion and degraded accuracy. To address these issues, this study proposes an innovative multi-channel multimodal joint learning method for image-text sentiment analysis. First, a multi-channel feature extraction module is introduced to comprehensively capture image and text features. Second, effective interaction of multimodal features is achieved by designing modality-wise interaction modules that eliminate redundant features through cross-modal cross-attention. Finally, to exploit the complementary role of contextual information in sentiment analysis, an adaptive multi-task fusion method merges single-modal context features with multimodal features, enhancing the reliability of sentiment predictions. Experimental results demonstrate that the proposed method achieves accuracies of 76.98% and 75.32% on the MVSA-Single and MVSA-Multiple datasets, with F1 scores of 76.23% and 75.29%, respectively, outperforming other state-of-the-art methods. This research provides new insights and methods for advancing multimodal feature fusion, enhancing the accuracy and practicality of sentiment analysis. |
| format | Article |
| id | doaj-art-33f05e5f2e1b467eacfdd64a802c367c |
| institution | DOAJ |
| issn | 0883-9514; 1087-6545 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Applied Artificial Intelligence |
| affiliations | Lianting Gong: School of Management and Information, Zhejiang College of Construction, Hangzhou City, Zhejiang Province, China. Xingzhou He: School of Humanities, Zhejiang University of Technology, Hangzhou City, Zhejiang Province, China. Jianzhong Yang: School of Humanities, Hangzhou Normal University, Hangzhou City, Zhejiang Province, China |
| volume / issue | Vol. 38, No. 1 |
| doi | 10.1080/08839514.2024.2371712 |
| title | An Image-Text Sentiment Analysis Method Using Multi-Channel Multi-Modal Joint Learning |
| url | https://www.tandfonline.com/doi/10.1080/08839514.2024.2371712 |
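The adaptive multi-task fusion step from the abstract, which merges single-modal context features with multimodal features, can also be sketched briefly. The gating scheme, per-branch heads, and three-class sentiment output below are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 3):
        super().__init__()
        # One sentiment head per branch: text context, image context, multimodal.
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(3))
        # A gate estimates, per sample, how much to trust each branch.
        self.gate = nn.Linear(3 * dim, 3)

    def forward(self, text_ctx, image_ctx, multimodal):
        # Each input: (batch, dim) pooled features.
        branches = (text_ctx, image_ctx, multimodal)
        logits = torch.stack([h(f) for h, f in zip(self.heads, branches)], dim=1)
        weights = torch.softmax(self.gate(torch.cat(branches, dim=-1)), dim=-1)
        # Weighted sum of the three branch predictions: (batch, num_classes).
        return (weights.unsqueeze(-1) * logits).sum(dim=1)

fusion = AdaptiveFusion()
out = fusion(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 3])
```

In a multi-task setup, each branch head could additionally be trained with its own loss so that unimodal context supervises the shared encoders; that training detail is omitted from the sketch.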