When Pixels Speak Louder: Unravelling the Synergy of Text–Image Integration in Multimodal Review Helpfulness

Images contain more visual semantic information. Consumers first view multimodal online reviews with images. Research on the helpfulness of reviews on e-commerce platforms mainly focuses on text, lacking insights into the product attributes reflected by review images and the relationship between ima...

Bibliographic Details
Main Authors: Chao Ma, Chen Yang, Ying Yu
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Journal of Theoretical and Applied Electronic Commerce Research
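Subjects: multimodal online reviews; multimedia learning cognitive theory; perceived helpfulness; text–image integration; fuzzy-set QCA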
Online Access: https://www.mdpi.com/0718-1876/20/2/144
author Chao Ma
Chen Yang
Ying Yu
collection DOAJ
description Images carry richer visual semantic information than text, and consumers tend to look first at multimodal online reviews that include images. Yet research on the helpfulness of reviews on e-commerce platforms focuses mainly on text and offers little insight into the product attributes reflected in review images or into the relationship between images and text. Studying that relationship can better explain consumer behavior and help consumers make purchasing decisions. Taking multimodal online review data from shopping platforms as the research object, this study proposes a research framework based on the Cognitive Theory of Multimedia Learning (CTML). It uses multiple pre-trained models, such as BLIP2, together with machine learning methods to construct metrics. A fuzzy-set qualitative comparative analysis (fsQCA) is then conducted to explore the configurational effects of the antecedent variables of multimodal online reviews on review helpfulness. The study identifies five configurational paths that lead to high review helpfulness. Specific review cases are used to examine how these configurations contribute to perceived helpfulness, providing a new perspective for future research on multimodal online reviews. Targeted recommendations are made for operators and merchants based on the findings, offering theoretical support for platforms to fully leverage the potential value of user-generated content.
format Article
id doaj-art-e0202d51637f463084e8e8e26cf8d2a4
institution Kabale University
issn 0718-1876
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Journal of Theoretical and Applied Electronic Commerce Research
spelling doaj-art-e0202d51637f463084e8e8e26cf8d2a4; 2025-08-20T03:27:17Z; eng; MDPI AG; Journal of Theoretical and Applied Electronic Commerce Research; ISSN 0718-1876; 2025-06-01; vol. 20, no. 2, art. 144; DOI 10.3390/jtaer20020144
Chao Ma (0), Chen Yang (1), Ying Yu (2)
0: College of Economics and Management, Zhejiang Normal University, Jinhua 321000, China
1: College of Economics and Management, Zhejiang Normal University, Jinhua 321000, China
2: College of Economics and Management, Zhejiang Normal University, Jinhua 321000, China
title When Pixels Speak Louder: Unravelling the Synergy of Text–Image Integration in Multimodal Review Helpfulness
topic multimodal online reviews
multimedia learning cognitive theory
perceived helpfulness
text–image integration
fuzzy-set QCA
url https://www.mdpi.com/0718-1876/20/2/144