Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features

Sarcasm often arises from subtle contrasts between literal meaning and speaker intention. As online communication increasingly includes voice-based content, detecting sarcasm across speech and text becomes more important—and more complex. The existing methods usually focus on generic multimodal fusi...

Bibliographic Details
Main Authors: Huixin Wu, Yang Zang, Limeng Zhao, Hongyang Zhou
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects: sarcasm detection, audio attributes, multimodal, multi-scale convolutional network
Online Access: https://www.mdpi.com/2076-3417/15/10/5689
author Huixin Wu
Yang Zang
Limeng Zhao
Hongyang Zhou
collection DOAJ
description Sarcasm often arises from subtle contrasts between literal meaning and speaker intention. As online communication increasingly includes voice-based content, detecting sarcasm across speech and text becomes both more important and more complex. Existing methods usually focus on generic multimodal fusion but often miss how sarcasm manifests differently in each modality. We propose a model that explicitly encodes audio signals into the textual representation space, allowing prosodic cues to inform language understanding. To extract relevant features at different levels, we use a multi-scale convolutional architecture. Experiments show consistent gains over prior models on both text and speech sarcasm detection tasks.
format Article
id doaj-art-dca2445eeb354460a6d09a430273a445
institution OA Journals
issn 2076-3417
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-dca2445eeb354460a6d09a430273a445 | 2025-08-20T02:33:38Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2025-05-01 | 15 | 10 | 5689 | 10.3390/app15105689 | Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features | Huixin Wu (0) | Yang Zang (1) | Limeng Zhao (2) | Hongyang Zhou (3) | School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China (affiliation listed for each of the four authors) | Sarcasm often arises from subtle contrasts between literal meaning and speaker intention. As online communication increasingly includes voice-based content, detecting sarcasm across speech and text becomes both more important and more complex. Existing methods usually focus on generic multimodal fusion but often miss how sarcasm manifests differently in each modality. We propose a model that explicitly encodes audio signals into the textual representation space, allowing prosodic cues to inform language understanding. To extract relevant features at different levels, we use a multi-scale convolutional architecture. Experiments show consistent gains over prior models on both text and speech sarcasm detection tasks. | https://www.mdpi.com/2076-3417/15/10/5689 | sarcasm detection | audio attributes | multimodal | multi-scale convolutional network
title Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features
topic sarcasm detection
audio attributes
multimodal
multi-scale convolutional network
url https://www.mdpi.com/2076-3417/15/10/5689