Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features
| Main Authors: | Huixin Wu, Yang Zang, Limeng Zhao, Hongyang Zhou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5689 |

| author | Huixin Wu; Yang Zang; Limeng Zhao; Hongyang Zhou |
|---|---|
| collection | DOAJ |
| description | Sarcasm often arises from subtle contrasts between literal meaning and speaker intention. As online communication increasingly includes voice-based content, detecting sarcasm across speech and text becomes more important—and more complex. The existing methods usually focus on generic multimodal fusion but often miss how sarcasm manifests differently in each modality. We propose a model that explicitly encodes audio signals into the textual representation space, allowing prosodic cues to inform language understanding. To extract relevant features at different levels, we use a multi-scale convolutional architecture. The experiments show consistent gains over prior models on both text and speech sarcasm detection tasks. |
| format | Article |
| id | doaj-art-dca2445eeb354460a6d09a430273a445 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-dca2445eeb354460a6d09a430273a445; 2025-08-20T02:33:38Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-05-01; vol. 15, no. 10, art. 5689; DOI 10.3390/app15105689; Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features; Huixin Wu, Yang Zang, Limeng Zhao, Hongyang Zhou (all: School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046, China); https://www.mdpi.com/2076-3417/15/10/5689; keywords: sarcasm detection; audio attributes; multimodal; multi-scale convolutional network |
| title | Multimodal Chinese Sarcasm Detection Integrating Audio Attributes and Textual Features |
| topic | sarcasm detection; audio attributes; multimodal; multi-scale convolutional network |
| url | https://www.mdpi.com/2076-3417/15/10/5689 |
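
The description above states only that audio signals are encoded into the textual representation space and that a multi-scale convolutional architecture extracts features at different levels. The Python sketch below traces one possible reading of those two steps on toy numbers. It is not the authors' implementation: mean pooling of audio frames, a single linear projection (`w_proj`), additive fusion with token embeddings, filter widths 1 to 3, ReLU with max-over-time pooling, and the logistic output layer are all assumptions, and every function name is hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]  # list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def project_audio_into_text_space(audio_frames: List[Vector], w_proj: Matrix) -> Vector:
    """Mean-pool audio frame features over time, then map them to the text embedding size.

    Hypothetical: mean pooling plus one linear projection stands in for the
    paper's audio encoder, whose details are not given in this record.
    """
    n = len(audio_frames)
    dim = len(audio_frames[0])
    pooled = [sum(frame[i] for frame in audio_frames) / n for i in range(dim)]
    return matvec(w_proj, pooled)


def fuse(text_tokens: List[Vector], audio_vec: Vector) -> List[Vector]:
    """Add the projected audio vector to every token embedding (assumed additive fusion)."""
    return [[t + a for t, a in zip(tok, audio_vec)] for tok in text_tokens]


def conv1d_relu_maxpool(seq: List[Vector], kernel: Matrix) -> float:
    """Slide one filter (width x dim) over the sequence, apply ReLU, then max-pool over time."""
    width = len(kernel)
    if len(seq) < width:
        return 0.0  # sequence shorter than the filter: no valid window
    activations = []
    for start in range(len(seq) - width + 1):
        s = sum(
            w * x
            for offset in range(width)
            for w, x in zip(kernel[offset], seq[start + offset])
        )
        activations.append(max(0.0, s))
    return max(activations)


def multi_scale_features(seq: List[Vector], filters_by_width: Dict[int, List[Matrix]]) -> Vector:
    """Concatenate pooled outputs of filters of several widths (the 'scales')."""
    feats: Vector = []
    for width in sorted(filters_by_width):
        for kernel in filters_by_width[width]:
            if len(kernel) != width:
                raise ValueError(f"filter registered under width {width} has {len(kernel)} rows")
            feats.append(conv1d_relu_maxpool(seq, kernel))
    return feats


def sarcasm_probability(feats: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical logistic output layer over the multi-scale features."""
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy example: 3 tokens with 2-dim embeddings, 2 audio frames with 3-dim features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
w_proj = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.5]]

fused = fuse(tokens, project_audio_into_text_space(audio, w_proj))
filters = {
    1: [[[1.0, 0.0]]],
    2: [[[1.0, 0.0], [0.0, 1.0]]],
    3: [[[-1.0, 0.0], [0.0, 0.0], [0.0, -1.0]]],
}
feats = multi_scale_features(fused, filters)
print(feats)  # [1.5, 3.0, 0.0]
print(sarcasm_probability(feats, [0.0, 0.0, 0.0], 0.0))  # 0.5
```

The filter widths play the role of the different "levels": a width-1 filter sees single tokens, while wider filters see short phrases in which prosody-shifted embeddings may contrast with their neighbours. In a trained model, `w_proj`, the filters, and the output weights would be learned; the hand-picked values here only make the computation easy to trace.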