A joint-training topic model for social media texts

Abstract Topic mining for social media text has become increasingly important with the proliferation of social media platforms. However, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well...


Bibliographic Details
Main Authors: Simeng Qin, Mingli Zhang, Haiju Hu, Gang Li
Format: Article
Language:English
Published: Springer Nature 2025-03-01
Series:Humanities & Social Sciences Communications
Online Access:https://doi.org/10.1057/s41599-025-04551-2
_version_ 1849766792539406336
author Simeng Qin
Mingli Zhang
Haiju Hu
Gang Li
author_facet Simeng Qin
Mingli Zhang
Haiju Hu
Gang Li
author_sort Simeng Qin
collection DOAJ
description Abstract Topic mining for social media text has become increasingly important with the proliferation of social media platforms. However, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well on such data. To address this, the paper proposes a more precise Position-Sensitive Word-Embedding Topic Model (PS-WETM) to capture the intricate semantic and lexical relations within social media text. The model enriches the corpus and its semantic relations based on word vector similarity, thereby yielding dense word vector representations. It further introduces a position-sensitive word vector training model, which distinguishes the relations between the pivot word and differently positioned context words by assigning distinct weight matrices to context words in asymmetrical positions. Additionally, the model incorporates a self-attention mechanism to globally capture dependencies among the input word vectors and to calculate each word's contribution to topic matching performance. Experimental results show that the customized topic model outperforms existing short-text topic models such as PTM, SPTM, DMM, GPU-DMM, GLTM and WETM. Hence, PS-WETM adeptly identifies diverse topics in social media text, demonstrating its strong performance on short texts with sparse words and discrete semantic relations.
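The two mechanisms named in the abstract, position-specific weight matrices for context words and self-attention over the input word vectors, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it assumes a structured-skip-gram-style setup, and all names (`E_in`, `E_out`, `WIN`, the vocabulary and dimension sizes) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, WIN = 50, 16, 2  # illustrative vocab size, embedding dim, window radius

# Shared input embeddings for all words.
E_in = rng.normal(scale=0.1, size=(V, D))

# One output weight matrix per relative context position (-WIN..-1, 1..WIN),
# so the same context word is scored differently at -1 than at +1. This is
# the position-sensitive idea the abstract describes, in skip-gram form.
positions = [p for p in range(-WIN, WIN + 1) if p != 0]
E_out = {p: rng.normal(scale=0.1, size=(V, D)) for p in positions}

def position_scores(pivot_id, pos):
    """Unnormalized scores for every vocab word appearing at offset `pos`."""
    return E_out[pos] @ E_in[pivot_id]  # shape (V,)

def self_attention(X):
    """Plain scaled dot-product self-attention over a sequence of vectors."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax rows
    return weights @ X  # each output vector mixes in its dependencies

# A pivot word scored against four context slots, each with its own matrix,
# and attention applied to a short sequence of word vectors.
pivot = 7
probs = {p: position_scores(pivot, p) for p in positions}
attended = self_attention(E_in[[3, 7, 11, 19]])
```

Here the per-position matrices keep left and right (and near and far) contexts distinguishable, while the attention step lets every word vector in a short text attend to every other, which is the global-dependency role the abstract assigns to self-attention.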
format Article
id doaj-art-06ee4fdf41dc4861a112bd170bdcede8
institution DOAJ
issn 2662-9992
language English
publishDate 2025-03-01
publisher Springer Nature
record_format Article
series Humanities & Social Sciences Communications
spelling doaj-art-06ee4fdf41dc4861a112bd170bdcede82025-08-20T03:04:27ZengSpringer NatureHumanities & Social Sciences Communications2662-99922025-03-0112111610.1057/s41599-025-04551-2A joint-training topic model for social media textsSimeng Qin0Mingli Zhang1Haiju Hu2Gang Li3School of Management, Northeastern University at QinhuangdaoCollege of Economy and Management, Yanshan UniversityCollege of Economy and Management, Yanshan UniversitySchool of Management, Northeastern University at QinhuangdaoAbstract The burgeoning significance of topic mining for social media text has intensified with the proliferation of social media platforms. Nevertheless, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well on them. To address this, the paper establishes a more precise Position-Sensitive Word-Embedding Topic Model (PS-WETM) to adeptly capture intricate semantic and lexical relations within social media text. The model enriches the corpus and semantic relations based on word vector similarity, thereby yielding dense word vector representations. Furthermore, it proposes a position-sensitive word vector training model. The model meticulously distinguishes relations between the pivot word and context words positioned differently by assigning different weight matrices to context words in asymmetrical positions. Additionally, the model incorporates self-attention mechanism to globally capture dependencies between each element in the input word vectors, and calculates the contribution of each word to the topic matching performance. The experiment result highlights that the customized topic model outperforms existing short-text topic models, such as PTM, SPTM, DMM, GPU-DMM, GLTM and WETM. 
Hence, PS-WETM adeptly identifies diverse topics in social media text, demonstrating its outstanding performance in handling short texts with sparse words and discrete semantic relations.
https://doi.org/10.1057/s41599-025-04551-2
spellingShingle Simeng Qin
Mingli Zhang
Haiju Hu
Gang Li
A joint-training topic model for social media texts
Humanities & Social Sciences Communications
title A joint-training topic model for social media texts
title_full A joint-training topic model for social media texts
title_fullStr A joint-training topic model for social media texts
title_full_unstemmed A joint-training topic model for social media texts
title_short A joint-training topic model for social media texts
title_sort joint training topic model for social media texts
url https://doi.org/10.1057/s41599-025-04551-2
work_keys_str_mv AT simengqin ajointtrainingtopicmodelforsocialmediatexts
AT minglizhang ajointtrainingtopicmodelforsocialmediatexts
AT haijuhu ajointtrainingtopicmodelforsocialmediatexts
AT gangli ajointtrainingtopicmodelforsocialmediatexts
AT simengqin jointtrainingtopicmodelforsocialmediatexts
AT minglizhang jointtrainingtopicmodelforsocialmediatexts
AT haijuhu jointtrainingtopicmodelforsocialmediatexts
AT gangli jointtrainingtopicmodelforsocialmediatexts