A joint-training topic model for social media texts
Abstract The burgeoning significance of topic mining for social media text has intensified with the proliferation of social media platforms. Nevertheless, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well...
| Main Authors: | Simeng Qin, Mingli Zhang, Haiju Hu, Gang Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer Nature, 2025-03-01 |
| Series: | Humanities & Social Sciences Communications |
| Online Access: | https://doi.org/10.1057/s41599-025-04551-2 |
| _version_ | 1849766792539406336 |
|---|---|
| author | Simeng Qin, Mingli Zhang, Haiju Hu, Gang Li |
| author_facet | Simeng Qin, Mingli Zhang, Haiju Hu, Gang Li |
| author_sort | Simeng Qin |
| collection | DOAJ |
| description | Abstract The burgeoning significance of topic mining for social media text has intensified with the proliferation of social media platforms. Nevertheless, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well on such texts. To address this, the paper establishes a more precise Position-Sensitive Word-Embedding Topic Model (PS-WETM) to adeptly capture intricate semantic and lexical relations within social media text. The model enriches the corpus and semantic relations based on word-vector similarity, thereby yielding dense word-vector representations. Furthermore, it proposes a position-sensitive word-vector training model, which distinguishes the relations between the pivot word and differently positioned context words by assigning distinct weight matrices to context words in asymmetrical positions. Additionally, the model incorporates a self-attention mechanism to globally capture dependencies among the input word vectors and calculates each word's contribution to topic-matching performance. The experimental results highlight that the customized topic model outperforms existing short-text topic models, such as PTM, SPTM, DMM, GPU-DMM, GLTM, and WETM. Hence, PS-WETM adeptly identifies diverse topics in social media text, demonstrating its outstanding performance in handling short texts with sparse words and discrete semantic relations. |
| format | Article |
| id | doaj-art-06ee4fdf41dc4861a112bd170bdcede8 |
| institution | DOAJ |
| issn | 2662-9992 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Springer Nature |
| record_format | Article |
| series | Humanities & Social Sciences Communications |
| spelling | doaj-art-06ee4fdf41dc4861a112bd170bdcede8; 2025-08-20T03:04:27Z; eng; Springer Nature; Humanities & Social Sciences Communications; 2662-9992; 2025-03-01; 12(1):1-16; 10.1057/s41599-025-04551-2; A joint-training topic model for social media texts; Simeng Qin (School of Management, Northeastern University at Qinhuangdao), Mingli Zhang (College of Economy and Management, Yanshan University), Haiju Hu (College of Economy and Management, Yanshan University), Gang Li (School of Management, Northeastern University at Qinhuangdao). Abstract The burgeoning significance of topic mining for social media text has intensified with the proliferation of social media platforms. Nevertheless, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well on such texts. To address this, the paper establishes a more precise Position-Sensitive Word-Embedding Topic Model (PS-WETM) to adeptly capture intricate semantic and lexical relations within social media text. The model enriches the corpus and semantic relations based on word-vector similarity, thereby yielding dense word-vector representations. Furthermore, it proposes a position-sensitive word-vector training model, which distinguishes the relations between the pivot word and differently positioned context words by assigning distinct weight matrices to context words in asymmetrical positions. Additionally, the model incorporates a self-attention mechanism to globally capture dependencies among the input word vectors and calculates each word's contribution to topic-matching performance. The experimental results highlight that the customized topic model outperforms existing short-text topic models, such as PTM, SPTM, DMM, GPU-DMM, GLTM, and WETM. Hence, PS-WETM adeptly identifies diverse topics in social media text, demonstrating its outstanding performance in handling short texts with sparse words and discrete semantic relations. https://doi.org/10.1057/s41599-025-04551-2 |
| spellingShingle | Simeng Qin; Mingli Zhang; Haiju Hu; Gang Li; A joint-training topic model for social media texts; Humanities & Social Sciences Communications |
| title | A joint-training topic model for social media texts |
| title_full | A joint-training topic model for social media texts |
| title_fullStr | A joint-training topic model for social media texts |
| title_full_unstemmed | A joint-training topic model for social media texts |
| title_short | A joint-training topic model for social media texts |
| title_sort | joint training topic model for social media texts |
| url | https://doi.org/10.1057/s41599-025-04551-2 |
| work_keys_str_mv | AT simengqin ajointtrainingtopicmodelforsocialmediatexts AT minglizhang ajointtrainingtopicmodelforsocialmediatexts AT haijuhu ajointtrainingtopicmodelforsocialmediatexts AT gangli ajointtrainingtopicmodelforsocialmediatexts AT simengqin jointtrainingtopicmodelforsocialmediatexts AT minglizhang jointtrainingtopicmodelforsocialmediatexts AT haijuhu jointtrainingtopicmodelforsocialmediatexts AT gangli jointtrainingtopicmodelforsocialmediatexts |
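The abstract's two key ingredients can be illustrated with a short sketch: a position-sensitive context encoding that assigns a distinct weight matrix to each context position (rather than a single shared projection, as in CBOW), followed by a self-attention step over the transformed context vectors. This is a minimal illustration of the general idea only; all names, dimensions, and the final aggregation are assumptions, not the paper's actual PS-WETM architecture.

```python
# Minimal sketch: position-specific weight matrices + self-attention
# over a pivot word's context. Dimensions and aggregation are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension (illustrative)
positions = (-2, -1, 1, 2)  # asymmetric context positions around the pivot

# one weight matrix per context position, so "word two before the pivot"
# is transformed differently from "word two after the pivot"
position_weights = {p: rng.normal(size=(d, d)) for p in positions}

def encode_context(context):
    """context: dict mapping position -> word embedding of shape (d,).
    Applies each position's own weight matrix to its context vector."""
    return np.stack([position_weights[p] @ v for p, v in sorted(context.items())])

def self_attention(X):
    """Single-head scaled dot-product self-attention over the rows of X (n, d)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X

# toy context window with random embeddings
context = {p: rng.normal(size=d) for p in positions}
H = self_attention(encode_context(context))
pivot_pred = H.mean(axis=0)   # aggregate context into a pivot-word prediction
print(pivot_pred.shape)       # -> (8,)
```

The per-position matrices are what make the encoding "position-sensitive": swapping two context words changes the output even though a bag-of-words projection would not notice.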