A joint-training topic model for social media texts
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer Nature, 2025-03-01 |
| Series: | Humanities & Social Sciences Communications |
| Online Access: | https://doi.org/10.1057/s41599-025-04551-2 |
| Summary: | Abstract The importance of topic mining for social media text has grown with the proliferation of social media platforms. However, the brevity and discreteness of social media text pose significant challenges to conventional topic models, which often struggle to perform well on such data. To address this, the paper proposes a more precise Position-Sensitive Word-Embedding Topic Model (PS-WETM) to capture the intricate semantic and lexical relations within social media text. The model enriches the corpus and its semantic relations based on word-vector similarity, thereby yielding dense word-vector representations. Furthermore, it introduces a position-sensitive word-vector training model that distinguishes the relations between the pivot word and differently positioned context words by assigning distinct weight matrices to context words in asymmetric positions. Additionally, the model incorporates a self-attention mechanism to globally capture dependencies between the elements of the input word vectors and to calculate each word's contribution to topic matching. The experimental results show that the customized topic model outperforms existing short-text topic models such as PTM, SPTM, DMM, GPU-DMM, GLTM and WETM. Hence, PS-WETM identifies diverse topics in social media text, demonstrating outstanding performance on short texts with sparse words and discrete semantic relations. |
| ISSN: | 2662-9992 |
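The summary's core idea, assigning a different weight matrix to context words at different offsets from the pivot word, can be illustrated with a minimal sketch. This is not the paper's implementation: all names, dimensions, and the simple dot-product scoring below are assumptions chosen only to show how position-specific matrices make a word pair score differently depending on its relative position.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, window = 50, 8, 2

# Input embeddings for pivot words
E = rng.normal(size=(vocab_size, dim))
# One weight matrix per context offset (-window..-1, 1..window),
# so left/right neighbors at each distance are treated asymmetrically
positions = [p for p in range(-window, window + 1) if p != 0]
W = {p: rng.normal(size=(dim, dim)) for p in positions}
# Output embeddings for context words
C = rng.normal(size=(vocab_size, dim))

def position_score(pivot_id, context_id, offset):
    """Score a (pivot, context) pair using the matrix for this offset."""
    h = E[pivot_id] @ W[offset]      # position-specific transform of the pivot
    return float(C[context_id] @ h)  # dot-product compatibility

# The same word pair receives different scores at different offsets,
# which is the positional asymmetry the model exploits.
s_left = position_score(3, 7, -1)
s_right = position_score(3, 7, +1)
print(s_left != s_right)
```

In a full training loop these scores would feed a softmax or negative-sampling objective, as in skip-gram-style embedding models; the sketch stops at the scoring step, which is where the position sensitivity lives.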