Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/9/5102 |
| Summary: | In recent years, text classification has found wide application in diverse real-world scenarios. In Chinese news classification tasks, title text suffers from sparse contextual information and semantic ambiguity. To improve the performance of short text classification, this paper proposes a Word2Vec-based enhanced word embedding method and presents a dual-channel hybrid neural network architecture that extracts semantic features effectively. Specifically, we introduce a novel weighting scheme, Term Frequency-Inverse Document Frequency Category-Distribution Weight (TF-IDF-CDW), where the Category Distribution Weight (CDW) reflects how words are distributed across different categories. Weighting the pretrained Word2Vec vectors with TF-IDF-CDW and concatenating them with part-of-speech (POS) feature vectors yields semantically enriched and more discriminative word embeddings. Furthermore, we propose a dual-channel hybrid model based on a Gated Convolutional Neural Network (GCNN) and Bidirectional Long Short-Term Memory (BiLSTM), which jointly captures local features and long-range global dependencies. To evaluate the overall performance of the model, experiments were conducted on the Chinese short text datasets THUCNews and TNews. The proposed model achieved classification accuracies of 91.85% and 87.70%, respectively, outperforming several comparison models and demonstrating the effectiveness of the proposed method. |
| ISSN: | 2076-3417 |
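
The embedding step described in the summary (scale each pretrained word vector by a TF-IDF-CDW weight, then append a POS feature vector) can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything concrete here is an assumption: the CDW term is approximated as one minus the normalised entropy of a word's category distribution, the IDF is a smoothed variant, the word vectors are random stand-ins for pretrained Word2Vec vectors, and the function names `tf_idf_cdw` and `enhance` are hypothetical.

```python
import math
from collections import Counter, defaultdict

import numpy as np


def tf_idf_cdw(docs, labels):
    """Return one {word: weight} dict per tokenised document."""
    n_docs = len(docs)
    categories = sorted(set(labels))

    # Document frequency and per-category document counts for each word.
    df = Counter()
    cat_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            df[word] += 1
            cat_counts[word][label] += 1

    def cdw(word):
        # ASSUMPTION (not the paper's formula): 1 - normalised entropy, so a
        # word concentrated in a single category gets a weight close to 1.
        if len(categories) < 2:
            return 1.0
        total = sum(cat_counts[word].values())
        probs = [c / total for c in cat_counts[word].values()]
        entropy = -sum(p * math.log(p) for p in probs)
        return 1.0 - entropy / math.log(len(categories))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            # ASSUMPTION: relative term frequency times a smoothed IDF.
            w: (tf[w] / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[w])) + 1.0)
            * cdw(w)
            for w in tf
        })
    return weights


def enhance(tokens, pos_tags, weights, word_vectors, pos_vectors):
    """Scale each word vector by its weight and append its POS vector."""
    rows = [
        np.concatenate([weights[w] * word_vectors[w], pos_vectors[p]])
        for w, p in zip(tokens, pos_tags)
    ]
    return np.stack(rows)


# Toy corpus; real inputs would be segmented Chinese news titles.
docs = [
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
    ["market", "match", "report"],
]
labels = ["finance", "sports", "finance"]
weights = tf_idf_cdw(docs, labels)

# Random 4-dimensional stand-ins for pretrained Word2Vec vectors.
rng = np.random.default_rng(0)
vocab = sorted({w for tokens in docs for w in tokens})
word_vectors = {w: rng.normal(size=4) for w in vocab}

# One-hot POS features for a two-tag toy tag set (noun, verb).
pos_vectors = {"n": np.array([1.0, 0.0]), "v": np.array([0.0, 1.0])}

enhanced = enhance(docs[0], ["n", "n", "v"], weights[0], word_vectors, pos_vectors)
```

Each row of `enhanced` is one token's weighted word vector followed by its POS vector; a matrix of this form is what would feed the GCNN and BiLSTM channels. Under the assumed CDW, a word spread evenly across categories (here `match`) is weighted down to roughly zero, while a word confined to one category keeps its full TF-IDF weight.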