Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks
In recent years, text classification has found wide application in diverse real-world scenarios. In Chinese news classification tasks, title text suffers from sparse contextual information and semantic ambiguity. To improve the performance of short text classification, this paper...
| Main Authors: | Cunhe Li, Zian Xie, Haotian Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
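| Subjects: | text classification; word embedding; gated convolutional neural network; bidirectional long short-term memory |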
| Online Access: | https://www.mdpi.com/2076-3417/15/9/5102 |
| _version_ | 1849322448702406656 |
|---|---|
| author | Cunhe Li; Zian Xie; Haotian Wang |
| author_facet | Cunhe Li; Zian Xie; Haotian Wang |
| author_sort | Cunhe Li |
| collection | DOAJ |
| description | In recent years, text classification has found wide application in diverse real-world scenarios. In Chinese news classification tasks, title text suffers from sparse contextual information and semantic ambiguity. To improve the performance of short text classification, this paper proposes a Word2Vec-based enhanced word embedding method and presents a dual-channel hybrid neural network architecture that extracts semantic features effectively. Specifically, we introduce a novel weighting scheme, Term Frequency-Inverse Document Frequency Category Distribution Weight (TF-IDF-CDW), where Category Distribution Weight (CDW) reflects the distribution pattern of words across different categories. Weighting the pretrained Word2Vec vectors with TF-IDF-CDW and concatenating them with part-of-speech (POS) feature vectors yields semantically enriched and more discriminative word embedding vectors. Furthermore, we propose a dual-channel hybrid model based on a Gated Convolutional Neural Network (GCNN) and Bidirectional Long Short-Term Memory (BiLSTM), which jointly captures local features and long-range global dependencies. To evaluate the overall performance of the model, experiments were conducted on the Chinese short text datasets THUCNews and TNews. The proposed model achieved classification accuracies of 91.85% and 87.70%, respectively, outperforming several comparative models and demonstrating the effectiveness of the proposed method. |
| format | Article |
| id | doaj-art-574cd522a3504a718912e1d19684e7d5 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-574cd522a3504a718912e1d19684e7d5; 2025-08-20T03:49:22Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-05-01; vol. 15, no. 9, art. 5102; 10.3390/app15095102; Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks; Cunhe Li, Zian Xie, Haotian Wang (all: Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China); https://www.mdpi.com/2076-3417/15/9/5102; text classification; word embedding; gated convolutional neural network; bidirectional long short-term memory |
| title | Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks |
| topic | text classification; word embedding; gated convolutional neural network; bidirectional long short-term memory |
| url | https://www.mdpi.com/2076-3417/15/9/5102 |
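
The embedding step summarized in the description (weight each pretrained word vector, then concatenate a POS feature vector) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not give the CDW formula, so CDW values are passed in as plain numbers; the smoothed IDF, the multiplication of TF-IDF by CDW, the one-hot POS encoding, and the names `tf_idf`, `enhance`, `cdw`, and `pos_vocab` are all assumptions made for illustration, and the toy vectors stand in for real Word2Vec output.

```python
import math
from collections import Counter


def tf_idf(tokens, corpus):
    """Plain TF-IDF for each distinct token of one tokenized title."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF: an assumption, the record states no formula.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = tf * idf
    return weights


def enhance(tokens, pos_tags, vectors, corpus, cdw, pos_vocab):
    """Return one enhanced vector per token: weighted word vector + POS one-hot."""
    weights = tf_idf(tokens, corpus)
    rows = []
    for word, tag in zip(tokens, pos_tags):
        # Assumption: TF-IDF and CDW are combined by multiplication.
        # Words without a supplied CDW value fall back to 1.0.
        w = weights[word] * cdw.get(word, 1.0)
        pos_onehot = [1.0 if tag == t else 0.0 for t in pos_vocab]
        rows.append([w * x for x in vectors[word]] + pos_onehot)
    return rows


# Toy data, made up for illustration only.
corpus = [["stock", "market", "falls"], ["team", "wins", "final"], ["market", "rally"]]
vectors = {"stock": [1.0] * 4, "market": [2.0] * 4, "falls": [3.0] * 4}
cdw = {"stock": 1.5, "market": 1.0, "falls": 1.2}  # hypothetical CDW values
pos_vocab = ["NOUN", "VERB"]

enhanced = enhance(corpus[0], ["NOUN", "NOUN", "VERB"], vectors, corpus, cdw, pos_vocab)
print(len(enhanced), len(enhanced[0]))  # 3 tokens, 4 embedding dims + 2 POS dims
```

In this sketch each token ends up with a vector of length 4 + 2; in the described model such vectors would then feed the GCNN and BiLSTM channels, which are not sketched here.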