Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks

In recent years, text classification has found wide application in diverse real-world scenarios. In Chinese news classification tasks, title text suffers from sparse contextual information and semantic ambiguity. To improve the performance of short text classification, this paper...

Bibliographic Details
Main Authors: Cunhe Li, Zian Xie, Haotian Wang
Format: Article
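Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects: text classification; word embedding; gated convolutional neural network; bidirectional long short-term memory
Online Access: https://www.mdpi.com/2076-3417/15/9/5102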
author Cunhe Li
Zian Xie
Haotian Wang
collection DOAJ
description In recent years, text classification has found wide application in diverse real-world scenarios. In Chinese news classification tasks, title text suffers from sparse contextual information and semantic ambiguity. To improve the performance of short text classification, this paper proposes a Word2Vec-based enhanced word embedding method and presents a dual-channel hybrid neural network architecture that extracts semantic features effectively. Specifically, we introduce a novel weighting scheme, Term Frequency-Inverse Document Frequency Category-Distribution Weight (TF-IDF-CDW), where the Category Distribution Weight (CDW) reflects how words are distributed across categories. Weighting the pretrained Word2Vec vectors with TF-IDF-CDW and concatenating them with part-of-speech (POS) feature vectors yields semantically enriched, more discriminative word embeddings. Furthermore, we propose a dual-channel hybrid model based on a Gated Convolutional Neural Network (GCNN) and Bidirectional Long Short-Term Memory (BiLSTM), which jointly captures local features and long-range global dependencies. The model was evaluated on the Chinese short text datasets THUCNews and TNews, where it achieved classification accuracies of 91.85% and 87.70%, respectively, outperforming several comparative models and demonstrating the effectiveness of the proposed method.
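The embedding step in the description above (scale each pretrained word vector by a TF-IDF-CDW weight, then concatenate a POS feature vector) can be illustrated with a minimal sketch. This record does not give the paper's CDW formula, so the sketch assumes a stand-in: one minus the normalized entropy of a word's distribution over categories, which is 1 for a word confined to a single category and 0 for a word spread evenly over all of them. The smoothed IDF variant, the one-hot POS encoding, the toy vectors standing in for Word2Vec, and the names `tf_idf_cdw_weights` and `enhance_embeddings` are likewise assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tf_idf_cdw_weights(docs, labels):
    """Return one {token: weight} dict per document.

    docs:   list of token lists (already word-segmented titles)
    labels: category label for each document
    """
    n_docs = len(docs)
    n_categories = len(set(labels))
    doc_freq = Counter()                 # token -> number of documents containing it
    per_category = defaultdict(Counter)  # token -> {category: occurrence count}
    for tokens, label in zip(docs, labels):
        doc_freq.update(set(tokens))
        for tok in tokens:
            per_category[tok][label] += 1

    def cdw(tok):
        # ASSUMPTION: stand-in for the paper's CDW, not its actual formula.
        # 1 - normalized entropy of the token's category distribution.
        if n_categories < 2:
            return 1.0
        counts = per_category[tok].values()
        total = sum(counts)
        entropy = -sum((c / total) * math.log(c / total) for c in counts)
        return 1.0 - entropy / math.log(n_categories)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            tok: (count / len(tokens))                                 # term frequency
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)     # smoothed IDF
            * cdw(tok)                                                 # assumed CDW
            for tok, count in tf.items()
        })
    return weights


def enhance_embeddings(tokens, pos_tags, doc_weights, word_vectors, pos_inventory):
    """Scale each word vector by its weight and append a one-hot POS vector."""
    rows = []
    for tok, pos in zip(tokens, pos_tags):
        scaled = [doc_weights[tok] * x for x in word_vectors[tok]]
        one_hot = [1.0 if pos == p else 0.0 for p in pos_inventory]
        rows.append(scaled + one_hot)
    return rows


# Toy data: hand-written 2-d vectors stand in for pretrained Word2Vec vectors.
docs = [["stock", "market"], ["team", "market"]]
labels = ["finance", "sports"]
weights = tf_idf_cdw_weights(docs, labels)
toy_vectors = {"stock": [0.2, 0.4], "market": [0.1, 0.3], "team": [0.5, 0.1]}
pos_inventory = ["NOUN", "VERB"]
rows = enhance_embeddings(docs[0], ["NOUN", "NOUN"], weights[0],
                          toy_vectors, pos_inventory)
```

In this toy corpus "market" occurs equally in both categories, so under the assumed CDW its vector is scaled to roughly zero, while "stock", which occurs only in the finance title, keeps a positive weight. Each row of `rows` has the word-vector dimension plus the size of the POS inventory.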
format Article
id doaj-art-574cd522a3504a718912e1d19684e7d5
institution Kabale University
issn 2076-3417
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-574cd522a3504a718912e1d19684e7d5
2025-08-20T03:49:22Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-05-01
15
9
5102
10.3390/app15095102
Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks
Cunhe Li
Zian Xie
Haotian Wang
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China (all three authors)
https://www.mdpi.com/2076-3417/15/9/5102
text classification
word embedding
gated convolutional neural network
bidirectional long short-term memory
title Short Text Classification Based on Enhanced Word Embedding and Hybrid Neural Networks
topic text classification
word embedding
gated convolutional neural network
bidirectional long short-term memory
url https://www.mdpi.com/2076-3417/15/9/5102