ERNIE-TextCNN: research on classification methods of Chinese news headlines in different situations

Bibliographic Details
Main Author: Yumin Yan
Format: Article
Language: English
Published: Nature Portfolio, 2025-08-01
Series: Scientific Reports
Subjects: Extremely short Chinese news headline classification; Attention mechanism; TextCNN; ERNIE
Online Access: https://doi.org/10.1038/s41598-025-14955-4
author Yumin Yan
collection DOAJ
description Abstract Driven by the rapid development of the internet and the explosion of data, the efficiency of news dissemination has improved as never before, and the volume of text data has grown dramatically. Facing the public’s demand for “quick browsing,” Chinese news headlines, being extremely short texts, suffer from limited information, sparse features, and high ambiguity. To rapidly extract deep features from news headlines and improve classification performance on extremely short Chinese news headlines, we examine the inherent characteristics of headline data, focusing on multi-domain news classification and studying datasets of different scales. For large-scale extremely short Chinese news headline datasets, which are affected by feature sparsity and insufficient representation, we construct an improved convolutional classification model, ERNIE-AAFF-SECNN, based on an adaptive feature fusion mechanism. First, the model employs an attention-based adaptive feature fusion module to dynamically learn and fuse the character feature representations output by multiple Transformer layers in ERNIE, deepening the model’s understanding of the semantic relevance of “characters” across different headlines. Then, it combines a BiLSTM network to capture global feature information and introduces the SE attention mechanism into the TextCNN network, applying weighted convolution to the BiLSTM hidden states to extract local feature information more precisely. Finally, it uses a two-layer fully connected head to adjust the output dimensions to the classification task, with a ReLU activation between the layers to strengthen the model’s expressive power.
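The SE (Squeeze-and-Excitation) attention mechanism referenced in the abstract reweights convolutional channels by squeezing each feature map to a scalar and passing the result through a small bottleneck gate. A minimal NumPy sketch of that weighting step; the function name, shapes, and reduction ratio are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def se_weight(feature_maps, w1, w2):
    """Reweight convolutional channels with a Squeeze-and-Excitation gate.

    feature_maps: array of shape (channels, length), e.g. TextCNN outputs
    w1: (channels // r, channels) squeeze projection (r = reduction ratio)
    w2: (channels, channels // r) excitation projection
    """
    # Squeeze: global average pooling over the sequence dimension
    z = feature_maps.mean(axis=1)                    # (channels,)
    # Excitation: bottleneck MLP, ReLU then sigmoid gate
    s = np.maximum(w1 @ z, 0.0)                      # (channels // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))           # (channels,) in (0, 1)
    # Scale each channel's feature map by its learned importance
    return feature_maps * gate[:, None]
```

In the paper's model the gating sits between the BiLSTM hidden states and the TextCNN convolutions; here it is shown on a generic (channels, length) feature map.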
For small-scale Chinese news headline datasets, which are limited by size and information scarcity, we construct a depthwise separable convolutional classification model, ERNIE-MSSE-DSCNN, based on a multi-scale SE attention mechanism. First, the dataset is expanded with AEDA data augmentation based on word-level information. Then, depthwise separable convolution replaces the traditional convolutional layers in TextCNN, mitigating the risk of overfitting. An MSSE attention module is proposed to integrate, at multiple scales, the global feature information output by the convolutional layers, dynamically weighting the convolutional feature maps to further strengthen the model’s ability to capture key information. Finally, the FGM strategy is introduced for adversarial training to improve the model’s robustness and generalization. Experiments on two large datasets with different numbers of categories show significant accuracy improvements over multiple comparison models.
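AEDA ("An Easier Data Augmentation") expands a small dataset by inserting random punctuation marks between tokens while leaving every original token in place and in order. A minimal sketch of that idea; the function name, punctuation set, and insertion ratio are illustrative, and the paper applies the technique at the word level to Chinese headlines:

```python
import random

# Punctuation pool typically used by AEDA
PUNCT = [".", ";", "?", ":", "!", ","]

def aeda(tokens, ratio=0.3, rng=None):
    """Insert 1..ratio*len(tokens) random punctuation marks into a token list."""
    rng = rng or random.Random()
    n_insert = rng.randint(1, max(1, int(ratio * len(tokens))))
    out = list(tokens)
    for _ in range(n_insert):
        pos = rng.randint(0, len(out))  # any gap, including start and end
        out.insert(pos, rng.choice(PUNCT))
    return out
```

Because no token is deleted or replaced, the augmented sample keeps the full label-bearing content of the original headline, which is why the method is attractive for very short texts.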
id doaj-art-899b07b33e63427eae27bfd13cd231a3
issn 2045-2322
affiliation College of Literature and Journalism, Xiangtan University
topic Extremely short Chinese news headline classification
Attention mechanism
TextCNN
ERNIE
url https://doi.org/10.1038/s41598-025-14955-4