Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics

Bibliographic Details
Main Authors: Nikhil V. Chandran, V. S. Anoop, S. Asharaf
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Deep word embeddings; latent Dirichlet allocation; text kernels; BERT; topic modeling; Word2Vec
Online Access: https://ieeexplore.ieee.org/document/10980292/
collection DOAJ
description Traditional text classification models, such as text kernels, primarily consider the syntactic aspects of text data. This paper introduces Topic-Weighted Kernels, a new text analytics framework that combines global topical themes with word-level semantics in a text kernel architecture. Three new text kernels are proposed: (a) the Topic-Weighted Base Kernel, (b) the Topic-Weighted Word2Vec Kernel, and (c) the Topic-Weighted BERT (Bidirectional Encoder Representations from Transformers) Kernel. These kernels leverage topic modeling and deep word embeddings to capture thematic and semantic information within textual data, so that both global and local semantics inform text analysis tasks and improve model performance. Experiments on diverse datasets demonstrate that Topic-Weighted Kernels outperform existing methods for text analysis tasks. The Topic-Weighted BERT Kernel achieves top-tier performance, with F1 scores reaching 99% on lighter datasets and significant gains on more complex datasets. For multi-label text classification on the Reuters-90 dataset and sentiment analysis on the IMDB dataset, the model achieves F1 scores of 90.76% and 96.66%, respectively, demonstrating state-of-the-art performance. The Topic-Weighted Kernel approach improves performance while enabling a better contextual representation for text analysis tasks such as single-label and multi-label classification and sentiment analysis. The proposed framework integrates semantics from word embeddings and topic models into text kernels to capture intricate patterns in textual data, supporting more context-aware text analytics.
id doaj-art-b5fa047ca02f49088769a061e1f4e272
institution OA Journals
issn 2169-3536
spelling doaj-art-b5fa047ca02f49088769a061e1f4e272
Record timestamp: 2025-08-20T02:27:06Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 77918-77930
DOI: 10.1109/ACCESS.2025.3565816
IEEE Xplore document: 10980292
Title: Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics
Authors: Nikhil V. Chandran [0] (https://orcid.org/0000-0002-3915-6358), V. S. Anoop [1], S. Asharaf [2]
Affiliations, in record order:
Indian Institute of Information Technology and Management-Kerala, Thiruvananthapuram, India
Thiagarajar School of Management (Autonomous) Madurai, Madurai, Tamil Nadu, India
Innovation and Technology, Kerala University of Digital Sciences, Thiruvananthapuram, India
Online access: https://ieeexplore.ieee.org/document/10980292/
Keywords: Deep word embeddings; latent Dirichlet allocation; text kernels; BERT; topic modeling; Word2Vec
topic Deep word embeddings
latent Dirichlet allocation
text kernels
BERT
topic modeling
Word2Vec