Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics
Traditional text classification models, such as text kernels, primarily consider the syntactic aspects of text data. This paper introduces Topic-Weighted Kernels, a new text analytics framework that combines global topical themes with word-level semantics in a text kernel architecture. Three new tex...
| Main Authors: | Nikhil V. Chandran, V. S. Anoop, S. Asharaf |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Deep word embeddings; latent Dirichlet allocation; text kernels; BERT; topic modeling; Word2Vec |
| Online Access: | https://ieeexplore.ieee.org/document/10980292/ |
| _version_ | 1850148896277266432 |
|---|---|
| author | Nikhil V. Chandran; V. S. Anoop; S. Asharaf |
| author_facet | Nikhil V. Chandran; V. S. Anoop; S. Asharaf |
| author_sort | Nikhil V. Chandran |
| collection | DOAJ |
| description | Traditional text classification models, such as text kernels, primarily consider the syntactic aspects of text data. This paper introduces Topic-Weighted Kernels, a new text analytics framework that combines global topical themes with word-level semantics in a text kernel architecture. Three new text kernels are proposed to improve text analysis: (a) the Topic-Weighted Base Kernel, (b) the Topic-Weighted Word2Vec Kernel, and (c) the Topic-Weighted BERT (Bidirectional Encoder Representations from Transformers) Kernel. These kernels leverage topic modeling and deep word embeddings to capture thematic and semantic information within textual data. The proposed text kernels consider both global and local semantics in text analysis tasks, improving model performance. Experiments on diverse datasets demonstrate that Topic-Weighted Kernels outperform existing methods for text analysis tasks. The Topic-Weighted BERT Kernel achieves top-tier performance, with F1 scores reaching 99% on lighter datasets and significantly boosting performance on more complex datasets. For multi-label text classification on the Reuters-90 dataset and sentiment analysis on the IMDB dataset, the model achieves F1 scores of 90.76% and 96.66%, respectively, demonstrating state-of-the-art performance. The Topic-Weighted Kernel approach improves performance while enabling a better contextual representation for text analysis tasks such as single-label and multi-label classification and sentiment analysis. The proposed framework integrates semantics from word embeddings and topic models into text kernels to capture intricate patterns in textual data, supporting more contextual text analytics. |
| format | Article |
| id | doaj-art-b5fa047ca02f49088769a061e1f4e272 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-b5fa047ca02f49088769a061e1f4e272; 2025-08-20T02:27:06Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 77918-77930; DOI 10.1109/ACCESS.2025.3565816; document 10980292; Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics; authors: Nikhil V. Chandran (https://orcid.org/0000-0002-3915-6358), V. S. Anoop, S. Asharaf; affiliations: Indian Institute of Information Technology and Management-Kerala, Thiruvananthapuram, India / Thiagarajar School of Management (Autonomous) Madurai, Madurai, Tamil Nadu, India / Innovation and Technology, Kerala University of Digital Sciences, Thiruvananthapuram, India; https://ieeexplore.ieee.org/document/10980292/; keywords: Deep word embeddings, latent Dirichlet allocation, text kernels, BERT, topic modeling, Word2Vec |
| spellingShingle | Nikhil V. Chandran; V. S. Anoop; S. Asharaf; Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics; IEEE Access; Deep word embeddings; latent Dirichlet allocation; text kernels; BERT; topic modeling; Word2Vec |
| title | Topic-Weighted Kernels: Text Kernels Integrating Topic Weights and Deep Word Embeddings for Semantic Text Analytics |
| topic | Deep word embeddings; latent Dirichlet allocation; text kernels; BERT; topic modeling; Word2Vec |
| url | https://ieeexplore.ieee.org/document/10980292/ |