Sentiment classification for telugu using transformed based approaches on a multi-domain dataset

Abstract: Sentiment analysis is an essential component of Natural Language Processing (NLP) in resource-abundant languages such as English. Low-resource languages such as Telugu, however, have received limited attention for several reasons, including a scarcity of corpora for trainin...

Bibliographic Details
Main Authors:	Kannaiah Chattu, K. Adi Narayana Reddy, Sai babu veesam, Pardha Saradhi Chirumamilla, Vunnava Dinesh Babu, Krishna Prakash, Shonak Bansal, Mohammad Rashed Iqbal Faruque, K. S. Al-mugren
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Reports
Subjects:	Sentiment classification; Natural Language processing; Telugu Language; Transformed based models; XLM-RoBERTa
Online Access:	https://doi.org/10.1038/s41598-025-05703-9

collection	DOAJ
description	Abstract: Sentiment analysis is an essential component of Natural Language Processing (NLP) in resource-abundant languages such as English. Low-resource languages such as Telugu, however, have received limited attention for several reasons, including a scarcity of corpora for training machine learning models and an absence of gold-standard datasets for evaluation. The current surge of transformer-based models in NLP has enabled strong performance across many tasks, and researchers are increasingly interested in the potential of multilingual pre-trained transformer-based models for various NLP applications, particularly for languages with limited resources. This research examines the efficacy of four pre-trained transformer-based models, specifically IndicBERT, RoBERTa, DeBERTa, and XLM-RoBERTa, for sentence-level sentiment analysis in the Telugu language. We evaluated all four models on our dataset, “Sentikanna,” which combines datasets from multiple domains for the Telugu language. We also compared the performance of these models across three different datasets and observed promising results. XLM-RoBERTa achieves an accuracy of 79.42% on binary sentiment classification. This work can be considered a reliable benchmark for sentiment analysis in the Telugu language.
id	doaj-art-97819503264640c58dc45e89dfd78aa9
institution	Kabale University
issn	2045-2322
affiliations	Kannaiah Chattu: Department of Computer Science & Engineering (AIML), Malla Reddy College of Engineering & Technology
	K. Adi Narayana Reddy: Department of Computer Science & Engineering, Faculty of Science and Technology (IcfaiTech), ICFAI Foundation for Higher Education (IFHE)
	Sai babu veesam: School of Computer Science and Engineering, VIT-AP University
	Pardha Saradhi Chirumamilla: Senior Software Engineer, Unicon Systems Inc
	Vunnava Dinesh Babu: Department of CSE, RV Institute of Technology
	Krishna Prakash: Department of Electronics and Communication, NRI Institute of Technology
	Shonak Bansal: Department of Electronics and Communication Engineering, Chandigarh University
	Mohammad Rashed Iqbal Faruque: Space Science Centre (ANGKASA), Institute of Climate Change (IPI), Universiti Kebangsaan Malaysia
	K. S. Al-mugren: Physics department, Science College, Princess Nourah bint Abdulrahman University
