Preprocessing of Aspect-based English Telugu Code Mixed Sentiment Analysis

Extracting sentiment from English-Telugu code-mixed data is challenging and remains a relatively new research area. The data, obtained through the Twitter API, must be in English-Telugu code-mixed language. It is noisy, free-form text that contains lexical borrowings, code-mixing, phonetic typing and mi...

Bibliographic Details
Main Authors:	Arun Kodirekka, Ayyagari Srinagesh
Format:	Article
Language:	English
Published:	University of Tehran 2023-03-01
Series:	Journal of Information Technology Management
Subjects:	english-telugu code-mixed data; natural language processing; telugu senti wordnet; machine learning; deep learning
Online Access:	https://jitm.ut.ac.ir/article_91573_ed3783ba435e864a38d862a99ecfc33e.pdf

collection	DOAJ
description	Extracting sentiment from English-Telugu code-mixed data is challenging and remains a relatively new research area. The data, obtained through the Twitter API, must be in English-Telugu code-mixed language. It is noisy, free-form text that contains lexical borrowings, code-mixing, phonetic typing, and misspellings. The first step is language identification, together with assigning a sentiment class label to each tweet in the dataset. The second step is data normalization, and the final step is classification, which can be performed with three different methods: lexicon-based, machine learning, and deep learning. In the lexicon-based approach, each tweet is tokenized and every token receives a language tag. If the tag is Telugu, the Roman-script token is transliterated into native Telugu script. Telugu words are then looked up in TeluguSentiWordNet to extract their sentiment, and English SentiWordNet is used to extract sentiment from the English tokens. This paper proposes an aspect-based sentiment analysis approach and applies it to the normalized data. In addition, deep learning and machine learning techniques are applied to extract sentiment ratings, and the results are compared with prior work.
id	doaj-art-cf21fcec12fc47c299be3adadce66f33
institution	DOAJ
issn	2008-5893 2423-5059
doi	10.22059/jitm.2023.91573
volume	15
issue	Special Issue: Digital Twin Enabled Neural Networks Architecture Management for Sustainable Computing
pages	150-163
affiliation	Arun Kodirekka: Y.S. Rajasekhar Reddy University College of Engineering & Technology, Acharya Nagarjuna University, Andhra Pradesh, India
affiliation	Ayyagari Srinagesh: Department of Computer Science and Engineering, RVR & JC College of Engineering, Andhra Pradesh, India
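
The lexicon-based step summarized in the description (language-tagged tokens, transliteration of Telugu tokens, then a lookup in a per-language sentiment lexicon) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny dictionaries below are hypothetical stand-ins for a transliteration tool, TeluguSentiWordNet, and English SentiWordNet, and the names `score_tweet` and `label` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (assumptions, not the real lexicons).
# A real system would use a transliteration tool and the full
# TeluguSentiWordNet / English SentiWordNet resources instead.
TRANSLITERATION: Dict[str, str] = {"bagundi": "బాగుంది", "chetta": "చెత్త"}
TELUGU_LEXICON: Dict[str, float] = {"బాగుంది": 1.0, "చెత్త": -1.0}
ENGLISH_LEXICON: Dict[str, float] = {"good": 1.0, "super": 1.0, "bad": -1.0}


def score_tweet(tagged_tokens: List[Tuple[str, str]]) -> float:
    """Sum lexicon polarities over (token, language_tag) pairs.

    Tags are assumed to be "te" (Telugu in Roman script) or "en" (English);
    tokens with any other tag, or missing from the lexicons, contribute 0.
    """
    total = 0.0
    for token, lang in tagged_tokens:
        word = token.lower()
        if lang == "te":
            # Transliterate Roman-script Telugu to native script, then look it up.
            native = TRANSLITERATION.get(word, word)
            total += TELUGU_LEXICON.get(native, 0.0)
        elif lang == "en":
            total += ENGLISH_LEXICON.get(word, 0.0)
    return total


def label(score: float) -> str:
    """Map a summed polarity score to a sentiment class label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tweet = [("movie", "en"), ("bagundi", "te"), ("super", "en")]
    print(label(score_tweet(tweet)))
```

Summing polarities is only one possible aggregation; the record does not state how the paper combines token-level scores or how aspects are attached to them.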

Similar Items