Leveraging Multilingual Transformer for Multiclass Sentiment Analysis in Code-Mixed Data of Low-Resource Languages

Bibliographic Details
Main Authors:	Muhammad Kashif Nazir, Cm Nadeem Faisal, Muhammad Asif Habib, Haseeb Ahmad
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Code-mixed dataset; classification; low resource languages; mBERT; sentiment analysis; transformer
Online Access:	https://ieeexplore.ieee.org/document/10835765/

collection	DOAJ
description	The widespread use of online social media has enabled users to express their thoughts, feelings, opinions, and sentiments in their preferred languages. These diverse perspectives offer valuable insights for data-driven decision-making. While extensive sentiment analysis approaches have been developed for resource-rich languages like English and Chinese, low-resource languages such as Roman Urdu and Roman Punjabi, especially in code-mixed contexts, have been largely neglected due to the lack of datasets and limited research on their unique morphological structures and grammatical complexities. This study aims to present a novel approach for multiclass sentiment analysis of low-resource, code-mixed datasets using multilingual transformers. Specifically, a dataset comprising Roman Urdu, Roman Punjabi, and English comments was collected. After applying traditional natural language preprocessing techniques, transformer-based libraries were used for tokenization and embedding. Subsequently, the Multilingual Bidirectional Encoder Representations from Transformers (mBERT) model was optimized and trained for multiclass sentiment analysis on the code-mixed data. The evaluation results showed a significant improvement in accuracy (+22.55%), precision (+21.06%), recall (+22.55%), and F-measure (+25.50%) compared to benchmark algorithms. Additionally, the proposed model outperformed other transformer-based models, as well as deep learning and machine learning algorithms in sentiment extraction from code-mixed data. These findings highlight the potential of the proposed approach for sentiment analysis in low-resource, code-mixed languages.
format	Article
id	doaj-art-fbca1f06d25043338f651940cc50f4af
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-fbca1f06d25043338f651940cc50f4af
	Record timestamp: 2025-01-15T00:03:06Z
	Language: eng
	Publisher: IEEE
	Journal: IEEE Access (ISSN 2169-3536)
	Published: 2025-01-01
	Volume: 13
	Pages: 7538-7554
	DOI: 10.1109/ACCESS.2025.3527710
	IEEE Xplore document: 10835765 (https://ieeexplore.ieee.org/document/10835765/)
	Authors and affiliations:
	Muhammad Kashif Nazir (https://orcid.org/0000-0003-4094-4412), Department of Computer Science, National Textile University, Faisalabad, Pakistan
	Cm Nadeem Faisal (https://orcid.org/0000-0001-8781-4143), Department of Computer Science, National Textile University, Faisalabad, Pakistan
	Muhammad Asif Habib, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia
	Haseeb Ahmad (https://orcid.org/0000-0002-6359-7452), Department of Computer Science, National Textile University, Faisalabad, Pakistan
title	Leveraging Multilingual Transformer for Multiclass Sentiment Analysis in Code-Mixed Data of Low-Resource Languages
topic	Code-mixed dataset; classification; low resource languages; mBERT; sentiment analysis; transformer
url	https://ieeexplore.ieee.org/document/10835765/

Similar Items