Advancing Sentiment Analysis for Low-Resource Languages Using Fine-Tuned LLMs: A Case Study of Customer Reviews in Turkish Language
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10980352/ |
| Summary: | This study investigates the application of advanced fine-tuned Large Language Models (LLMs) for Turkish Sentiment Analysis (SA), focusing on e-commerce product reviews. Our research utilizes four open-source Turkish SA datasets: Turkish Sentiment Analysis version 1 (TRSAv1), Vitamins and Supplements Customer Review (VSCR), Turkish Sentiment Analysis Dataset (TSAD), and TR Customer Review (TRCR). While these datasets were initially labeled based on star ratings, we implemented a comprehensive relabeling process using state-of-the-art LLMs to enhance data quality. To ensure reliable annotations, we first conducted a comparative analysis of different LLMs using Cohen’s Kappa agreement metric, which led to the selection of ChatGPT-4o-mini as the best-performing model for dataset annotation. Our methodology then focuses on evaluating the SA capabilities of leading instruction-tuned LLMs through a comparative analysis of zero-shot models and Low-Rank Adaptation (LoRA) fine-tuned Llama-3.2-1B-IT and Gemma-2-2B-IT models. Evaluations were conducted on both in-domain and out-of-domain test sets derived from the original star-rating-based labels and the newly generated GPT labels. The results demonstrate that our fine-tuned models outperformed leading commercial LLMs by 6% in both in-domain and out-of-domain evaluations. Notably, models fine-tuned on GPT-generated labels achieved superior performance, with in-domain and out-of-domain F1-scores reaching 0.912 and 0.9184, respectively. These findings underscore the transformative potential of combining LLM relabeling with LoRA fine-tuning for optimizing SA, demonstrating robust performance across diverse datasets and domains. |
| ISSN: | 2169-3536 |
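The summary describes selecting the annotation model via Cohen’s Kappa agreement between candidate LLM labelers. A minimal pure-Python sketch of that agreement computation follows; the sentiment labels and the two hypothetical annotator outputs are illustrative only and not taken from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement rate and p_e is the agreement expected by chance
    from each annotator's marginal label distribution.
    """
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)

    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of marginal frequencies, summed over labels.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )

    if p_e == 1.0:  # degenerate case: only one label ever used
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two candidate LLM annotators on four reviews.
llm_1 = ["positive", "negative", "positive", "neutral"]
llm_2 = ["positive", "negative", "negative", "neutral"]
print(cohen_kappa(llm_1, llm_2))  # ≈ 0.636 (substantial agreement)
```

In the workflow the abstract outlines, each candidate LLM's labels would be compared against a reference annotation this way, and the model with the highest kappa (here, ChatGPT-4o-mini) chosen to relabel the full datasets.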