Advancing Sentiment Analysis for Low-Resource Languages Using Fine-Tuned LLMs: A Case Study of Customer Reviews in Turkish Language

Bibliographic Details
Main Authors: Rukiye Savran Kiziltepe, Ercan Ezin, Omer Yentur, Arwa M. Basbrain, Murat Karakus
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10980352/
Description
Summary: This study investigates the application of advanced fine-tuned Large Language Models (LLMs) for Turkish Sentiment Analysis (SA), focusing on e-commerce product reviews. Our research utilizes four open-source Turkish SA datasets: Turkish Sentiment Analysis version 1 (TRSAv1), Vitamins and Supplements Customer Review (VSCR), Turkish Sentiment Analysis Dataset (TSAD), and TR Customer Review (TRCR). While these datasets were initially labeled based on star ratings, we implemented a comprehensive relabeling process using state-of-the-art LLMs to enhance data quality. To ensure reliable annotations, we first conducted a comparative analysis of different LLMs using the Cohen's Kappa agreement metric, which led to the selection of ChatGPT-4o-mini as the best-performing model for dataset annotation. Our methodology then focuses on evaluating the SA capabilities of leading instruction-tuned LLMs through a comparative analysis of zero-shot models and Low-Rank Adaptation (LoRA) fine-tuned Llama-3.2-1B-IT and Gemma-2-2B-IT models. Evaluations were conducted on both in-domain and out-of-domain test sets derived from the original star-ratings-based labels and the newly generated GPT labels. The results demonstrate that our fine-tuned models outperformed leading commercial LLMs by 6% in both in-domain and out-of-domain evaluations. Notably, models fine-tuned on GPT-generated labels achieved superior performance, with in-domain and out-of-domain F1-scores reaching 0.912 and 0.9184, respectively. These findings underscore the transformative potential of combining LLM relabeling with LoRA fine-tuning for optimizing SA, demonstrating robust performance across diverse datasets and domains.
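
As a minimal sketch of the annotator-agreement step described in the abstract, the snippet below computes Cohen's Kappa between two candidate LLM labelers with scikit-learn; the label lists are illustrative placeholders, not data from the study.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical sentiment labels produced by two candidate LLM annotators
# for the same set of reviews (placeholder data, not from the study).
labels_model_a = ["positive", "negative", "neutral", "positive", "negative"]
labels_model_b = ["positive", "negative", "positive", "positive", "negative"]

# Cohen's Kappa measures agreement beyond chance; values closer to 1
# indicate stronger agreement between the two annotators.
kappa = cohen_kappa_score(labels_model_a, labels_model_b)
print(f"Cohen's Kappa: {kappa:.3f}")
```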
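The abstract also reports LoRA fine-tuning of Llama-3.2-1B-IT and Gemma-2-2B-IT. The following sketch shows a generic LoRA setup with Hugging Face `transformers` and `peft`; the model identifier, rank, alpha, dropout, and target modules are assumptions for illustration, not the paper's reported configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed Hugging Face identifier for the instruction-tuned base model.
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,                     # dropout on LoRA layers (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)

# Wrap the base model so that only the small LoRA adapter weights are trained,
# which is what makes fine-tuning small instruction-tuned models affordable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```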
ISSN: 2169-3536