The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods

In this study, we explore the potential of machine learning (ML) models applied after different text representation methods on the balanced IMDB dataset, which is widely regarded as a gold standard in sentiment analysis, one of the natural language processing (NLP) tasks. On the open source IMDB movie revie...

Bibliographic Details
Main Authors: Veysel Göç, Muhammet Sinan Başarslan
Format: Article
Language: English
Published: Gazi University 2024-12-01
Series: Gazi Üniversitesi Fen Bilimleri Dergisi
Subjects: machine learning; movie review; sentiment analysis; text representation
Online Access: https://dergipark.org.tr/tr/pub/gujsc/issue/89546/1498509
author Veysel Göç
Muhammet Sinan Başarslan
collection DOAJ
description In this study, we explore the potential of machine learning (ML) models applied after different text representation methods on the balanced IMDB dataset, which is widely regarded as a gold standard in sentiment analysis, one of the natural language processing (NLP) tasks. On the open-source IMDB movie reviews dataset, we first carry out data cleaning and the other preprocessing steps, followed by text representation. Then, we apply sentiment classification using different ML models. To evaluate the models, we use precision (P), recall (R), F1-score (F1), and area under the curve (AUC), as well as the receiver operating characteristic (ROC). Text feature extraction with Bidirectional Encoder Representations from Transformers (BERT) provided the highest performance in all models, with the support vector machine (SVM) model offering particularly promising results. In this model, we observed the following results: accuracy (ACC) 0.9033, F1 0.9308, R 0.9015, P 0.9072, AUC 0.9638, and ROC 0.96. These findings suggest that NLP techniques and, in particular, ML models that employ BERT may offer high levels of accuracy and reliability in text classification problems. Future studies could validate these findings by using BERT on different NLP tasks, which would help to evaluate the effectiveness and applicability of the models in practice.
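The evaluation measures named in the description can be illustrated with a minimal sketch. It assumes binary labels (1 = positive review, 0 = negative review) and uses a small made-up set of labels and scores; the function names are hypothetical and nothing here reproduces the authors' code, data, or reported figures.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not taken from the IMDB dataset).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]             # classifier confidence for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # threshold at 0.5 (an assumption)

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Precision, recall, F1 and accuracy depend on the chosen decision threshold, while AUC summarizes the ranking quality of the scores across all thresholds, which is why the two kinds of measure are usually reported side by side.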
format Article
id doaj-art-3da525cad1ea43e29121b24735fe5a9c
institution DOAJ
issn 2147-9526
language English
publishDate 2024-12-01
publisher Gazi University
record_format Article
series Gazi Üniversitesi Fen Bilimleri Dergisi
spelling doaj-art-3da525cad1ea43e29121b24735fe5a9c
2025-08-20T03:08:21Z
eng
Gazi University
Gazi Üniversitesi Fen Bilimleri Dergisi
ISSN 2147-9526
2024-12-01
Volume 12, Issue 4, pp. 893-901
DOI 10.29109/gujsc.1498509
Veysel Göç (https://orcid.org/0009-0008-9598-2786), İstanbul Medeniyet Üniversitesi
Muhammet Sinan Başarslan (https://orcid.org/0000-0002-7996-9169), İstanbul Medeniyet Üniversitesi
title The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods
topic machine learning
movie review
sentiment analysis
text representation
url https://dergipark.org.tr/tr/pub/gujsc/issue/89546/1498509