The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods
In this study, we explore the potential of machine learning (ML) models combined with different text representation methods on the balanced IMDB dataset, which is widely regarded as a gold standard for sentiment analysis, one of the Natural Language Processing (NLP) tasks. On the open source IMDB movie revie...
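The abstract describes cleaning the reviews and converting them into feature vectors before classification. The record does not name the non-BERT representations that were compared, so the minimal sketch below uses TF-IDF purely as a common baseline representation. This is an assumption for illustration, not necessarily one of the authors' methods. It assumes raw review strings as input; `clean` and `tfidf` are hypothetical helper names. BERT features would instead come from a pretrained Transformer encoder and are not reproduced here.

```python
import math
import re
from collections import Counter

# Assumption: reviews may contain HTML line breaks such as "<br />",
# as raw IMDB reviews often do.
TAG_RE = re.compile(r"<[^>]+>")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean(text: str) -> list[str]:
    """Lowercase, replace HTML tags and non-letters with spaces, split on whitespace."""
    text = TAG_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one L2-normalised TF-IDF row per document.

    TF is the raw term count; IDF is the smoothed form
    idf(t) = ln((1 + n) / (1 + df(t))) + 1 (hypothetical choice for this sketch).
    """
    n = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: count each token at most once per document.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in row])
    return vocab, rows


if __name__ == "__main__":
    reviews = ["A great movie!<br /><br />Great acting.", "Boring plot, bad acting."]
    vocab, X = tfidf([clean(r) for r in reviews])
    print(vocab)  # ['a', 'acting', 'bad', 'boring', 'great', 'movie', 'plot']
```

The smoothed IDF and L2 normalisation follow the defaults of scikit-learn's `TfidfVectorizer`, although its tokenizer differs from `clean` here; in practice the resulting vectors would be passed to a classifier such as an SVM.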
| Main Authors: | Veysel Göç, Muhammet Sinan Başarslan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Gazi University, 2024-12-01 |
| Series: | Gazi Üniversitesi Fen Bilimleri Dergisi |
| Subjects: | machine learning, movie review, sentiment analysis, text representation |
| Online Access: | https://dergipark.org.tr/tr/pub/gujsc/issue/89546/1498509 |
| author | Veysel Göç, Muhammet Sinan Başarslan |
|---|---|
| collection | DOAJ |
| description | In this study, we explore the potential of machine learning (ML) models combined with different text representation methods on the balanced IMDB dataset, which is widely regarded as a gold standard for sentiment analysis, one of the Natural Language Processing (NLP) tasks. On the open source IMDB movie reviews dataset, we first clean the data and build text representations as part of preprocessing. We then perform sentiment classification using different ML models. To evaluate the models, we used accuracy (ACC), precision (P), recall (R), F1-score (F1), and area under the curve (AUC), together with the receiver operating characteristic (ROC) curve. Notably, text feature extraction with Bidirectional Encoder Representations from Transformers (BERT) yielded the highest performance for every model, with the support vector machine (SVM) model giving particularly promising results: ACC 0.9033, F1 0.9308, R 0.9015, P 0.9072, AUC 0.9638, and ROC 0.96. These findings suggest that NLP techniques, and in particular machine learning models that use BERT representations, may offer high accuracy and reliability in text classification problems. Future studies should validate these findings with BERT on other NLP tasks to assess the effectiveness and practical applicability of the models. |
| format | Article |
| id | doaj-art-3da525cad1ea43e29121b24735fe5a9c |
| institution | DOAJ |
| issn | 2147-9526 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Gazi University |
| series | Gazi Üniversitesi Fen Bilimleri Dergisi |
| spelling | doaj-art-3da525cad1ea43e29121b24735fe5a9c (2025-08-20T03:08:21Z); eng; Gazi University; Gazi Üniversitesi Fen Bilimleri Dergisi; ISSN 2147-9526; 2024-12-01; vol. 12, no. 4, pp. 893-901; DOI 10.29109/gujsc.1498509; Veysel Göç (https://orcid.org/0009-0008-9598-2786), Muhammet Sinan Başarslan (https://orcid.org/0000-0002-7996-9169); İSTANBUL MEDENİYET ÜNİVERSİTESİ; https://dergipark.org.tr/tr/pub/gujsc/issue/89546/1498509; machine learning, movie review, sentiment analysis, text representation |
| title | The Effect of Various Text Representation Methods for Sentiment Analysis on Movie Review Data with Different Machine Learning Methods |
| topic | machine learning, movie review, sentiment analysis, text representation |
| url | https://dergipark.org.tr/tr/pub/gujsc/issue/89546/1498509 |
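The abstract reports ACC, P, R, F1 and AUC for the BERT + SVM configuration. The minimal sketch below shows how these metrics are commonly computed for binary sentiment labels (1 = positive, 0 = negative), assuming hard predictions for the confusion-matrix metrics and real-valued classifier scores for ROC AUC. The function names are hypothetical, and the quadratic pairwise AUC is written for clarity rather than speed.

```python
def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "acc": (tp + tn) / len(pairs),
        "p": precision,
        "r": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative.
    Ties count as half (the Mann-Whitney U formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
    print(confusion_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 4))  # 0.8889
    # Harmonic mean of the precision and recall reported in the abstract.
    print(round(f1_from(0.9072, 0.9015), 4))  # 0.9043
```

Taking the reported P (0.9072) and R (0.9015) at face value, their harmonic mean is about 0.904, whereas the abstract lists F1 0.9308. The record does not say how F1 was averaged (for example per class or per fold), so that detail is worth checking in the full article.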