Performance of Machine Learning Algorithms on Imbalanced Sentiment Datasets Without Balancing Techniques

This study explores the performance of five sentiment classification algorithms—Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest—on an imbalanced sentiment dataset, with the SMOTE technique applied as a comparison. The research follows the Knowledge Discover...

Bibliographic Details
Main Authors: Dina Wulan Yekti rahayu, Khothibul Umam, Maya Rini Handayani
Format: Article
Language: English
Published: Politeknik Negeri Batam 2025-06-01
Series: Journal of Applied Informatics and Computing
Subjects: data mining, imbalanced datasets, sentiment analysis
Online Access: https://jurnal.polibatam.ac.id/index.php/JAIC/article/view/9584
collection DOAJ
description This study explores the performance of five sentiment classification algorithms—Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest—on an imbalanced sentiment dataset, with the SMOTE technique applied as a comparison. The research follows the Knowledge Discovery in Databases (KDD) framework, which includes data selection, preprocessing, transformation, data mining, and evaluation. The evaluation uses metrics such as accuracy, precision, recall, F1-score, and macro average F1-score. Initial results show that all five algorithms performed fairly well even without using a balancing technique, with Naïve Bayes achieving the highest F1-score of 0.84 and recall of 0.81. After applying SMOTE, only small improvements were observed in some models, such as Random Forest (F1-score increased from 0.81 to 0.85), while other models like Naïve Bayes experienced a decrease in performance, dropping to 0.77. This suggests that the effect of balancing techniques like SMOTE can vary depending on the algorithm. Thus, this study provides empirical contributions that highlight the importance of selecting appropriate approaches and the need for a deep understanding of each algorithm's behavior in the context of imbalanced data. Researchers are encouraged to carefully consider these aspects when designing experiments and interpreting results.
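The description lists accuracy, precision, recall, F1-score, and macro average F1-score as evaluation metrics. The following is a minimal sketch, not the authors' code, of how those metrics relate on an imbalanced label set. The function names and the toy labels are hypothetical and chosen only for illustration; they do not come from the study's dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (zero denominator) are reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced toy labels: 8 positive reviews, 2 negative.
y_true = ["pos"] * 8 + ["neg"] * 2
# A classifier that always predicts the majority class.
y_pred = ["pos"] * 10

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {macro:.2f}")
```

On this toy input the majority-class predictor reaches an accuracy of 0.80 while its macro F1 is about 0.44, because the minority class contributes an F1 of zero. This is one reason a macro-averaged score is commonly reported alongside accuracy when classes are imbalanced.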
id doaj-art-8e9b1879f44946e294cc9df755cb8cfa
institution DOAJ
issn 2548-6861
spelling doaj-art-8e9b1879f44946e294cc9df755cb8cfa | 2025-08-20T02:45:02Z | eng | Politeknik Negeri Batam | Journal of Applied Informatics and Computing | 2548-6861 | 2025-06-01 | 9 | 3 | 998 | 1005 | 10.30871/jaic.v9i3.9584 | 7129
topic data mining
imbalanced datasets
sentiment analysis