Optimizing Feature Selection in Sentiment Analysis of Bank Saqu: A Comparative Study of SVM and Random Forest using Information Gain and Chi-Square

Bibliographic Details
Main Authors: Anelta Tirta Putri Subandono, Dhani Ariatmanto
Format: Article
Language: Indonesian
Published: Islamic University of Indragiri 2025-05-01
Series: Sistemasi: Jurnal Sistem Informasi
Online Access: https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/5106
collection DOAJ
description Choosing an appropriate feature selection method is a crucial factor in improving the accuracy and efficiency of text classification models. Irrelevant features can degrade model performance, increase computational complexity, and lead to overfitting. Although various feature selection techniques have been employed in sentiment analysis, systematic studies comparing the effectiveness of Information Gain and Chi-Square in enhancing classification performance remain limited. This study evaluates the impact of different feature selection methods on the performance of Support Vector Machine (SVM) and Random Forest (RF) classifiers in sentiment analysis. Experiments were conducted using eight testing schemes, including models without feature selection, with Information Gain, with Chi-Square, and with a combination of both. The results showed that SVM with Chi-Square achieved the highest accuracy, 93%, while Random Forest reached its best accuracy, 91%, also with Chi-Square. These findings indicate that Chi-Square is more effective than Information Gain in improving accuracy, and that SVM outperforms Random Forest in text classification tasks. In conclusion, selecting the appropriate feature selection method contributes significantly to the accuracy of text classification models. This research can serve as a reference for optimizing feature selection techniques in the development of more accurate and efficient machine learning-based systems.
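The two scoring methods named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus `DOCS` and the helpers `contingency`, `chi_square`, `information_gain`, and `top_k` are hypothetical names introduced here, and the scores use the textbook 2x2 term-presence formulation, which may differ from what a given library computes.

```python
import math

# Hypothetical toy corpus (not data from the study): (token set, label) pairs.
DOCS = [
    ({"app", "fast", "good"}, "pos"),
    ({"good", "service"}, "pos"),
    ({"good", "easy"}, "pos"),
    ({"app", "slow", "bad"}, "neg"),
    ({"bad", "error"}, "neg"),
    ({"app", "error"}, "neg"),
]


def contingency(term, docs, positive="pos"):
    """Return 2x2 counts: (term & pos, term & neg, no term & pos, no term & neg)."""
    a = sum(1 for toks, y in docs if term in toks and y == positive)
    b = sum(1 for toks, y in docs if term in toks and y != positive)
    c = sum(1 for toks, y in docs if term not in toks and y == positive)
    d = sum(1 for toks, y in docs if term not in toks and y != positive)
    return a, b, c, d


def chi_square(term, docs):
    """Pearson chi-square statistic for term presence versus class."""
    a, b, c, d = contingency(term, docs)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(term, docs):
    """Class entropy minus class entropy conditioned on term presence."""
    a, b, c, d = contingency(term, docs)
    n = a + b + c + d
    h_cond = 0.0
    if a + b:
        h_cond += (a + b) / n * entropy([a, b])
    if c + d:
        h_cond += (c + d) / n * entropy([c, d])
    return entropy([a + c, b + d]) - h_cond


def top_k(score, docs, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocab = sorted(set().union(*(toks for toks, _ in docs)))
    return sorted(vocab, key=lambda t: (-score(t, docs), t))[:k]


if __name__ == "__main__":
    print(top_k(chi_square, DOCS, 3))
    print(top_k(information_gain, DOCS, 3))
```

In this toy data the term "good" appears in every positive document and no negative one, so both scores rank it first; a term such as "app", spread across both classes, scores low under either method. The selected terms would then be the only features passed to a classifier such as SVM or Random Forest.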
id doaj-art-ea3b3e517121408c8baa79b7cbcafd2b
institution Kabale University
issn 2302-8149
2540-9719
volume 14
issue 3
pages 1205-1219
doi 10.32520/stmsi.v14i3.5106
affiliation Universitas AMIKOM Yogyakarta
topic sentiment analysis
support vector machine (svm)
random forest (rf)
chi-square
information gain