Optimizing Feature Selection in Sentiment Analysis of Bank Saqu: A Comparative Study of SVM and Random Forest using Information Gain and Chi-Square
The selection of an optimal feature selection method is a crucial factor in improving the accuracy and efficiency of text classification models. Irrelevant features can degrade model performance, increase computational complexity, and lead to overfitting. Although various feature selection technique...
| Main Authors: | Anelta Tirta Putri Subandono, Dhani Ariatmanto |
|---|---|
| Format: | Article |
| Language: | Indonesian |
| Published: | Islamic University of Indragiri, 2025-05-01 |
| Series: | Sistemasi: Jurnal Sistem Informasi |
| Subjects: | sentiment analysis; support vector machine (SVM); random forest (RF); chi-square; information gain |
| Online Access: | https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/5106 |
| _version_ | 1849222085008687104 |
|---|---|
| author | Anelta Tirta Putri Subandono; Dhani Ariatmanto |
| collection | DOAJ |
| description | Choosing an appropriate feature selection method is crucial to improving the accuracy and efficiency of text classification models. Irrelevant features can degrade model performance, increase computational complexity, and lead to overfitting. Although various feature selection techniques have been used in sentiment analysis, systematic studies comparing the effectiveness of Information Gain and Chi-Square in improving classification performance remain limited. This study evaluates the impact of different feature selection methods on the performance of Support Vector Machine (SVM) and Random Forest (RF) in sentiment analysis. Experiments were conducted using eight testing schemes: each classifier was tested without feature selection, with Information Gain, with Chi-Square, and with a combination of both. SVM with Chi-Square achieved the highest accuracy, 93%, while Random Forest reached its best accuracy, 91%, also with Chi-Square. These findings indicate that Chi-Square was more effective than Information Gain in improving accuracy, and that SVM outperformed Random Forest on this text classification task. In conclusion, selecting an appropriate feature selection method contributes significantly to the accuracy of text classification models. This research can serve as a reference for optimizing feature selection in the development of more accurate and efficient machine learning-based systems. |
| format | Article |
| id | doaj-art-ea3b3e517121408c8baa79b7cbcafd2b |
| institution | Kabale University |
| issn | 2302-8149, 2540-9719 |
| language | Indonesian |
| publishDate | 2025-05-01 |
| publisher | Islamic University of Indragiri |
| record_format | Article |
| series | Sistemasi: Jurnal Sistem Informasi |
| spelling | doaj-art-ea3b3e517121408c8baa79b7cbcafd2b; 2025-08-26T08:05:46Z; ind; Islamic University of Indragiri; Sistemasi: Jurnal Sistem Informasi; 2302-8149; 2540-9719; 2025-05-01; 14; 3; 1205; 1219; 10.32520/stmsi.v14i3.5106; 1060; Optimizing Feature Selection in Sentiment Analysis of Bank Saqu: A Comparative Study of SVM and Random Forest using Information Gain and Chi-Square; Anelta Tirta Putri Subandono (0); Dhani Ariatmanto (1); Universitas AMIKOM Yogyakarta; Universitas AMIKOM Yogyakarta; https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/5106; sentiment analysis; support vector machine (SVM); random forest (RF); chi-square; information gain |
| title | Optimizing Feature Selection in Sentiment Analysis of Bank Saqu: A Comparative Study of SVM and Random Forest using Information Gain and Chi-Square |
| topic | sentiment analysis; support vector machine (SVM); random forest (RF); chi-square; information gain |
| url | https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/5106 |
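
The abstract names Information Gain and Chi-Square as the feature selection methods compared, but this record does not reproduce their formulas or the authors' implementation. As a minimal sketch, assuming binary term presence and a two-class (positive/negative) setup, both scores can be computed from a 2x2 contingency table per term. The toy documents, labels, and all function names below are hypothetical illustrations, not the study's data or code, and classifier training (SVM, Random Forest) is left out.

```python
import math

def contingency(docs, labels, term, positive):
    """Count documents by term presence and class membership.

    Returns (a, b, c, d):
      a = term present, positive class    b = term present, other class
      c = term absent,  positive class    d = term absent,  other class
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == positive
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table (0.0 if any margin is empty)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def entropy(counts):
    """Shannon entropy in bits of a list of counts; zero counts are skipped."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)

def information_gain(a, b, c, d):
    """Class entropy minus class entropy conditioned on term presence."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_class - h_given_term

def score_terms(docs, labels, positive, scorer):
    """Score every vocabulary term with the given scorer function."""
    vocab = set().union(*docs)
    return {t: scorer(*contingency(docs, labels, t, positive)) for t in vocab}

def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus: each document is a set of tokens.
DOCS = [
    {"app", "good", "fast"},
    {"good", "easy"},
    {"app", "slow", "error"},
    {"error", "failed", "app"},
]
LABELS = ["pos", "pos", "neg", "neg"]

chi_scores = score_terms(DOCS, LABELS, "pos", chi_square)
ig_scores = score_terms(DOCS, LABELS, "pos", information_gain)

print(top_k(chi_scores, 2))  # ['error', 'good']
print(top_k(ig_scores, 2))   # ['error', 'good']
```

In a full experiment the selected terms would restrict the feature matrix passed to each classifier. How the study combines the two methods is not stated in this record, so no combination step is sketched here.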