Comparison of Feature Extraction in Support Vector Machine (SVM) Based Sentiment Analysis System

Bibliographic Details
Main Authors: Imam Fahrur Rozi, Irma Maulidia, Mamluatul Hani’ah, Rakhmat Arianto, Dika Rizky Yunianto, Ahmadi Yuli Ananta
Format: Article
Language: English
Published: Informatics Department, Engineering Faculty 2025-07-01
Series: Jurnal Ilmiah Kursor: Menuju Solusi Teknologi Informasi
Online Access: https://kursorjournal.org/index.php/kursor/article/view/417
Description
Summary: Sentiment analysis plays a crucial role in natural language processing by identifying and categorizing the opinions or emotions conveyed in text. It is widely applied in fields such as product review analysis, social media monitoring, and market research. To improve the accuracy and reliability of sentiment classification, various methods and feature extraction techniques have been explored. This study investigates the use of Support Vector Machine (SVM) for sentiment analysis, comparing three feature extraction techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), and Word2Vec. Our findings indicate that SVM performs effectively with all three feature extraction methods, with TF-IDF yielding the highest accuracy at 0.79. BoW was competitive but slightly trailed TF-IDF in k-fold validation. Word2Vec showed the lowest performance, with a maximum accuracy of 0.69. A comparative analysis of accuracy, precision, recall, and F1-score highlights the superiority of TF-IDF in delivering consistent and accurate results. Further statistical analysis using ANOVA, however, revealed no significant differences between the models on any of the evaluation metrics. The evaluation was also conducted under several scenarios, including balanced and imbalanced datasets, varying dataset sizes, and different values of the SVM parameter C. These scenarios provided deeper insight into the factors influencing the system's performance, reinforcing that TF-IDF combined with SVM was the most effective approach in this study.
ISSN: 0216-0544, 2301-6914
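
Illustrative note: the summary compares BoW and TF-IDF features but the record does not describe how they were computed. The following is a minimal sketch of the difference between the two, using only the Python standard library. The toy documents, the function names `bag_of_words` and `tf_idf`, and the smoothed IDF formula are assumptions made for illustration and are not taken from the article; the SVM classifier and Word2Vec are not shown.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Reweight BoW counts by a smoothed inverse document frequency.

    Assumed variant: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    The article may have used a different weighting or normalization.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = [[c * w for c, w in zip(vec, idf)] for vec in counts]
    return vocab, weighted


# Hypothetical toy corpus, already tokenized.
docs = [
    "good product good price".split(),
    "bad product".split(),
    "good service".split(),
]

vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)

print(vocab)
print(bow[0])
print([round(x, 3) for x in tfidf[0]])
```

In the first toy document, "price" and "product" each occur once, so BoW gives them the same value. TF-IDF gives "price" the larger weight because it appears in fewer documents. Either kind of vector could then be passed to an SVM classifier.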