Comparative Analysis of TF-IDF and Word2Vec in Sentiment Analysis: A Case of Food Reviews
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | EDP Sciences, 2025-01-01 |
Series: | ITM Web of Conferences |
Online Access: | https://www.itm-conferences.org/articles/itmconf/pdf/2025/01/itmconf_dai2024_02013.pdf |
Summary: | Sentiment analysis is an important area of natural language processing that identifies and classifies opinions in text, supporting applications such as market analysis, customer feedback, and social media monitoring. Text representation is the basis of sentiment analysis, and TF-IDF and Word2Vec are two commonly used text vectorization methods: the former weights words by their frequency, while the latter captures semantic relations between words. This paper compares the performance of TF-IDF and Word2Vec in sentiment analysis of food reviews to give enterprises and researchers a more effective basis for choosing a text representation technique. Using a dataset of 560,000 food reviews, the paper compares the accuracy and generalization ability of the two methods under different dataset sizes. TF-IDF achieved high accuracy on the training data (99.16%) but considerably lower accuracy on the test data (73.9%), indicating clear overfitting. In contrast, Word2Vec performed almost identically on training and test data (68.4% vs. 68.65%), showing better generalization. This finding can guide the choice of text representation method, especially in sentiment analysis tasks on large datasets. |
ISSN: | 2271-2097 |
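
The summary describes TF-IDF as a frequency-based vectorization method. As a rough illustration only, the sketch below computes the textbook variant (term frequency times the log of inverse document frequency) in plain Python. The function name `tfidf`, the whitespace tokenization, and the toy reviews are assumptions made for this example; the record does not say which TF-IDF variant, tokenizer, or classifier the paper used.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency (share of the document) times log inverse
            # document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy reviews, not taken from the paper's dataset.
reviews = [
    "great taste great price",
    "bad taste",
    "great snack",
]
weights = tfidf(reviews)
```

In this toy corpus, "bad" occurs in only one review, so it receives a higher weight in the second review than "taste", which occurs in two. Word2Vec, by contrast, learns dense vectors from word co-occurrence and is not sketched here.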