Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews

Bibliographic Details
Main Authors: B. Priya Kamath, M. Geetha, U. Dinesh Acharya, Dipesh Singh, Ayush Rao, Shwetha Rai, Roopashri Shetty
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10858136/
Description
Summary: Sentiment Analysis (SA) is a well-established and still-growing research field within Natural Language Processing (NLP) and text classification. Feature engineering is one of the major steps in the Machine Learning (ML) pipeline, and effective feature extraction plays a vital role in improving performance on SA tasks. Choosing appropriate features from the text is considered the most challenging task in text classification. This study examines traditional feature extraction models such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and N-grams, together with embedding models such as Word2Vec and Bidirectional Encoder Representations from Transformers (BERT), on an Amazon review dataset. The hyperparameters of the BERT model were also tuned to optimize its performance on the SA task. In addition, the research explores the effectiveness of combining the vectors generated by TF-IDF and BERT for SA. The proposed hybrid model implements an effective negation handling approach and combines TF-IDF with BERT to improve the classification of Amazon product reviews. The hybrid model was evaluated using several performance metrics, including accuracy, recall, precision, and F1-score, and shows promising results, achieving an accuracy of 88%. The integration of BERT and TF-IDF not only enhances the model's ability to understand and interpret text but also demonstrates the potential of combining advanced and traditional NLP techniques.
ISSN: 2169-3536
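
Illustrative sketch: The summary describes a hybrid feature vector that pairs negation handling with TF-IDF and BERT representations, but it does not say how negation is handled or how the two vectors are combined. The Python sketch below is one possible reading, under stated assumptions: negation is handled by prefixing tokens that follow a negation word with `NOT_` until the next punctuation mark, and the two representations are combined by simple concatenation. The function `placeholder_dense_embedding` is a hypothetical stand-in for a BERT sentence embedding (it is not BERT), used only so the example runs with the standard library. All names here are illustrative and do not come from the article.

```python
import hashlib
import math
import re
from collections import Counter

# Assumed negation cues; the article's actual rule set is not given in the summary.
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = set(".,!?;")


def mark_negation(text):
    """Tokenize and prefix tokens after a negation cue with NOT_ until punctuation."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


def fit_tfidf(docs):
    """Build a sorted vocabulary and smoothed IDF weights from token lists."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def tfidf_vector(tokens, vocab, idf):
    """L2-normalized TF-IDF vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return l2_normalize([counts[t] * idf[t] for t in vocab])


def placeholder_dense_embedding(tokens, dim=8):
    """Hypothetical stand-in for a BERT embedding: a hashed bag of tokens, NOT BERT."""
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return l2_normalize(vec)


def hybrid_features(text, vocab, idf, dim=8):
    """Concatenate the sparse TF-IDF vector with the dense vector (assumed scheme)."""
    tokens = mark_negation(text)
    return tfidf_vector(tokens, vocab, idf) + placeholder_dense_embedding(tokens, dim)


reviews = ["This phone is not good.", "Good battery, great screen!"]
docs = [mark_negation(r) for r in reviews]
vocab, idf = fit_tfidf(docs)
features = [hybrid_features(r, vocab, idf) for r in reviews]
print(len(vocab), len(features[0]))
```

In a real pipeline, the placeholder would be replaced by an embedding taken from a fine-tuned BERT model, and the concatenated vector would be passed to a classifier. Marking negated tokens before vectorizing lets "good" and "NOT_good" occupy separate TF-IDF dimensions, which is the motivation for applying the negation step first in this sketch.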