Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews

Sentiment Analysis (SA) is a well-known and emerging research field in the area of Natural Language Processing (NLP) and text classification. Feature engineering is considered to be one of the major steps in the Machine Learning (ML) pipeline with effective feature extraction playing a vital role in...

Bibliographic Details
Main Authors: B. Priya Kamath, M. Geetha, U. Dinesh Acharya, Dipesh Singh, Ayush Rao, Shwetha Rai, Roopashri Shetty
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: BERT, classification, feature extraction, machine learning, sentiment analysis, Word embeddings
Online Access: https://ieeexplore.ieee.org/document/10858136/
author B. Priya Kamath
M. Geetha
U. Dinesh Acharya
Dipesh Singh
Ayush Rao
Shwetha Rai
Roopashri Shetty
collection DOAJ
description Sentiment Analysis (SA) is a well-known and emerging research field in the area of Natural Language Processing (NLP) and text classification. Feature engineering is considered to be one of the major steps in the Machine Learning (ML) pipeline, with effective feature extraction playing a vital role in improving the performance of SA tasks. Choosing an appropriate feature from the text is considered to be the most challenging task in text classification. This study examines the implementation of different traditional feature extraction models such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and N-grams, and of word embeddings such as Word2Vec and Bidirectional Encoder Representations from Transformers (BERT), on an Amazon review dataset. Furthermore, the hyperparameters of the BERT model were fine-tuned, optimizing its performance on the SA task. Additionally, this research explores the effectiveness of combining the word vectors generated using TF-IDF and BERT for SA. The proposed hybrid model implements an effective negation handling approach and combines TF-IDF with BERT to improve performance for Amazon product review classification. The hybrid model was evaluated using several performance metrics, including accuracy, recall, precision, and F1-score. The proposed hybrid model shows promising results, achieving an accuracy of 88%. The integration of BERT and TF-IDF not only enhances the model’s ability to understand and interpret text but also demonstrates the potential of combining advanced and traditional NLP techniques.
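The description mentions negation handling and a hybrid TF-IDF plus BERT feature vector but gives no implementation details. The following is a minimal sketch of one way such a pipeline could look, not the authors' method. It assumes that negation is handled by prefixing `NOT_` to the words that follow a negation cue (a common heuristic), that "combining" means concatenating the two vectors, and that the dense vector comes from some BERT encoder. The function names `tokenize_with_negation`, `tfidf_vectors`, and `hybrid_features`, and the zero-filled `dense` placeholder, are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed negation cues; the paper's actual list is not given in this record.
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = set(".,!?;")


def tokenize_with_negation(text):
    """Lowercase and tokenize; mark words after a negation cue with NOT_
    until the next punctuation mark (an assumed heuristic)."""
    tokens = []
    negate = False
    for piece in re.findall(r"[a-z']+|[.,!?;]", text.lower()):
        if piece in PUNCTUATION:
            negate = False  # negation scope ends at punctuation
            continue
        if piece in NEGATIONS or piece.endswith("n't"):
            negate = True
            tokens.append(piece)
            continue
        tokens.append("NOT_" + piece if negate else piece)
    return tokens


def tfidf_vectors(docs_tokens):
    """Return (vocab, vectors) with smoothed IDF and L2-normalized rows."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for doc in docs_tokens for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs_tokens:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def hybrid_features(tfidf_vec, dense_vec):
    """Concatenate the sparse TF-IDF vector with a dense embedding (assumption)."""
    return list(tfidf_vec) + list(dense_vec)


reviews = ["The battery is not good.", "Great sound, works well!"]
tokens = [tokenize_with_negation(r) for r in reviews]
vocab, sparse = tfidf_vectors(tokens)
# Placeholder for a BERT sentence embedding (typically 768 values for BERT-base).
dense = [[0.0] * 4 for _ in reviews]
features = [hybrid_features(s, d) for s, d in zip(sparse, dense)]
```

In practice the placeholder `dense` rows would be replaced by embeddings from a fine-tuned BERT model, and `features` would be passed to a classifier evaluated with accuracy, precision, recall, and F1-score, as the description states.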
format Article
id doaj-art-aad84e935d224de1938067222099d1cf
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-aad84e935d224de1938067222099d1cf
Timestamp: 2025-02-12T00:02:48Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 25239-25255
DOI: 10.1109/ACCESS.2025.3536631
IEEE document number: 10858136
Title: Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews
Authors:
B. Priya Kamath (https://orcid.org/0000-0002-5471-8822)
M. Geetha (https://orcid.org/0000-0002-6150-7601)
U. Dinesh Acharya (https://orcid.org/0000-0002-0304-4725)
Dipesh Singh (https://orcid.org/0009-0002-1108-6199)
Ayush Rao
Shwetha Rai (https://orcid.org/0000-0002-5714-2611)
Roopashri Shetty (https://orcid.org/0000-0003-4145-620X)
Affiliation (all authors): Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, Manipal Institute of Technology, Udupi, Karnataka, India
URL: https://ieeexplore.ieee.org/document/10858136/
Keywords: BERT, classification, feature extraction, machine learning, sentiment analysis, Word embeddings
title Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews
topic BERT
classification
feature extraction
machine learning
sentiment analysis
Word embeddings
url https://ieeexplore.ieee.org/document/10858136/