Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews
Sentiment Analysis (SA) is a well-known and emerging research field in the area of Natural Language Processing (NLP) and text classification. Feature engineering is considered to be one of the major steps in the Machine Learning (ML) pipeline with effective feature extraction playing a vital role in...
Main Authors: | B. Priya Kamath, M. Geetha, U. Dinesh Acharya, Dipesh Singh, Ayush Rao, Shwetha Rai, Roopashri Shetty |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
Subjects: | BERT; classification; feature extraction; machine learning; sentiment analysis; Word embeddings |
Online Access: | https://ieeexplore.ieee.org/document/10858136/ |
_version_ | 1823857152899416064 |
---|---|
author | B. Priya Kamath; M. Geetha; U. Dinesh Acharya; Dipesh Singh; Ayush Rao; Shwetha Rai; Roopashri Shetty |
author_sort | B. Priya Kamath |
collection | DOAJ |
description | Sentiment Analysis (SA) is a well-known and emerging research field in the area of Natural Language Processing (NLP) and text classification. Feature engineering is considered to be one of the major steps in the Machine Learning (ML) pipeline with effective feature extraction playing a vital role in improving the performance of the SA tasks. Choosing an appropriate feature from the text is considered to be the most challenging task in text classification. This study examines the implementation of different traditional feature extraction models such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), N-grams, and word embeddings like Word2Vec and Bidirectional Encoder Representations from Transformers (BERT) on an Amazon review dataset. Furthermore, the hyperparameters of the BERT model were fine-tuned, optimizing its performance on the SA task. Additionally, this research explores the effectiveness of combining the word vectors generated using TF-IDF and BERT for SA. The proposed hybrid model implements an effective negation handling approach and combines TF-IDF with BERT to improve performance for Amazon product review classification. The hybrid model was evaluated using several performance metrics, including accuracy, recall, precision, and F1-score. The proposed hybrid model shows promising results achieving an accuracy of 88%. The integration of BERT and TF-IDF not only enhances the model’s ability to understand and interpret text but also demonstrates the potential of combining advanced and traditional NLP techniques. |
format | Article |
id | doaj-art-aad84e935d224de1938067222099d1cf |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-aad84e935d224de1938067222099d1cf; 2025-02-12T00:02:48Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 25239-25255; DOI 10.1109/ACCESS.2025.3536631; document 10858136; Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews; B. Priya Kamath (https://orcid.org/0000-0002-5471-8822), M. Geetha (https://orcid.org/0000-0002-6150-7601), U. Dinesh Acharya (https://orcid.org/0000-0002-0304-4725), Dipesh Singh (https://orcid.org/0009-0002-1108-6199), Ayush Rao, Shwetha Rai (https://orcid.org/0000-0002-5714-2611), Roopashri Shetty (https://orcid.org/0000-0003-4145-620X); affiliation (all authors): Department of Computer Science and Engineering, Manipal Academy of Higher Education, Manipal, Manipal Institute of Technology, Udupi, Karnataka, India; https://ieeexplore.ieee.org/document/10858136/; BERT; classification; feature extraction; machine learning; sentiment analysis; Word embeddings |
title | Comprehensive Analysis of Word Embedding Models and Design of Effective Feature Vector for Classification of Amazon Product Reviews |
topic | BERT; classification; feature extraction; machine learning; sentiment analysis; Word embeddings |
url | https://ieeexplore.ieee.org/document/10858136/ |
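
The description above mentions negation handling and a feature vector that combines TF-IDF with BERT. The record does not say how either step is implemented, so the following is only a minimal sketch under stated assumptions: negation is handled by prefixing the next few tokens with `NOT_`, the two representations are combined by plain concatenation, and the dense vectors are hard-coded placeholders standing in for BERT sentence embeddings. The names `mark_negation`, `tfidf_vectors`, and `hybrid_features` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Assumed negation cue list; the paper's actual list is not given in this record.
NEGATIONS = {"not", "no", "never", "cannot"}


def mark_negation(tokens, scope=3):
    """Prefix up to `scope` tokens following a negation cue with 'NOT_'."""
    out, remaining = [], 0
    for tok in tokens:
        if tok in NEGATIONS:
            out.append(tok)
            remaining = scope
        elif remaining > 0:
            out.append("NOT_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for tokenised docs, with smoothed IDF."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        length = max(len(d), 1)  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


def hybrid_features(sparse_vec, dense_vec):
    """Concatenate a TF-IDF vector with a dense embedding (assumed combination)."""
    return list(sparse_vec) + list(dense_vec)


reviews = ["this phone is not good at all", "really good battery and screen"]
tokenised = [mark_negation(r.split()) for r in reviews]
vocab, sparse = tfidf_vectors(tokenised)

# Placeholder 3-dimensional embeddings; a real pipeline would take these
# from a BERT encoder instead.
dense = [[0.1, -0.2, 0.3], [0.4, 0.0, -0.1]]
features = [hybrid_features(s, d) for s, d in zip(sparse, dense)]
```

With this marking, "good" in the first review becomes the separate vocabulary entry `NOT_good`, so a downstream classifier can weight it differently from "good" in the second review. Each resulting feature vector has `len(vocab) + 3` entries.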