Wineinformatics: Wine Score Prediction with Wine Price and Reviews
| Main Authors: | Yuka Nagayoshi, Bernard Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Fermentation |
| Subjects: | wineinformatics, wine price, wine reviews, naïve Bayes, SVM |
| Online Access: | https://www.mdpi.com/2311-5637/10/12/598 |
| Field | Value |
|---|---|
| author | Yuka Nagayoshi, Bernard Chen |
| collection | DOAJ |
| description | Wineinformatics is a new field that applies data science to wine-related data. The goal of this paper is to determine whether incorporating wine price can improve the accuracy of score prediction. To explore the relationship between wine price and wine score, naive Bayes and support vector machine (SVM) classifiers are employed to predict whether a wine's score is 90 or above, or below 90. The price values are normalized using four different methods: mean, median, boxplot mean, and boxplot median. To allow a fair comparison, the original dataset from previous research, which contains 14,349 wine reviews, was preprocessed by removing all reviews with null price values, leaving 9721 wine reviews. Using this dataset, these classifiers, and these normalization methods, models with and without the price feature were compared. The SVM classifier with the mean normalization method (USD 50.04) achieved the best accuracy of 87.98%, while the naive Bayes classifier with the boxplot median normalization method (USD 28.00) showed the greatest improvement, 0.99%. From all the results, we conclude that boxplot median normalization (USD 28.00) is the most effective method in this study. These results indicate that incorporating price as an attribute enhances machine learning algorithms’ ability to recognize the correlation between wine reviews and scores. |
| format | Article |
| id | doaj-art-9acf515415334f8abb3bb3434ddf7447 |
| institution | DOAJ |
| issn | 2311-5637 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Fermentation |
| doi | 10.3390/fermentation10120598 |
| citation | Fermentation, vol. 10, no. 12, art. 598 (2024) |
| affiliation | Department of Computer Science and Engineering, University of Central Arkansas, Conway, AR 72035, USA (both authors) |
| title | Wineinformatics: Wine Score Prediction with Wine Price and Reviews |
| topic | wineinformatics; wine price; wine reviews; naïve Bayes; SVM |
| url | https://www.mdpi.com/2311-5637/10/12/598 |
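
The abstract names the preprocessing steps (dropping null prices, labeling scores at the 90-point cutoff, and four price reference values) but not their exact formulas. The following is a minimal Python sketch under stated assumptions, not the authors' code: the review record shape (`score` and `price` keys), the 1.5 × IQR whisker rule used for the "boxplot" statistics, the inclusive quartile convention, and the hypothetical `price_feature` encoding (price at or above the reference value gives 1) are all assumptions. Training the naive Bayes and SVM classifiers is out of scope here.

```python
"""Hypothetical sketch of the preprocessing described in the abstract.

Assumptions (not taken from the paper): reviews are dicts with "score" and
"price" keys; "boxplot" statistics are computed after dropping Tukey outliers
(outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR); price enters the model as a binary
above/below-reference attribute.
"""
from statistics import mean, median, quantiles
from typing import Optional, TypedDict


class Review(TypedDict):
    score: int
    price: Optional[float]


def drop_null_prices(reviews: list[Review]) -> list[Review]:
    # The abstract removes reviews without a price (14,349 -> 9721 in the paper).
    return [r for r in reviews if r["price"] is not None]


def score_label(score: int) -> int:
    # Binary target from the abstract: 1 if score >= 90, else 0.
    return 1 if score >= 90 else 0


def tukey_inliers(prices: list[float]) -> list[float]:
    # Assumed boxplot rule: keep prices inside the 1.5 * IQR whiskers.
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [p for p in prices if low <= p <= high]


def reference_prices(prices: list[float]) -> dict[str, float]:
    # The four reference values named in the abstract.
    inliers = tukey_inliers(prices)
    return {
        "mean": mean(prices),
        "median": median(prices),
        "boxplot_mean": mean(inliers),
        "boxplot_median": median(inliers),
    }


def price_feature(price: float, reference: float) -> int:
    # Hypothetical encoding: 1 if the wine costs at least the reference price.
    return 1 if price >= reference else 0


if __name__ == "__main__":
    # Toy data for illustration only; not from the paper's dataset.
    sample: list[Review] = [
        {"score": 92, "price": 40.0},
        {"score": 88, "price": 15.0},
        {"score": 90, "price": None},
        {"score": 85, "price": 20.0},
        {"score": 94, "price": 300.0},
    ]
    kept = drop_null_prices(sample)
    refs = reference_prices([r["price"] for r in kept])
    print(refs)
    # {'mean': 93.75, 'median': 30.0, 'boxplot_mean': 25.0, 'boxplot_median': 20.0}
```

The toy sample shows why boxplot variants are useful at all: a single expensive wine moves the plain mean far more than it moves the boxplot mean or median.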