Application of Machine Learning for Predictive Analysis and Management of Mediterranean-Farmed Fish Mortalities: A Risk Management Case Study Using Apache Spark

The current study evaluates the performance of three machine learning models—Decision Trees, Random Forest, and Linear Regression—applied to aquaculture data to mitigate risks in aquaculture management. The performances of these models are analyzed and properly demonstrated using metrics including t...

Bibliographic Details
Main Authors: Marios C. Gkikas, Dimitris C. Gkikas, Gerasimos Vonitsanos, John A. Theodorou, Spyros Sioutas
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/22/10112
Collection: DOAJ
Description: This study evaluates three machine learning models (Decision Trees, Random Forest, and Linear Regression) applied to aquaculture data, with the aim of mitigating risk in aquaculture management. Model performance is assessed using the Mean Squared Error (MSE), R-squared (R²), Root Mean Squared Error (RMSE), and Concordance Index (C-index). Random Forest achieved the highest prediction accuracy of the three models, followed by Linear Regression and then Decision Trees. The scatter plot for Linear Regression shows good predictive accuracy for mid-range values but significant deviations at the extremes, indicating that the model struggles to capture the full range of variability in the data. The bar chart of coefficients identifies the variables with the greatest impact on the predictions, which aids model interpretability and suggests areas for improvement. Future work could incorporate additional predictive statistical models and improve performance on extreme values by assessing non-linear models, applying feature engineering methods, and extending the analysis to less influential variables. The results are relevant to several areas, including aquaculture management, policy-making, and operational strategy, and offer insights for stakeholders and decision-makers. Apache Spark was used for data processing and machine learning model implementation; Apache Cassandra for data storage and efficient management of large datasets; SQL tools for structured data handling; and Oracle VM VirtualBox for cross-platform virtualization. The Spark Connector was also used.
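Note: the four metrics named in the description can be illustrated with a minimal sketch. The function names and the toy values below are hypothetical and are not taken from the study. The C-index shown is a simple pairwise-ordering variant without censoring, which is an assumption; the article may define it differently.

```python
import math
from itertools import combinations


def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_index(y_true, y_pred):
    """Share of comparable pairs whose predictions are ordered like the targets.

    Pairs with equal true values are skipped; tied predictions count as 0.5.
    Assumes at least one pair of observations has different true values.
    """
    concordant = 0.0
    comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue  # not comparable
        comparable += 1
        if p1 == p2:
            concordant += 0.5
        elif (t1 < t2) == (p1 < p2):
            concordant += 1.0
    return concordant / comparable


# Hypothetical toy values, not data from the study.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

print("MSE:", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
print("C-index:", concordance_index(y_true, y_pred))
```

Lower MSE and RMSE and higher R² and C-index indicate a better fit, which is the sense in which the description ranks the three models.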
ISSN: 2076-3417
DOI: 10.3390/app142210112
Volume: 14; Issue: 22; Article Number: 10112
Author Affiliations:
Marios C. Gkikas: OWEB Digital Experience, 302 00 Mesolonghi, Greece
Dimitris C. Gkikas: Department of Fisheries and Aquaculture, School of Agricultural Sciences, University of Patras, 302 00 Mesolonghi, Greece
Gerasimos Vonitsanos: Department of Computer Engineering and Informatics, School of Engineering, University of Patras, 265 04 Patras, Greece
John A. Theodorou: Department of Fisheries and Aquaculture, School of Agricultural Sciences, University of Patras, 302 00 Mesolonghi, Greece
Spyros Sioutas: Department of Computer Engineering and Informatics, School of Engineering, University of Patras, 265 04 Patras, Greece
Keywords: machine learning; data mining; algorithms assessment; decision trees; random forest; linear regression