Application of Machine Learning for Predictive Analysis and Management of Mediterranean-Farmed Fish Mortalities: A Risk Management Case Study Using Apache Spark
The current study evaluates the performance of three machine learning models—Decision Trees, Random Forest, and Linear Regression—applied to aquaculture data to mitigate risks in aquaculture management. The performances of these models are analyzed and properly demonstrated using metrics including t...
| Main Authors: | Marios C. Gkikas, Dimitris C. Gkikas, Gerasimos Vonitsanos, John A. Theodorou, Spyros Sioutas |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Applied Sciences |
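| Subjects: | machine learning; data mining; algorithms assessment; decision trees; random forest; linear regression |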
| Online Access: | https://www.mdpi.com/2076-3417/14/22/10112 |

| author | Marios C. Gkikas, Dimitris C. Gkikas, Gerasimos Vonitsanos, John A. Theodorou, Spyros Sioutas |
|---|---|
| collection | DOAJ |
| description | This study evaluates the performance of three machine learning models (Decision Trees, Random Forest, and Linear Regression) applied to aquaculture data, with the aim of mitigating risk in aquaculture management. The models are compared using the Mean Squared Error (MSE), R-squared (R²), Root Mean Squared Error (RMSE), and Concordance Index (C-index). Random Forest achieved the highest prediction accuracy of the three, followed by Linear Regression and then Decision Trees. The scatter plot for Linear Regression shows good predictive accuracy for mid-range values but significant deviations at the extremes, indicating that the model struggles to capture the full range of variability in the data. The bar chart of coefficients identifies the variables with the greatest impact on the predictions, which supports model interpretability and suggests areas for improvement. Future work could improve predictions for extreme values by assessing non-linear models and feature engineering methods, and could extend the analysis to less influential variables. The results are relevant to aquaculture management, policy-making, and operational strategy, and provide insights for stakeholders and decision-makers. Apache Spark was used for data processing and machine learning model implementation, Apache Cassandra for data storage and efficient management of large datasets, SQL tools for structured data handling, and Oracle VM VirtualBox for cross-platform virtualization; the Spark Connector was also used. |
| format | Article |
| id | doaj-art-aa7c265b7d6745d99d227bcb74a0cdd9 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-aa7c265b7d6745d99d227bcb74a0cdd9; 2025-08-20T02:08:07Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-11-01; vol. 14, issue 22, article 10112; DOI 10.3390/app142210112. Authors and affiliations: Marios C. Gkikas (OWEB Digital Experience, 302 00 Mesolonghi, Greece); Dimitris C. Gkikas (Department of Fisheries and Aquaculture, School of Agricultural Sciences, University of Patras, 302 00 Mesolonghi, Greece); Gerasimos Vonitsanos (Department of Computer Engineering and Informatics, School of Engineering, University of Patras, 265 04 Patras, Greece); John A. Theodorou (Department of Fisheries and Aquaculture, School of Agricultural Sciences, University of Patras, 302 00 Mesolonghi, Greece); Spyros Sioutas (Department of Computer Engineering and Informatics, School of Engineering, University of Patras, 265 04 Patras, Greece). Keywords: machine learning; data mining; algorithms assessment; decision trees; random forest; linear regression. https://www.mdpi.com/2076-3417/14/22/10112 |
| title | Application of Machine Learning for Predictive Analysis and Management of Mediterranean-Farmed Fish Mortalities: A Risk Management Case Study Using Apache Spark |
| topic | machine learning; data mining; algorithms assessment; decision trees; random forest; linear regression |
| url | https://www.mdpi.com/2076-3417/14/22/10112 |