Machine Learning-Based Approaches and Comparisons for Estimating Missing Meteorological Data and Determining the Optimum Data Set in Nuclear Energy Applications

Good data analysis is required for the optimal design of nuclear energy projects. However, due to financial or technical reasons, data cannot be collected regularly, which leads to missing data problems. Missing values in data sets can seriously affect research results. There are two main motivation...

Bibliographic Details
Main Author: Fatih Topaloglu
Format: Article
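Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Nuclear energy; missing data; machine learning; linear regression; decision trees; random forest
Online Access: https://ieeexplore.ieee.org/document/10901960/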
author Fatih Topaloglu
collection DOAJ
description Sound data analysis is required for the optimal design of nuclear energy projects. However, for financial or technical reasons, data cannot always be collected regularly, which leads to missing data, and missing values in a data set can seriously affect research results. The study has two motivations. The first is to estimate the missing values in a meteorological data set, and to assess the usability of those estimates in the nuclear energy industry, using the Machine Learning (ML)-based Linear Regression (LR), Decision Trees (DT) and Random Forest (RF) algorithms. The second is to determine the optimum set/number of meteorological data required for nuclear energy projects using the best-performing ML algorithm. For this purpose, 31 years of meteorological data on wind speed, rainfall amount, snowpack and air temperature, which the nuclear policy board in Turkey requires for nuclear energy projects, were analyzed. In this way, difficulties in processing and organizing unnecessarily large data, arising from its volume and speed, are avoided. The study is based on incomplete meteorological measurement data whose missingness mechanism is of the MCAR (missing completely at random) type. The Linear Regression method achieved the highest performance, 91.6%. Scaling the data set with the Standardization and Normalization techniques increased this performance to 93.3% and 98.9%, respectively. In addition, a 14-year training set was observed to be sufficient as a data set in nuclear energy applications.
format Article
id doaj-art-710e180aa31f42abafa10cd8a09b75ea
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-710e180aa31f42abafa10cd8a09b75ea; 2025-08-20T02:02:09Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 37019-37034; DOI 10.1109/ACCESS.2025.3545361; document 10901960; Fatih Topaloglu (https://orcid.org/0000-0002-2089-5214), Department of Computer Engineering, Malatya Turgut Özal University, Malatya, Türkiye
title Machine Learning-Based Approaches and Comparisons for Estimating Missing Meteorological Data and Determining the Optimum Data Set in Nuclear Energy Applications
topic Nuclear energy
missing data
machine learning
linear regression
decision trees
random forest