Machine Learning-Based Approaches and Comparisons for Estimating Missing Meteorological Data and Determining the Optimum Data Set in Nuclear Energy Applications
| Main Author: | Fatih Topaloglu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Nuclear energy; missing data; machine learning; linear regression; decision trees; random forest |
| Online Access: | https://ieeexplore.ieee.org/document/10901960/ |

| Field | Value |
|---|---|
| author | Fatih Topaloglu |
| collection | DOAJ |
| description | Good data analysis is required for the optimal design of nuclear energy projects. However, for financial or technical reasons, data cannot always be collected regularly, which leads to missing-data problems, and missing values in data sets can seriously affect research results. The study has two main motivations. The first is to estimate missing values in a meteorological data set, and to assess the usability of these estimates in the nuclear energy industry, using Machine Learning (ML)-based Linear Regression (LR), Decision Tree (DT) and Random Forest (RF) algorithms. The second is to determine, with the best-performing ML algorithm, the optimum meteorological data set (and its size) required for nuclear energy projects. To this end, 31 years of wind speed, rainfall, snowpack and air temperature data, the meteorological parameters required for nuclear energy projects by Turkey's nuclear policy board, were analyzed. In this way, difficulties in processing and organizing unnecessarily large data, arising from its volume and velocity, were avoided. The missing-data mechanism of the incomplete meteorological measurements is of the Missing Completely At Random (MCAR) type. Linear Regression achieved the highest performance, 91.6%. Scaling the data set with standardization and normalization raised this performance to 93.3% and 98.9%, respectively. Finally, a 14-year training set was found to be sufficient for nuclear energy applications. |
| id | doaj-art-710e180aa31f42abafa10cd8a09b75ea |
| institution | OA Journals |
| issn | 2169-3536 |
| volume | 13 |
| pages | 37019-37034 |
| doi | 10.1109/ACCESS.2025.3545361 |
| author_orcid | https://orcid.org/0000-0002-2089-5214 |
| affiliation | Department of Computer Engineering, Malatya Turgut Özal University, Malatya, Türkiye |
| topic | Nuclear energy; missing data; machine learning; linear regression; decision trees; random forest |
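
The abstract names two generic techniques: linear-regression estimation of MCAR gaps, and feature scaling by standardization and normalization. The following is a minimal Python sketch of those generic techniques, not the author's pipeline. It assumes a single predictor series, ordinary least squares fitted on complete cases, z-score standardization, and min-max normalization to [0, 1]. The helper names (`standardize`, `min_max_normalize`, `fit_ols`, `impute_linear`) and the toy numbers are hypothetical; the record does not state the paper's predictors, target variable, or performance metric.

```python
"""Sketch: fill MCAR gaps in one series by linear regression on another,
and scale a series by standardization or min-max normalization.

Hypothetical helper names and toy data; not the paper's pipeline.
"""
from statistics import mean, pstdev


def standardize(xs):
    """Z-score scaling: (x - mean) / population std. Assumes a non-constant series."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_normalize(xs):
    """Min-max scaling to [0, 1]: (x - min) / (max - min). Assumes max > min."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def fit_ols(xs, ys):
    """Least-squares line y = a*x + b for a single predictor; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def impute_linear(predictor, target):
    """Replace None entries of `target` with OLS predictions from `predictor`.

    The line is fitted on complete cases only; under MCAR these are a
    random subsample, so the fit is not biased by the missingness.
    """
    observed = [(x, y) for x, y in zip(predictor, target) if y is not None]
    a, b = fit_ols([x for x, _ in observed], [y for _, y in observed])
    return [a * x + b if y is None else y for x, y in zip(predictor, target)]


if __name__ == "__main__":
    # Hypothetical toy series: a complete predictor and a target with two gaps.
    predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
    target = [50.0, 40.0, None, 20.0, None]
    # The fitted line is approximately y = -10x + 60, so the gaps become ~30 and ~10.
    print(impute_linear(predictor, target))
    print(min_max_normalize(predictor))   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Fitting only on complete cases keeps the sketch short and is defensible under the MCAR assumption stated in the abstract; a multi-predictor model (or the DT and RF alternatives the study compares) would replace `fit_ols` with a different regressor.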