Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction
Timely and accurate prediction of solar flares is a crucial task due to the danger they pose to human life and infrastructure beyond Earth’s atmosphere. Although various machine learning algorithms have been employed to improve solar flare prediction, there has been limited focus on improving perfor...
| Main Authors: | Junzhi Wen, Azim Ahmadzadeh, Manolis K. Georgoulis, Viacheslav M. Sadykov, Rafal A. Angryk |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IOP Publishing, 2025-01-01 |
| Series: | The Astrophysical Journal Supplement Series |
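| Subjects: | Solar flares; Space weather; Time series analysis; Outlier detection; Support vector machine; Classification |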
| Online Access: | https://doi.org/10.3847/1538-4365/adb9e3 |
| _version_ | 1849766646949871616 |
|---|---|
| author | Junzhi Wen Azim Ahmadzadeh Manolis K. Georgoulis Viacheslav M. Sadykov Rafal A. Angryk |
| author_sort | Junzhi Wen |
| collection | DOAJ |
| description | Timely and accurate prediction of solar flares is a crucial task due to the danger they pose to human life and infrastructure beyond Earth’s atmosphere. Although various machine learning algorithms have been employed to improve solar flare prediction, there has been limited focus on improving performance using outlier detection. In this study, we propose the use of a tree-based outlier detection algorithm, Isolation Forest (iForest), to identify multivariate time-series instances within the flare-forecasting benchmark data set, Space Weather Analytics for Solar Flares (SWAN-SF). By removing anomalous samples from the nonflaring class (N-class) data, we observe a significant improvement in both the true skill score and the updated Heidke skill score in two separate experiments. We focus on analyzing outliers detected by iForest at a 2.4% contamination rate, considered the most effective overall. Our analysis reveals a co-occurrence between the outliers we discovered and strong flares. Additionally, we investigated the similarity between the outliers and the strong-flare data and quantified it using Kullback–Leibler divergence. This analysis demonstrates a higher similarity between our outliers and strong-flare data when compared to the similarity between the outliers and the rest of the N-class data, supporting our rationale for using outlier detection to enhance SWAN-SF data for flare prediction. Furthermore, we explore a novel approach by treating our outliers as if they belong to flaring-class data in the training phase of our machine learning, resulting in further enhancements to our models’ performance. |
| format | Article |
| id | doaj-art-1b206b24833a41e2b3b5e7fdccabeb63 |
| institution | DOAJ |
| issn | 0067-0049 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IOP Publishing |
| record_format | Article |
| series | The Astrophysical Journal Supplement Series |
| spelling | doaj-art-1b206b24833a41e2b3b5e7fdccabeb63; 2025-08-20T03:04:30Z; eng; IOP Publishing; The Astrophysical Journal Supplement Series; 0067-0049; 2025-01-01; 277; 2; 60; 10.3847/1538-4365/adb9e3; Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction; Junzhi Wen (0), https://orcid.org/0000-0002-9176-5273; Azim Ahmadzadeh (1), https://orcid.org/0000-0002-1631-5336; Manolis K. Georgoulis (2), https://orcid.org/0000-0001-6913-1330; Viacheslav M. Sadykov (3), https://orcid.org/0000-0002-4001-1295; Rafal A. Angryk (4), https://orcid.org/0000-0001-9598-8207; (0) Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA, jwen6@student.gsu.edu; (1) Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63103, USA; (2) Research Center for Astronomy and Applied Mathematics of the Academy of Athens, 11527 Athens, Greece, and Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20375, USA; (3) Physics & Astronomy Department, Georgia State University, Atlanta, GA 30302, USA; (4) Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA, jwen6@student.gsu.edu; https://doi.org/10.3847/1538-4365/adb9e3; Solar flares; Space weather; Time series analysis; Outlier detection; Support vector machine; Classification |
| title | Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction |
| title_sort | outlier detection and removal in multivariate time series for a more robust machine learning based solar flare prediction |
| topic | Solar flares Space weather Time series analysis Outlier detection Support vector machine Classification |
| url | https://doi.org/10.3847/1538-4365/adb9e3 |
| work_keys_str_mv | AT junzhiwen outlierdetectionandremovalinmultivariatetimeseriesforamorerobustmachinelearningbasedsolarflareprediction AT azimahmadzadeh outlierdetectionandremovalinmultivariatetimeseriesforamorerobustmachinelearningbasedsolarflareprediction AT manoliskgeorgoulis outlierdetectionandremovalinmultivariatetimeseriesforamorerobustmachinelearningbasedsolarflareprediction AT viacheslavmsadykov outlierdetectionandremovalinmultivariatetimeseriesforamorerobustmachinelearningbasedsolarflareprediction AT rafalaangryk outlierdetectionandremovalinmultivariatetimeseriesforamorerobustmachinelearningbasedsolarflareprediction |
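
The description above reports results in terms of the true skill score, the updated Heidke skill score, and Kullback–Leibler divergence. As a rough illustration only, the sketch below computes these quantities using their standard textbook definitions; the record does not give the authors' exact formulas or binning, so the function names, the `eps` guard, and the sample counts are all assumptions made for this example.

```python
import math


def true_skill_score(tp, fn, fp, tn):
    """TSS = recall minus false-alarm rate; ranges from -1 to 1."""
    return tp / (tp + fn) - fp / (fp + tn)


def heidke_skill_score_2(tp, fn, fp, tn):
    """HSS2 in its common flare-forecasting form (assumed here, not taken
    from the record): improvement of the forecast over random chance."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in nats for two discrete distributions on the same bins.

    Bins where p is zero contribute nothing; q is floored at eps
    (a hypothetical guard) so an empty bin in q does not divide by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not results from the paper.
    tp, fn, fp, tn = 30, 10, 20, 140
    print("TSS :", true_skill_score(tp, fn, fp, tn))
    print("HSS2:", heidke_skill_score_2(tp, fn, fp, tn))

    # Hypothetical two-bin histograms, normalized to sum to 1.
    print("KL  :", kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

A lower divergence between two feature histograms indicates more similar distributions, which is the sense in which the description compares the outliers with strong-flare data and with the rest of the N-class data.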