Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction

Bibliographic Details
Main Authors: Junzhi Wen, Azim Ahmadzadeh, Manolis K. Georgoulis, Viacheslav M. Sadykov, Rafal A. Angryk
Format: Article
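Language: English
Published: IOP Publishing, 2025-01-01
Series: The Astrophysical Journal Supplement Series
Subjects: Solar flares; Space weather; Time series analysis; Outlier detection; Support vector machine; Classification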
Online Access:https://doi.org/10.3847/1538-4365/adb9e3

Abstract
Timely and accurate prediction of solar flares is a crucial task due to the danger they pose to human life and infrastructure beyond Earth’s atmosphere. Although various machine learning algorithms have been employed to improve solar flare prediction, there has been limited focus on improving performance using outlier detection. In this study, we propose the use of a tree-based outlier detection algorithm, Isolation Forest (iForest), to identify anomalous multivariate time-series instances within the flare-forecasting benchmark data set Space Weather Analytics for Solar Flares (SWAN-SF). By removing anomalous samples from the nonflaring-class (N-class) data, we observe a significant improvement in both the true skill score and the updated Heidke skill score in two separate experiments. We focus on analyzing the outliers detected by iForest at a 2.4% contamination rate, which was considered the most effective overall. Our analysis reveals a co-occurrence between the outliers we discovered and strong flares. Additionally, we investigate the similarity between the outliers and the strong-flare data and quantify it using the Kullback–Leibler divergence. This analysis shows that our outliers are more similar to the strong-flare data than to the rest of the N-class data, supporting our rationale for using outlier detection to enhance SWAN-SF data for flare prediction. Furthermore, we explore a novel approach of treating our outliers as flaring-class data during the training phase of our machine learning models, which further improves our models’ performance.
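The abstract above names the core steps: score nonflaring (N-class) samples with Isolation Forest, discard those flagged at a 2.4% contamination rate, and evaluate with the true skill score. The pure-Python sketch below illustrates those steps under stated assumptions and is not the authors' implementation. Each SWAN-SF instance is assumed to have already been reduced to a fixed-length feature vector (the abstract does not say how). The trees follow the original Isolation Forest recipe (random feature, uniform random threshold, subsample size 256, depth limit ceil(log2 ψ)). `remove_outliers` simply drops the round(contamination × n) highest-scoring points, which approximates, but is not identical to, how libraries such as scikit-learn's `IsolationForest` derive a threshold from `contamination`. All function names are hypothetical, and the updated Heidke skill score is omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch: each instance is assumed to be a fixed-length feature
# vector already extracted from one multivariate time series.
Point = Sequence[float]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points;
    Isolation Forest divides mean path lengths by this value."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class Node:
    def __init__(self, size: int, feature: int = -1, threshold: float = 0.0,
                 left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.size, self.feature, self.threshold = size, feature, threshold
        self.left, self.right = left, right


def build_tree(points: List[Point], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    """Split on a random feature at a uniform random threshold until each
    point is isolated or the depth limit is reached."""
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # all values equal on this feature: treat as a leaf
        return Node(len(points))
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return Node(len(points), feature, threshold,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node: Node) -> float:
    """Edges from root to x's leaf, plus c(size) for points left unseparated."""
    depth = 0
    while node.left is not None and node.right is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
        depth += 1
    return depth + c_factor(node.size)


def isolation_scores(points: List[Point], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Score s(x) = 2 ** (-mean path length / c(psi)); near 1 = anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in points:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


def remove_outliers(points: List[Point], contamination: float = 0.024,
                    **kwargs) -> Tuple[List[Point], List[int]]:
    """Drop the round(contamination * n) highest-scoring points; return the
    kept points and the indices of the flagged ones."""
    scores = isolation_scores(points, **kwargs)
    n_out = int(round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_out])
    kept = [p for i, p in enumerate(points) if i not in flagged]
    return kept, sorted(flagged)


def true_skill_score(tp: int, fn: int, fp: int, tn: int) -> float:
    """TSS = TP / (TP + FN) - FP / (FP + TN): hit rate minus false-alarm rate."""
    return tp / (tp + fn) - fp / (fp + tn)
```

In the abstract's setting, `points` would hold only the N-class training instances, and flaring-class instances would be left untouched. For the relabeling experiment described at the end of the abstract, the instances at the indices in `flagged` would instead be moved into the flaring class for training rather than discarded.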

ISSN: 0067-0049

Author Affiliations
Junzhi Wen (ORCID: https://orcid.org/0000-0002-9176-5273): Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; e-mail: jwen6@student.gsu.edu
Azim Ahmadzadeh (ORCID: https://orcid.org/0000-0002-1631-5336): Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63103, USA
Manolis K. Georgoulis (ORCID: https://orcid.org/0000-0001-6913-1330): Research Center for Astronomy and Applied Mathematics of the Academy of Athens, 11527 Athens, Greece; Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20375, USA
Viacheslav M. Sadykov (ORCID: https://orcid.org/0000-0002-4001-1295): Physics & Astronomy Department, Georgia State University, Atlanta, GA 30302, USA
Rafal A. Angryk (ORCID: https://orcid.org/0000-0001-9598-8207): Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA