Efficient Searches for Small Signals across Two-channel Noisy Data: A Challenge in Big-data Observations

Searching for novel small signals in noisy data is preferably pursued by correlation in two or more independently operating channels. Signals of potential interest exist in tails beyond κσ, where κ denotes a multiple of the standard deviation σ of the data. Since moving data is a major cost factor...

Bibliographic Details
Main Authors: Maryam Aghaei Abchouyeh, Maurice H. P. M. van Putten, Seyong Kim
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: The Astrophysical Journal Supplement Series
Subjects: Astronomy data analysis
Online Access: https://doi.org/10.3847/1538-4365/adec9d
collection DOAJ
description Searching for novel small signals in noisy data is preferably pursued by correlation in two or more independently operating channels. Signals of potential interest exist in tails beyond κσ, where κ denotes a multiple of the standard deviation σ of the data. Since moving data is a major cost factor in big-data analysis and heterogeneous computing more generally, efficiency may be optimized by restricting the computation of correlations to the tails of two-channel data exceeding κσ. Already, a moderate value κ ≳ 2 realizes a data reduction by at least an order of magnitude. Here, we study this approach using a novel excess probability ratio (EPR), correlating Boolean data resulting from tails beyond a cutoff κσ. We compare and rank EPR performance against conventional direct cross correlation and the Pearson coefficient (PC), applicable to the original data with no cutoff. This benchmark is performed over different combinations of background noise (Gaussian, Poisson and uniform) and signals (Gaussian, Poisson, uniform, chirps and sine waves). Results show the performance of EPR to be comparable to that of the PC, providing a new approach for significant improvements in efficiency with essentially no loss of sensitivity, relevant to the present era of big-data observatories.
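The thresholding idea in the description can be illustrated with a minimal sketch. The record does not give the formula for the EPR, so the definition used here (joint tail probability divided by the product of the single-channel tail probabilities) is an assumption made for illustration only, as are the two-sided cut about the sample mean, the function names, and the synthetic Gaussian test data. The first function checks the stated data reduction for Gaussian noise: the two-sided fraction beyond 2σ is about 4.6%, i.e. a reduction by more than a factor of ten.

```python
import math
import random


def gaussian_tail_fraction(kappa):
    """Two-sided fraction of Gaussian samples with |x - mean| > kappa * sigma."""
    return math.erfc(kappa / math.sqrt(2.0))


def tail_mask(x, kappa):
    """Boolean mask: True where a sample lies beyond kappa * sigma of the sample mean.

    ASSUMPTION: a two-sided cut about the sample mean; the record does not
    say whether the tails are one-sided or two-sided.
    """
    n = len(x)
    mean = sum(x) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [abs(v - mean) > kappa * sigma for v in x]


def excess_probability_ratio(x, y, kappa):
    """ASSUMED illustrative definition, not taken from the paper:
    P(both channels in tail) / (P(x in tail) * P(y in tail)).
    Values near 1 are expected for independent channels.
    """
    bx, by = tail_mask(x, kappa), tail_mask(y, kappa)
    n = len(x)
    px = sum(bx) / n
    py = sum(by) / n
    pxy = sum(1 for a, b in zip(bx, by) if a and b) / n
    if px == 0.0 or py == 0.0:
        return float("nan")  # no tail samples in at least one channel
    return pxy / (px * py)


def pearson(x, y):
    """Pearson correlation coefficient on the full (uncut) data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def make_channels(n, signal_sigma, seed):
    """Hypothetical test data: a common Gaussian signal plus independent
    unit-variance Gaussian noise in each of two channels."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        s = rng.gauss(0.0, signal_sigma) if signal_sigma > 0.0 else 0.0
        x.append(s + rng.gauss(0.0, 1.0))
        y.append(s + rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    kappa = 2.0
    print("Gaussian tail fraction beyond 2 sigma:", gaussian_tail_fraction(kappa))
    for label, amp in (("noise only", 0.0), ("signal + noise", 1.0)):
        x, y = make_channels(50_000, amp, seed=1)
        print(label, "EPR:", excess_probability_ratio(x, y, kappa),
              "PC:", pearson(x, y))
```

Under these assumptions only the Boolean masks need to be moved or compared once the cut is applied, which is where the efficiency gain described above would come from; the Pearson coefficient, by contrast, uses every sample.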
issn 0067-0049
spelling doaj-art-5cea910bb7b34b20bf43a9d465c73594; 2025-08-20T03:02:52Z; eng; IOP Publishing; The Astrophysical Journal Supplement Series; ISSN 0067-0049; 2025-01-01; volume 280, issue 1, 17; doi:10.3847/1538-4365/adec9d
Maryam Aghaei Abchouyeh (https://orcid.org/0000-0002-1518-1946): Department of Physics and Astronomy, Sejong University, 98 Gunja-Dong, Gwangjin-gu, Seoul 143-747, Republic of Korea; mvp@sejong.ac.kr
Maurice H. P. M. van Putten (https://orcid.org/0000-0002-9212-411X): Department of Physics and Astronomy, Sejong University, 98 Gunja-Dong, Gwangjin-gu, Seoul 143-747, Republic of Korea; mvp@sejong.ac.kr; INAF-OAS Bologna, via P. Gobetti, 101, I-40129 Bologna, Italy
Seyong Kim (https://orcid.org/0000-0002-2102-7398): Department of Physics and Astronomy, Sejong University, 98 Gunja-Dong, Gwangjin-gu, Seoul 143-747, Republic of Korea; mvp@sejong.ac.kr
topic Astronomy data analysis