Anomaly-aware summary statistic from data batches


Bibliographic Details
Main Author: G. Grosso
Format: Article
Language: English
Published: SpringerOpen 2024-12-01
Series: Journal of High Energy Physics
Online Access: https://doi.org/10.1007/JHEP12(2024)093
Description: Signal-agnostic data exploration based on machine learning could unveil very subtle statistical deviations of collider data from the expected Standard Model of particle physics. The beneficial impact of a large training sample on machine learning solutions motivates the exploration of increasingly large and inclusive samples of acquired data with resource-efficient computational methods. In this work we consider the New Physics Learning Machine (NPLM), a multivariate goodness-of-fit test built on the Neyman-Pearson maximum-likelihood-ratio construction, and we address the problem of testing large samples under computational and storage resource constraints. We propose to perform parallel NPLM routines over batches of the data, and to combine them by locally aggregating over the data-to-reference density ratios learnt by each batch. The resulting data hypothesis defining the likelihood-ratio test is thus shared over the batches, and complies with the assumption that the expected rate of new physical processes is time-invariant. We show that this method outperforms the simple sum of the independent tests run over the batches, and can recover, or even surpass, the sensitivity of the single test run over the full data. Besides the significant advantage for the offline application of NPLM to large samples, the proposed approach offers new prospects toward the use of NPLM to construct anomaly-aware summary statistics in quasi-online data streaming scenarios.
ISSN: 1029-8479
Author affiliation: The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
Volume 2024, Issue 12
Subjects: Minimum Bias; Hadron-Hadron Scattering