ML-based top taggers: Performance, uncertainty and impact of tower & tracker data integration

Machine learning algorithms have the capacity to discern intricate features directly from raw data. We demonstrated the performance of top taggers built upon three machine learning architectures: a BDT that uses jet-level variables (high-level features, HLF) as input, a CNN (a miniature version of R...

Bibliographic Details
Main Authors: Rameswar Sahu, Kirtiman Ghosh
Format: Article
Language: English
Published: SciPost 2024-12-01
Series: SciPost Physics
Online Access: https://scipost.org/SciPostPhys.17.6.166
collection DOAJ
description Machine learning algorithms can discern intricate features directly from raw data. We demonstrate the performance of top taggers built on three machine learning architectures: a BDT that takes jet-level variables (high-level features, HLF) as input, a CNN (a miniature version of ResNet) trained on the jet image, and a GNN (LorentzNet) trained on the particle-cloud representation of a jet, using the 4-momenta (low-level features, LLF) of the jet constituents as input. We find a significant performance enhancement for all three classes of classifiers when they are trained on combined data from calorimeter towers and tracker detectors. The high resolution of the tracking data improves classifier performance in the high transverse momentum region. In addition, the information about the distribution and composition of charged and neutral constituents of the fat jets and subjets helps identify the quark/gluon origin of the subjets and hence enhances top tagging efficiency. The LLF-based classifiers, such as the CNN and GNN, perform significantly better than HLF-based classifiers like the BDT, especially in the high transverse momentum region. Nevertheless, the LLF-based classifiers trained on constituents' 4-momentum data depend substantially on the jet modeling within Monte Carlo generators. The composite classifiers, formed by stacking a BDT on top of a GNN/CNN, not only enhance the performance of the LLF-based classifiers but also mitigate the uncertainties stemming from the showering and hadronization model of the event generator. We also present a comprehensive study of the influence of the fat jet's reconstruction and labeling procedure on the efficiency of the classifiers.
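
The composite classifier mentioned in the description can be illustrated with a minimal sketch. The record does not say which inputs the stacked BDT receives, so the following assumes that the BDT is trained on the jet-level variables together with the output score of the LLF-based network. All data are synthetic toy numbers, `llf_score` is a hypothetical stand-in for a CNN/GNN output, and scikit-learn's `GradientBoostingClassifier` is used as a generic BDT, not as the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_jets = 4000

# Synthetic labels: 1 = top jet, 0 = QCD jet (toy data, not from the paper).
y = rng.integers(0, 2, size=n_jets)

# Hypothetical jet-level variables (HLF): three noisy features with a class shift.
X_hlf = rng.normal(size=(n_jets, 3)) + 0.5 * y[:, None]

# Hypothetical stand-in for the output score of an LLF-based network (CNN or GNN).
llf_score = 1.0 / (1.0 + np.exp(-(1.5 * (2 * y - 1) + rng.normal(size=n_jets))))

# Assumed stacking step: append the network score to the HLF vector.
X_stacked = np.column_stack([X_hlf, llf_score])

# One shared split so both BDTs see the same training and test jets.
idx_train, idx_test = train_test_split(
    np.arange(n_jets), test_size=0.3, random_state=0
)


def fit_bdt_auc(X):
    """Train a BDT on the training jets and return its AUC on the test jets."""
    bdt = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
    bdt.fit(X[idx_train], y[idx_train])
    return roc_auc_score(y[idx_test], bdt.predict_proba(X[idx_test])[:, 1])


auc_hlf = fit_bdt_auc(X_hlf)
auc_stacked = fit_bdt_auc(X_stacked)
print(f"AUC, BDT on HLF only:        {auc_hlf:.3f}")
print(f"AUC, BDT on HLF + LLF score: {auc_stacked:.3f}")
```

In a real analysis the network score fed to the BDT would normally come from jets the network was not trained on (for example, out-of-fold predictions), so that the BDT does not learn from an over-optimistic score.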
id doaj-art-bb926286f4e840ca980c39e2f08aff5d
institution DOAJ
issn 2542-4653
spelling doaj-art-bb926286f4e840ca980c39e2f08aff5d | 2025-08-20T02:50:13Z | eng | SciPost | SciPost Physics | ISSN 2542-4653 | 2024-12-01 | vol. 17, no. 6, art. 166 | doi:10.21468/SciPostPhys.17.6.166