End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment

Bibliographic Details
Main Authors: Cong Wang, Valerio Mariani, Frédéric Poitevin, Matthew Avaylon, Jana Thayer
Format: Article
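Language: English
Published: Frontiers Media S.A. 2025-03-01
Series: Frontiers in High Performance Computing
Subjects: deep learning in crystallography; real-time Bragg peak finding; model distillation; producer-consumer architecture; X-ray free electron lasers
Online Access: https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471/full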
author Cong Wang
Valerio Mariani
Frédéric Poitevin
Matthew Avaylon
Jana Thayer
collection DOAJ
description X-ray crystallography reconstruction, which transforms discrete X-ray diffraction patterns into three-dimensional molecular structures, relies critically on accurate Bragg peak finding for structure determination. As X-ray free electron laser (XFEL) facilities advance toward MHz data rates (1 million images per second), traditional peak finding algorithms that require manual parameter tuning or exhaustive grid searches across multiple experiments become increasingly impractical. While deep learning approaches offer promising solutions, their deployment in high-throughput environments presents significant challenges in automated dataset labeling, model scalability, edge deployment efficiency, and distributed inference capabilities. We present an end-to-end deep learning pipeline with three key components: (1) a data engine that combines traditional algorithms with our peak matching algorithm to generate high-quality training data at scale, (2) a modular architecture that scales from a few million to hundreds of millions of parameters, enabling us to train large expert-level models offline while deploying smaller, distilled models at the edge, and (3) a decoupled producer-consumer architecture that separates a specialized data source layer from model inference, enabling flexible deployment across diverse computing environments. Using this integrated approach, our pipeline achieves accuracy comparable to traditional methods tuned by human experts while eliminating the need for experiment-specific parameter tuning. Although current throughput requires optimization for MHz facilities, our system's scalable architecture and demonstrated model compression capabilities provide a foundation for future high-throughput XFEL deployments.
format Article
id doaj-art-1e5e7f3376e64efc9118a5c2f1459bf4
institution DOAJ
issn 2813-7337
language English
publishDate 2025-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in High Performance Computing
title End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment
topic deep learning in crystallography
real-time Bragg peak finding
model distillation
producer-consumer architecture
X-ray free electron lasers
url https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471/full
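
Illustrative note: the description above mentions a decoupled producer-consumer architecture that separates the data source layer from model inference. The record gives no implementation details, so the following is only a minimal sketch of that general pattern, assuming a single process with two threads and a bounded queue. Every name here (`producer`, `consumer`, `segment_peaks`, `run_pipeline`) is hypothetical, the frames are synthetic, and a plain intensity threshold stands in for the neural network.

```python
import queue
import random
import threading

SENTINEL = None  # marks the end of the frame stream


def producer(frames, n_frames, n_pixels=256):
    """Hypothetical data source layer: emits synthetic flattened frames."""
    rng = random.Random(0)
    for idx in range(n_frames):
        # Background noise in [0, 1) plus one bright synthetic "peak" pixel.
        frame = [rng.random() for _ in range(n_pixels)]
        frame[idx % n_pixels] += 100.0
        # put() blocks while the queue is full, which gives backpressure.
        frames.put((idx, frame))
    frames.put(SENTINEL)


def segment_peaks(frame, threshold=50.0):
    """Placeholder for model inference: a threshold, not a trained model."""
    return [value > threshold for value in frame]


def consumer(frames, results):
    """Inference side: knows nothing about where the frames come from."""
    while True:
        item = frames.get()
        if item is SENTINEL:
            break
        idx, frame = item
        results[idx] = sum(segment_peaks(frame))  # count of peak pixels


def run_pipeline(n_frames=8):
    frames = queue.Queue(maxsize=4)  # bounded buffer between the two sides
    results = {}
    threads = [
        threading.Thread(target=producer, args=(frames, n_frames)),
        threading.Thread(target=consumer, args=(frames, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

Because the two sides share only the queue, either one could in principle be swapped (a different detector reader, a different model) without touching the other. A real deployment would likely replace the in-process queue with a network or shared-memory transport, which this sketch does not attempt.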