End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment
X-ray crystallography reconstruction, which transforms discrete X-ray diffraction patterns into three-dimensional molecular structures, relies critically on accurate Bragg peak finding for structure determination. As X-ray free electron laser (XFEL) facilities advance toward MHz data rates (1 millio...
| Main Authors: | Cong Wang, Valerio Mariani, Frédéric Poitevin, Matthew Avaylon, Jana Thayer |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-03-01 |
| Series: | Frontiers in High Performance Computing |
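| Subjects: | deep learning in crystallography; real-time Bragg peak finding; model distillation; producer-consumer architecture; X-ray free electron lasers |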
| Online Access: | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471/full |
| _version_ | 1850053809983717376 |
|---|---|
| author | Cong Wang; Valerio Mariani; Frédéric Poitevin; Matthew Avaylon; Jana Thayer |
| author_sort | Cong Wang |
| collection | DOAJ |
| description | X-ray crystallography reconstruction, which transforms discrete X-ray diffraction patterns into three-dimensional molecular structures, relies critically on accurate Bragg peak finding for structure determination. As X-ray free electron laser (XFEL) facilities advance toward MHz data rates (1 million images per second), traditional peak finding algorithms that require manual parameter tuning or exhaustive grid searches across multiple experiments become increasingly impractical. While deep learning approaches offer promising solutions, their deployment in high-throughput environments presents significant challenges in automated dataset labeling, model scalability, edge deployment efficiency, and distributed inference capabilities. We present an end-to-end deep learning pipeline with three key components: (1) a data engine that combines traditional algorithms with our peak matching algorithm to generate high-quality training data at scale, (2) a modular architecture that scales from a few million to hundreds of millions of parameters, enabling us to train large expert-level models offline while deploying smaller, distilled models at the edge, and (3) a decoupled producer-consumer architecture that separates a specialized data source layer from model inference, enabling flexible deployment across diverse computing environments. Using this integrated approach, our pipeline achieves accuracy comparable to traditional methods tuned by human experts while eliminating the need for experiment-specific parameter tuning. Although current throughput requires optimization for MHz facilities, our system's scalable architecture and demonstrated model compression capabilities provide a foundation for future high-throughput XFEL deployments. |
| format | Article |
| id | doaj-art-1e5e7f3376e64efc9118a5c2f1459bf4 |
| institution | DOAJ |
| issn | 2813-7337 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in High Performance Computing |
| spelling | doaj-art-1e5e7f3376e64efc9118a5c2f1459bf4; 2025-08-20T02:52:26Z; eng; Frontiers Media S.A.; Frontiers in High Performance Computing; ISSN 2813-7337; 2025-03-01; vol. 3; DOI 10.3389/fhpcp.2025.1536471; article 1536471 |
| title | End-to-end deep learning pipeline for real-time Bragg peak segmentation: from training to large-scale deployment |
| topic | deep learning in crystallography; real-time Bragg peak finding; model distillation; producer-consumer architecture; X-ray free electron lasers |
| url | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1536471/full |