A SWIN-based vision transformer for high-fidelity and high-speed imaging experiments at light sources

Introduction: High-speed x-ray imaging experiments at synchrotron radiation facilities enable the acquisition of spatiotemporal measurements reaching millions of frames per second. These high acquisition rates are prone to noisy measurements, while slower (but less noisy) rates risk missing scientifically significant phenomena.

Methods: We develop a Shifted Window (SWIN)-based vision transformer to reconstruct high-resolution x-ray image sequences with high fidelity and at a high frame rate, and evaluate the underlying algorithmic framework on a high-performance computing (HPC) system. We characterize model parameters that could affect the training scalability, reconstruction quality, and inference-stage running time, such as the batch size, the number of input frames and their composition in terms of low- and high-resolution frames, and the model size and architecture.

Results: With 3 consecutive low-resolution (LR) frames and 2 high-resolution (HR) frames, the two streams differing in spatial and temporal resolution by factors of 4 and 20, respectively, the proposed algorithm achieved average peak signal-to-noise ratios of 37.40 dB and 35.60 dB.

Discussion: The model was trained on the Argonne Leadership Computing Facility's Polaris HPC system using 40 Nvidia A100 GPUs, speeding up end-to-end training by roughly 10x compared to training with beamline-local computing resources.
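The Results above report peak signal-to-noise ratio (PSNR), conventionally defined as PSNR = 10 * log10(MAX^2 / MSE), where MAX is the peak pixel value and MSE is the mean squared error between the reference and the reconstruction. The sketch below is illustrative only, not the authors' code: it mimics the 3-LR-plus-2-HR input composition with the 4x spatial factor quoted in the abstract and computes PSNR; the frame sizes and noise level are assumptions.

import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 1.0) -> float:
    """Standard peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - reconstruction) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))  # diverges to +inf for a perfect match

# Hypothetical input composition: 3 LR frames at 128x128 and 2 HR frames at
# 512x512, i.e., a 4x spatial factor as in the paper's evaluation setting.
lr_frames = np.random.rand(3, 128, 128).astype(np.float32)
hr_frames = np.random.rand(2, 512, 512).astype(np.float32)

# A reconstruction corrupted with mild Gaussian noise scores a high but finite PSNR.
noisy = np.clip(hr_frames[0] + np.random.normal(0.0, 0.01, hr_frames[0].shape), 0.0, 1.0)
print(f"PSNR: {psnr(hr_frames[0], noisy):.2f} dB")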

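The Discussion reports distributed training on 40 Nvidia A100 GPUs on ALCF Polaris. Below is a minimal sketch of how such data-parallel training is commonly set up with PyTorch DistributedDataParallel; the stand-in model, tensor shapes, and training loop are assumptions for illustration, not the paper's implementation.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    # A launcher such as torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    # Placeholder network standing in for the SWIN-based vision transformer.
    model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.MSELoss()

    # Placeholder loop; real training would iterate a DistributedSampler-backed DataLoader.
    for _ in range(10):
        lr_in = torch.randn(4, 1, 128, 128, device=device)      # batch of LR inputs
        hr_target = torch.randn(4, 1, 128, 128, device=device)  # placeholder targets, same shape so the stand-in conv is runnable
        optimizer.zero_grad()
        loss = loss_fn(model(lr_in), hr_target)
        loss.backward()  # DDP all-reduces gradients across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Each node would run this script under a launcher, e.g. torchrun --nproc_per_node=4 train.py, so that gradients are averaged across all participating GPUs at every step.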
Bibliographic Details
Main Authors: Songyuan Tang, Tekin Bicer, Kamel Fezzaa, Samuel Clark
Affiliations: Advanced Photon Source, Argonne National Laboratory, Lemont, IL, United States (Tang, Fezzaa, Clark); Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States (Bicer)
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-05-01
Series: Frontiers in High Performance Computing
ISSN: 2813-7337
DOI: 10.3389/fhpcp.2025.1537080
Subjects: high-speed imaging; spatio-temporal fusion; vision transformer; distributed training; full-field x-ray radiography
Online Access: https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1537080/full