Pre-training on high-resolution X-ray images: an experimental study

Abstract: Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) with limited resolution (e.g., 224 × 224). However, the key to the success of self-supervised pre-training of large models lies in mas...

Bibliographic Details
Main Authors: Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Visual Intelligence
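Subjects: High-resolution X-ray image; Pre-trained big models; Masked auto-encoder (MAE); Medical report generation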
Online Access: https://doi.org/10.1007/s44267-025-00080-3
_version_ 1850242860604981248
author Xiao Wang
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bo Jiang
Chuanfu Li
Jin Tang
author_sort Xiao Wang
collection DOAJ
description Abstract: Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) with limited resolution (e.g., 224 × 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and preserving the high resolution of X-ray images contributes to effective solutions for some challenging diseases. In this paper, we propose a high-resolution (1280 × 1280) X-ray image based pre-trained baseline model trained on our newly collected large-scale dataset containing more than 1 million X-ray images. Our model employs the masked auto-encoder framework, wherein the tokens that remain after masking at a high rate are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, a novel context-aware masking strategy is introduced, which uses the breast contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, namely X-ray report generation and disease detection. Extensive experiments demonstrate that our pre-trained medical baseline model achieves results comparable to, or even exceeding, those of current state-of-the-art models on downstream benchmark datasets.
format Article
id doaj-art-ff43f52ec6c54fa7afc62b9f3d24735d
institution OA Journals
issn 2097-3330
2731-9008
language English
publishDate 2025-05-01
publisher Springer
record_format Article
series Visual Intelligence
spelling doaj-art-ff43f52ec6c54fa7afc62b9f3d24735d
2025-08-20T02:00:10Z
eng
Springer
Visual Intelligence
2097-3330
2731-9008
2025-05-01
31115
10.1007/s44267-025-00080-3
Pre-training on high-resolution X-ray images: an experimental study
Xiao Wang (School of Computer Science and Technology, Anhui University)
Yuehang Li (School of Computer Science and Technology, Anhui University)
Wentao Wu (School of Artificial Intelligence, Anhui University)
Jiandong Jin (School of Artificial Intelligence, Anhui University)
Yao Rong (School of Computer Science and Technology, Anhui University)
Bo Jiang (School of Computer Science and Technology, Anhui University)
Chuanfu Li (First Affiliated Hospital of Anhui University of Chinese Medicine)
Jin Tang (School of Computer Science and Technology, Anhui University)
https://doi.org/10.1007/s44267-025-00080-3
High-resolution X-ray image
Pre-trained big models
Masked auto-encoder (MAE)
Medical report generation
title Pre-training on high-resolution X-ray images: an experimental study
topic High-resolution X-ray image
Pre-trained big models
Masked auto-encoder (MAE)
Medical report generation
url https://doi.org/10.1007/s44267-025-00080-3
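
The abstract describes a masked auto-encoder whose encoder sees only the tokens left after masking at a high rate. The following is a minimal sketch of that input step, not the authors' implementation: the patch size (16), the masking ratio (0.75), the uniform random sampling, and the function names `patchify` and `random_masking` are all assumptions chosen for illustration. The record does not state these values, and the context-aware, contour-bounded masking strategy is not reproduced here.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W) image into flattened, non-overlapping square patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p) -> (gh, gw, p, p) -> (gh * gw, p * p)
    patches = image.reshape(gh, patch_size, gw, patch_size).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch_size * patch_size)


def random_masking(patches, mask_ratio, rng):
    """Keep a uniformly random subset of patches (assumed sampling scheme).

    Returns the visible patches, their indices, and a boolean mask in which
    True marks a patch that is hidden and must be reconstructed.
    """
    n = patches.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask


rng = np.random.default_rng(0)
# Stand-in for a 1280 x 1280 X-ray image (random values, not real data).
image = rng.random((1280, 1280), dtype=np.float32)

patches = patchify(image, patch_size=16)              # assumed patch size
visible, keep_idx, mask = random_masking(patches, mask_ratio=0.75, rng=rng)

print(patches.shape, visible.shape, int(mask.sum()))
# (6400, 256) (1600, 256) 4800
```

Under these assumed settings a 1280 × 1280 image yields 6,400 patches, compared with 196 for a 224 × 224 image at the same patch size, which shows why passing only the unmasked tokens to the encoder matters at this resolution.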