Pre-training on high-resolution X-ray images: an experimental study
Abstract Existing X-ray image based pre-trained vision models are typically trained on a relatively small-scale dataset (less than 500,000 samples) with limited resolution (e.g., 224 × 224). However, the key to the success of self-supervised pre-training of large models lies in mas...
| Main Authors: | Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Visual Intelligence |
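| Subjects: | High-resolution X-ray image; Pre-trained big models; Masked auto-encoder (MAE); Medical report generation |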
| Online Access: | https://doi.org/10.1007/s44267-025-00080-3 |
| _version_ | 1850242860604981248 |
|---|---|
| author | Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang |
| author_facet | Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang |
| author_sort | Xiao Wang |
| collection | DOAJ |
| description | Abstract Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) with limited resolution (e.g., 224 × 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and preserving the high resolution of X-ray images contributes to effective solutions for some challenging diseases. In this paper, we propose a high-resolution (1280 × 1280) X-ray image based pre-trained baseline model, trained on our newly collected large-scale dataset containing more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens that remain after masking at a high rate are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that uses the breast contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, namely X-ray report generation and disease detection. Extensive experiments demonstrate that our pre-trained medical baseline model achieves results comparable to, or even exceeding, those of current state-of-the-art models on downstream benchmark datasets. |
| format | Article |
| id | doaj-art-ff43f52ec6c54fa7afc62b9f3d24735d |
| institution | OA Journals |
| issn | 2097-3330 2731-9008 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Visual Intelligence |
| spelling | doaj-art-ff43f52ec6c54fa7afc62b9f3d24735d; 2025-08-20T02:00:10Z; eng; Springer; Visual Intelligence; 2097-3330; 2731-9008; 2025-05-01; 31115; 10.1007/s44267-025-00080-3; Pre-training on high-resolution X-ray images: an experimental study; Xiao Wang (0), Yuehang Li (1), Wentao Wu (2), Jiandong Jin (3), Yao Rong (4), Bo Jiang (5), Chuanfu Li (6), Jin Tang (7); School of Computer Science and Technology, Anhui University; School of Computer Science and Technology, Anhui University; School of Artificial Intelligence, Anhui University; School of Artificial Intelligence, Anhui University; School of Computer Science and Technology, Anhui University; School of Computer Science and Technology, Anhui University; First Affiliated Hospital of Anhui University of Chinese Medicine; School of Computer Science and Technology, Anhui University; Abstract Existing X-ray image based pre-trained vision models are typically trained on relatively small-scale datasets (fewer than 500,000 samples) with limited resolution (e.g., 224 × 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and preserving the high resolution of X-ray images contributes to effective solutions for some challenging diseases. In this paper, we propose a high-resolution (1280 × 1280) X-ray image based pre-trained baseline model, trained on our newly collected large-scale dataset containing more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens that remain after masking at a high rate are used as input, and the masked image patches are reconstructed by a Transformer encoder-decoder network. More importantly, we introduce a novel context-aware masking strategy that uses the breast contour as a boundary for adaptive masking operations. We validate the effectiveness of our model on two downstream tasks, namely X-ray report generation and disease detection. Extensive experiments demonstrate that our pre-trained medical baseline model achieves results comparable to, or even exceeding, those of current state-of-the-art models on downstream benchmark datasets.; https://doi.org/10.1007/s44267-025-00080-3; High-resolution X-ray image; Pre-trained big models; Masked auto-encoder (MAE); Medical report generation |
| spellingShingle | Xiao Wang, Yuehang Li, Wentao Wu, Jiandong Jin, Yao Rong, Bo Jiang, Chuanfu Li, Jin Tang; Pre-training on high-resolution X-ray images: an experimental study; Visual Intelligence; High-resolution X-ray image; Pre-trained big models; Masked auto-encoder (MAE); Medical report generation |
| title | Pre-training on high-resolution X-ray images: an experimental study |
| title_full | Pre-training on high-resolution X-ray images: an experimental study |
| title_fullStr | Pre-training on high-resolution X-ray images: an experimental study |
| title_full_unstemmed | Pre-training on high-resolution X-ray images: an experimental study |
| title_short | Pre-training on high-resolution X-ray images: an experimental study |
| title_sort | pre training on high resolution x ray images an experimental study |
| topic | High-resolution X-ray image; Pre-trained big models; Masked auto-encoder (MAE); Medical report generation |
| url | https://doi.org/10.1007/s44267-025-00080-3 |
| work_keys_str_mv | AT xiaowang pretrainingonhighresolutionxrayimagesanexperimentalstudy AT yuehangli pretrainingonhighresolutionxrayimagesanexperimentalstudy AT wentaowu pretrainingonhighresolutionxrayimagesanexperimentalstudy AT jiandongjin pretrainingonhighresolutionxrayimagesanexperimentalstudy AT yaorong pretrainingonhighresolutionxrayimagesanexperimentalstudy AT bojiang pretrainingonhighresolutionxrayimagesanexperimentalstudy AT chuanfuli pretrainingonhighresolutionxrayimagesanexperimentalstudy AT jintang pretrainingonhighresolutionxrayimagesanexperimentalstudy |