Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification

Abstract Digital medical imaging, particularly pathology images, is essential for cancer diagnosis but faces challenges in direct model training due to its super-resolution nature. Although weakly supervised learning has reduced the need for manual annotations, many multiple instance learning (MIL)...

Bibliographic Details
Main Authors: Bin Yang, Lei Ding, Jianqiang Li, Yong Li, Guangzhi Qu, Jingyi Wang, Qiang Wang, Bo Liu
Format: Article
Language: English
Published: Springer 2025-03-01
Series: Complex & Intelligent Systems
Subjects: Weakly supervised training; Image classification; Multiple instance learning
Online Access: https://doi.org/10.1007/s40747-025-01779-y
_version_ 1850042321609949184
author Bin Yang
Lei Ding
Jianqiang Li
Yong Li
Guangzhi Qu
Jingyi Wang
Qiang Wang
Bo Liu
collection DOAJ
description Abstract Digital medical imaging, particularly pathology images, is essential for cancer diagnosis but faces challenges in direct model training due to its super-resolution nature. Although weakly supervised learning has reduced the need for manual annotations, many multiple instance learning (MIL) methods struggle to effectively capture crucial spatial relationships in histopathological images. Existing methods incorporating positional information often overlook nuanced spatial correlations or use positional encoding strategies that do not fully capture the unique spatial dynamics of pathology images. To address this issue, we propose a new framework named TMIL (Transformer-based Multiple Instance Learning Network with 2D positional encoding), which leverages multiple instance learning for weakly supervised classification of histopathological images. TMIL incorporates a 2D positional encoding module, based on the Transformer, to model positional information and explore correlations between instances. Furthermore, TMIL divides histopathological images into pseudo-bags and trains patch-level feature vectors with deep metric learning to enhance classification performance. Finally, the proposed approach is evaluated on a public colorectal adenoma dataset. The experimental results show that TMIL outperforms existing MIL methods, achieving an AUC of 97.28% and an ACC of 95.19%. These findings suggest that TMIL’s integration of deep metric learning and positional encoding offers a promising approach for improving the efficiency and accuracy of pathology image analysis in cancer diagnosis.
format Article
id doaj-art-5ae0bf642121442697e9a6313aedf30a
institution DOAJ
issn 2199-4536
2198-6053
language English
publishDate 2025-03-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-5ae0bf642121442697e9a6313aedf30a
2025-08-20T02:55:36Z
eng
Springer
Complex & Intelligent Systems
2199-4536
2198-6053
2025-03-01
115117
10.1007/s40747-025-01779-y
Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification
Bin Yang (0): Center for Strategic Assessment and Consulting, Academy of Military Science
Lei Ding (1): Faculty of Information Technology, Beijing University of Technology
Jianqiang Li (2): Faculty of Information Technology, Beijing University of Technology
Yong Li (3): Faculty of Information Technology, Beijing University of Technology
Guangzhi Qu (4): Computer Science and Engineering Department, Oakland University
Jingyi Wang (5): Faculty of Information Technology, Beijing University of Technology
Qiang Wang (6): Faculty of Information Technology, Beijing University of Technology
Bo Liu (7): School of Mathematical and Computational Sciences, Massey University
https://doi.org/10.1007/s40747-025-01779-y
Weakly supervised training
Image classification
Multiple instance learning
title Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification
topic Weakly supervised training
Image classification
Multiple instance learning
url https://doi.org/10.1007/s40747-025-01779-y