Transformer enabled multi-modal medical diagnosis for tuberculosis classification

Abstract: Multimodal data analysis in the medical domain has recently begun to receive considerable attention. Researchers from both computer science and medicine are trying to develop models to handle multimodal medical data. However, most published work has targeted homogeneous multimodal dat...

Bibliographic Details
Main Authors: Sachin Kumar, Shivani Sharma, Kassahun Tadesse Megra
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: Journal of Big Data
Subjects: Transformer; Multimodal medical analysis; Tuberculosis classification; Lung disease diagnosis
Online Access: https://doi.org/10.1186/s40537-024-01054-w
author Sachin Kumar
Shivani Sharma
Kassahun Tadesse Megra
collection DOAJ
description Abstract: Multimodal data analysis in the medical domain has recently begun to receive considerable attention. Researchers from both computer science and medicine are trying to develop models to handle multimodal medical data. However, most published work has targeted homogeneous multimodal data. Collecting and preparing heterogeneous multimodal data is a complex and time-consuming task, and developing models to handle such data is a further challenge. This study presents a cross-modal transformer-based fusion approach for multimodal clinical data analysis using medical images and clinical data. The proposed approach uses an image embedding layer to convert an image into visual tokens and a separate clinical embedding layer to convert clinical data into text tokens. A cross-modal transformer module is then employed to learn a holistic representation of the imaging and clinical modalities. The approach was tested on a multimodal tuberculosis (lung disease) data set, and the results were compared with recent approaches in multimodal medical data analysis. The comparison shows that the proposed approach outperformed the other approaches considered in the study. A further advantage is that it analyzes heterogeneous multimodal medical data faster than the existing methods considered in the study, which matters when powerful machines are not available for computation.
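The pipeline outlined in the abstract (image embedding to visual tokens, clinical embedding to clinical tokens, cross-modal attention to fuse them) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the patch size, the embedding width, the random untrained weights, the choice of clinical tokens as queries over visual tokens, and the residual-plus-mean-pooling step are all assumptions made for illustration, and a real model would add multi-head attention, normalization, feed-forward layers, and a trained classifier head.

```python
import math
import random

def linear(x, w):
    """Multiply a vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def rand_matrix(rng, rows, cols):
    """Random (untrained) weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def image_to_tokens(image, patch, w_img):
    """Split a 2-D image into non-overlapping patch x patch blocks and
    project each flattened block to one visual token."""
    tokens = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            flat = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            tokens.append(linear(flat, w_img))
    return tokens

def clinical_to_tokens(features, w_clin):
    """One token per clinical feature: that feature's embedding row scaled by its value."""
    return [[v * e for e in w_clin[k]] for k, v in enumerate(features)]

def cross_attention(queries, keys_values, wq, wk, wv):
    """Single-head scaled dot-product attention from one modality to another."""
    d = len(wq[0])
    q = [linear(t, wq) for t in queries]
    k = [linear(t, wk) for t in keys_values]
    v = [linear(t, wv) for t in keys_values]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out

def fuse(image, features, dim=8, patch=2, seed=0):
    """Return one fused vector for an (image, clinical features) pair."""
    rng = random.Random(seed)
    w_img = rand_matrix(rng, patch * patch, dim)
    w_clin = rand_matrix(rng, len(features), dim)
    wq, wk, wv = (rand_matrix(rng, dim, dim) for _ in range(3))
    visual = image_to_tokens(image, patch, w_img)
    clinical = clinical_to_tokens(features, w_clin)
    # Assumption: clinical tokens act as queries over the visual tokens.
    attended = cross_attention(clinical, visual, wq, wk, wv)
    # Residual connection, then mean-pool the tokens into a single vector.
    fused = [[a + b for a, b in zip(c, att)] for c, att in zip(clinical, attended)]
    return [sum(col) / len(fused) for col in zip(*fused)]

if __name__ == "__main__":
    toy_image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
    toy_clinical = [0.5, 1.0, 0.0]  # hypothetical normalized clinical values
    print(len(fuse(toy_image, toy_clinical)))
```

In a trained system the fused vector would feed a classification layer that predicts the tuberculosis label; here it only demonstrates how tokens from the two modalities can share one attention step.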
format Article
id doaj-art-40b192958d9d452b92fbef6ad511e548
institution Kabale University
issn 2196-1115
language English
publishDate 2025-01-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj-art-40b192958d9d452b92fbef6ad511e548; 2025-01-19T12:26:39Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-01-01; 121121; 10.1186/s40537-024-01054-w; Transformer enabled multi-modal medical diagnosis for tuberculosis classification
Sachin Kumar (0): Akian College of Science and Engineering, American University of Armenia
Shivani Sharma (1): Department of Computer Science, Thapar Institute of Engineering and Technology (deemed to be University)
Kassahun Tadesse Megra (2): Institute of Technology, Hawassa University Institute of Technology
https://doi.org/10.1186/s40537-024-01054-w
Transformer; Multimodal medical analysis; Tuberculosis classification; Lung disease diagnosis
title Transformer enabled multi-modal medical diagnosis for tuberculosis classification
topic Transformer
Multimodal medical analysis
Tuberculosis classification
Lung disease diagnosis
url https://doi.org/10.1186/s40537-024-01054-w