Transfer learning prediction of type 2 diabetes with unpaired clinical and genetic data

Bibliographic Details
Main Authors: YounSung Jung, SeanKyo Han, EunHee Kang, SoYoung Park, MinHee Kim, NanHee Kim, TaeJin Ahn
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-05532-w
collection DOAJ
description Abstract The prevalence of type 2 diabetes mellitus (T2DM) in Korea has risen in recent years, yet many cases remain undiagnosed. Advanced artificial intelligence models using multi-modal data have shown promise in disease prediction, but two major challenges persist: the scarcity of samples containing all desired data modalities and class imbalance in T2DM datasets. We propose a novel transfer learning framework to predict T2DM onset within five years, using two Korean cohorts (KoGES and SNUH). To utilize unpaired multi-modal data, our approach transfers knowledge between clinical and genetic domains, leveraging unpaired clinical data alongside paired data. We also address class imbalance by applying a positively weighted binary cross-entropy (BCE) loss and a weighted random sampler (WRS). The transfer learning framework improved T2DM prediction performance. Using WRS and weighted BCE loss increased the model’s balanced accuracy and AUC (achieving test AUC 0.8441). Furthermore, combining transfer learning with intermediate data fusion yielded even higher performance (test AUC 0.8715). These enhancements were achieved despite limited paired multi-modal samples. Our framework effectively handles scarce paired data and class imbalance, leading to improved T2DM risk prediction. This approach can be adapted to other medical prediction tasks and integrated with additional data modalities, potentially aiding earlier diagnosis and better disease management in clinical settings.
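The abstract names two class-imbalance remedies, a positively weighted BCE loss and a weighted random sampler, without giving implementation details. The following is a minimal sketch of both ideas in plain Python, not the authors' code. The toy labels and logits, the negatives/positives ratio used as the positive weight, the inverse-class-frequency sampling weights, and all function names are assumptions made for illustration. In PyTorch the analogous components would likely be `torch.nn.BCEWithLogitsLoss(pos_weight=...)` and `torch.utils.data.WeightedRandomSampler`, though the record does not say which library was used.

```python
import math
import random


def weighted_bce(logits, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))        # sigmoid of the raw score
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def inverse_frequency_weights(labels):
    """Per-sample weight = 1 / (number of samples in that sample's class)."""
    counts = {c: labels.count(c) for c in set(labels)}
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(weights, num_samples, rng):
    """Draw indices with replacement, with probability proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Toy data (hypothetical, not from KoGES or SNUH): 2 positives among 10 samples.
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
logits = [0.2, -1.0, -0.5, -2.0, 0.1, -0.3, -1.5, -0.8, -1.2, 0.4]

# Assumed choice of positive weight: ratio of negatives to positives.
pos_weight = labels.count(0) / labels.count(1)
loss_plain = weighted_bce(logits, labels, 1.0)
loss_weighted = weighted_bce(logits, labels, pos_weight)

# Resample so that both classes are drawn about equally often.
rng = random.Random(0)
weights = inverse_frequency_weights(labels)
batch = weighted_sample_indices(weights, 1000, rng)
pos_fraction = sum(labels[i] for i in batch) / len(batch)
```

The two remedies act at different points: the sampler changes which examples the model sees in each batch, while the weighted loss changes how much each misclassified positive costs. Under these assumed weights the resampled batches are roughly class-balanced even though only 20% of the toy samples are positive.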
id doaj-art-1f77ee6d39ec4a1195220a4abd92d844
institution DOAJ
issn 2045-2322
spelling doaj-art-1f77ee6d39ec4a1195220a4abd92d844
2025-08-20T03:05:17Z
eng
Nature Portfolio
Scientific Reports, ISSN 2045-2322
2025-07-01, Volume 15, Issue 1, Pages 1-13
10.1038/s41598-025-05532-w
YounSung Jung (Department of Life Science, Handong Global University)
SeanKyo Han (Department of Life Science, Handong Global University)
EunHee Kang (Department of Life Science, Handong Global University)
SoYoung Park (Division of Endocrinology and Metabolism, Department of Internal Medicine, Korea University Ansan Hospital)
MinHee Kim (Biomedical Research Center, Korea University Ansan Hospital)
NanHee Kim (Division of Endocrinology and Metabolism, Department of Internal Medicine, Korea University Ansan Hospital)
TaeJin Ahn (Department of Life Science, Handong Global University)
https://doi.org/10.1038/s41598-025-05532-w
title Transfer learning prediction of type 2 diabetes with unpaired clinical and genetic data
url https://doi.org/10.1038/s41598-025-05532-w