Development and validation of machine learning-based diagnostic models using blood transcriptomics for early childhood diabetes prediction

Background: Early identification of Type 1 Diabetes Mellitus (T1DM) in pediatric populations is crucial for implementing timely interventions and improving long-term outcomes. Peripheral blood transcriptomic analysis provides a minimally invasive approach for identifying predictive biomarkers prior to...

Bibliographic Details
Main Authors: Xin Huang, Di Ouyang, Weiming Xie, Huawei Zhuang, Siyu Gao, Pan Liu, Lizhong Guo
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
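Series: Frontiers in Medicine
Subjects: childhood diabetes; peripheral blood; transcriptomic analysis; machine learning; pediatric biomarkers
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2025.1636214/full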
collection DOAJ
description Background: Early identification of Type 1 Diabetes Mellitus (T1DM) in pediatric populations is crucial for implementing timely interventions and improving long-term outcomes. Peripheral blood transcriptomic analysis provides a minimally invasive approach for identifying predictive biomarkers prior to clinical manifestation. This study aimed to develop and validate machine learning algorithms utilizing transcriptomic signatures to predict T1DM onset in children up to 46 months before clinical diagnosis.
Methods: We analyzed 247 peripheral blood RNA-sequencing samples from pre-diabetic children and age-matched healthy controls. Differential gene expression analysis was performed using established bioinformatics pipelines to identify significantly dysregulated transcripts. Five feature selection methods (Lasso, Elastic Net, Random Forest, Support Vector Machine, and Gradient Boosting Machine) were employed to optimize gene sets. Nine machine learning algorithms (Decision Tree, Gradient Boosting Machine, K-Nearest Neighbors, Linear Discriminant Analysis, Logistic Regression, Multilayer Perceptron, Naive Bayes, Random Forest, and Support Vector Machine) were combined with selected features, generating 45 unique model combinations. Performance was evaluated using accuracy, precision, recall, and F1-score metrics. Model validation was conducted using quantitative polymerase chain reaction (qPCR) in an independent cohort of six children (three healthy, three diabetic).
Results: Transcriptomic analysis revealed significant differential expression patterns between pre-diabetic and control groups. Four model combinations demonstrated superior predictive performance: Lasso + K-Nearest Neighbors, Elastic Net + K-Nearest Neighbors, Elastic Net + Random Forest, and Support Vector Machine + K-Nearest Neighbors. These models achieved high accuracy in predicting diabetes onset up to 46 months before clinical diagnosis. Both Elastic Net-based models achieved perfect classification performance in the validation cohort, demonstrating their potential as clinically viable diagnostic tools.
Conclusion: This study establishes the feasibility of integrating peripheral blood transcriptomic profiling with machine learning for early pediatric T1DM prediction. The identified transcriptomic signatures and validated predictive models provide a foundation for developing clinically translatable, non-invasive diagnostic tools. These findings support the implementation of precision medicine approaches for childhood diabetes prevention and warrant validation in larger, multi-center cohorts to assess generalizability and clinical utility.
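The model grid and the four reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: only the method names come from the abstract, while the function names `model_combinations` and `binary_metrics` and the example labels are hypothetical and chosen purely for illustration. The sketch pairs each of the five feature selection methods with each of the nine classifiers (5 x 9 = 45) and computes accuracy, precision, recall, and F1-score from predicted and true labels.

```python
from itertools import product

# Method names as listed in the abstract; everything else is illustrative.
FEATURE_SELECTORS = [
    "Lasso", "Elastic Net", "Random Forest",
    "Support Vector Machine", "Gradient Boosting Machine",
]
CLASSIFIERS = [
    "Decision Tree", "Gradient Boosting Machine", "K-Nearest Neighbors",
    "Linear Discriminant Analysis", "Logistic Regression",
    "Multilayer Perceptron", "Naive Bayes", "Random Forest",
    "Support Vector Machine",
]


def model_combinations():
    """Pair every feature selector with every classifier (hypothetical helper)."""
    return [f"{s} + {c}" for s, c in product(FEATURE_SELECTORS, CLASSIFIERS)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


combos = model_combinations()
print(len(combos))  # 5 selectors x 9 classifiers

# Made-up labels for a six-sample cohort (0 = healthy, 1 = diabetic).
# These are NOT the study's data; one control is deliberately misclassified.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

With only six samples, a single misclassification moves accuracy by one sixth (about 16.7 percentage points), which is one reason the abstract calls for validation in larger cohorts.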
format Article
id doaj-art-ef8b1ddc324b44a58997ee798110adf8
institution DOAJ
issn 2296-858X
language English
publishDate 2025-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Medicine
spelling doaj-art-ef8b1ddc324b44a58997ee798110adf8; 2025-08-20T03:13:11Z; eng; Frontiers Media S.A.; Frontiers in Medicine; ISSN 2296-858X; 2025-07-01; volume 12; DOI 10.3389/fmed.2025.1636214; article 1636214
Author affiliations:
Xin Huang: The First Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China; Yulin Hospital of Traditional Chinese Medicine, Yulin, China
Di Ouyang: Traditional Chinese Medicine Hospital of Yulin, Yulin, China
Weiming Xie: Academic Affairs and Research Management Office, Yulin Campus of Guangxi Medical University, Yulin, Guangxi, China
Huawei Zhuang: The First Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China
Siyu Gao: Yulin Hospital of Traditional Chinese Medicine, Yulin, China
Pan Liu: Huai’an No.3 People’s Hospital, Huai’an Second Clinical College of Xuzhou Medical University, Huai’an, China
Lizhong Guo: The First Clinical Medical College, Nanjing University of Chinese Medicine, Nanjing, China
title Development and validation of machine learning-based diagnostic models using blood transcriptomics for early childhood diabetes prediction
topic childhood diabetes
peripheral blood
transcriptomic analysis
machine learning
pediatric biomarkers
url https://www.frontiersin.org/articles/10.3389/fmed.2025.1636214/full