Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites

Abstract Background Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth. Objective This study aimed to identify differential CpG sites in cord b...

Bibliographic Details
Main Authors: Yuxin Feng, Ying Ni, Wenkai Wang, Fen Guo, Liyu Wang, Fan Zhu, Luyao Zhang, Ying Feng
Format: Article
Language: English
Published: BMC 2025-07-01
Series: BMC Pregnancy and Childbirth
Subjects: Preterm birth; CpG methylation; Cord blood; Machine learning; Lasso; Random forest
Online Access: https://doi.org/10.1186/s12884-025-07884-7
_version_ 1849235290016710656
author Yuxin Feng
Ying Ni
Wenkai Wang
Fen Guo
Liyu Wang
Fan Zhu
Luyao Zhang
Ying Feng
author_facet Yuxin Feng
Ying Ni
Wenkai Wang
Fen Guo
Liyu Wang
Fan Zhu
Luyao Zhang
Ying Feng
author_sort Yuxin Feng
collection DOAJ
description Abstract Background: Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth. Objective: This study aimed to identify differential CpG sites in cord blood and to develop predictive machine learning models based on these methylation changes to assess preterm birth risk. Methods: Methylome data from 110 neonatal cord blood samples in the GSE110828 dataset (88 for training and 22 for testing) were analyzed to identify CpG sites that differ between preterm and full-term births. Key CpG sites were selected using Lasso, Elastic Net, and Random Forest. Forty-five predictive models were constructed and evaluated for accuracy, precision, recall, and F1 score. Results: Sixty-six CpG sites showed significant differences between the preterm and full-term groups. Four models, including Random Forest with Lasso and Gradient Boosting with Random Forest, achieved optimal predictive performance, each with a validation accuracy of 93.75%. Conclusion: DNA methylation changes at CpG sites in cord blood are associated with preterm birth risk. CpG-based methylation models demonstrate high predictive accuracy and hold promise for early clinical risk assessment.
format Article
id doaj-art-48f31da801af4d1694902b618d6fe8a1
institution Kabale University
issn 1471-2393
language English
publishDate 2025-07-01
publisher BMC
record_format Article
series BMC Pregnancy and Childbirth
spelling doaj-art-48f31da801af4d1694902b618d6fe8a1
2025-08-20T04:02:50Z
eng
BMC
BMC Pregnancy and Childbirth
1471-2393
2025-07-01
251111
10.1186/s12884-025-07884-7
Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
Yuxin Feng (0)
Ying Ni (1)
Wenkai Wang (2)
Fen Guo (3)
Liyu Wang (4)
Fan Zhu (5)
Luyao Zhang (6)
Ying Feng (7)
(0) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
(1) Beijing Key Laboratory of Traditional Chinese Medicine Protection and Utilization, Faculty of Geographical Science, Beijing Normal University
(2) Shuguang Hospital, Shanghai University of Traditional Chinese Medicine
(3) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
(4) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
(5) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
(6) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
(7) Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University
Abstract Background: Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth. Objective: This study aimed to identify differential CpG sites in cord blood and to develop predictive machine learning models based on these methylation changes to assess preterm birth risk. Methods: Methylome data from 110 neonatal cord blood samples in the GSE110828 dataset (88 for training and 22 for testing) were analyzed to identify CpG sites that differ between preterm and full-term births. Key CpG sites were selected using Lasso, Elastic Net, and Random Forest. Forty-five predictive models were constructed and evaluated for accuracy, precision, recall, and F1 score. Results: Sixty-six CpG sites showed significant differences between the preterm and full-term groups. Four models, including Random Forest with Lasso and Gradient Boosting with Random Forest, achieved optimal predictive performance, each with a validation accuracy of 93.75%. Conclusion: DNA methylation changes at CpG sites in cord blood are associated with preterm birth risk. CpG-based methylation models demonstrate high predictive accuracy and hold promise for early clinical risk assessment.
https://doi.org/10.1186/s12884-025-07884-7
Preterm birth
CpG methylation
Cord blood
Machine learning
Lasso
Random forest
spellingShingle Yuxin Feng
Ying Ni
Wenkai Wang
Fen Guo
Liyu Wang
Fan Zhu
Luyao Zhang
Ying Feng
Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
BMC Pregnancy and Childbirth
Preterm birth
CpG methylation
Cord blood
Machine learning
Lasso
Random forest
title Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
title_full Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
title_fullStr Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
title_full_unstemmed Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
title_short Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
title_sort machine learning based prediction of preterm birth risk using methylation changes in neonatal cord blood cpg sites
topic Preterm birth
CpG methylation
Cord blood
Machine learning
Lasso
Random forest
url https://doi.org/10.1186/s12884-025-07884-7
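The abstract in this record states that the models were evaluated for accuracy, precision, recall, and F1 score. As a minimal sketch of how those four metrics are computed for a binary preterm/full-term classifier, the Python below uses only the standard library. The `Metrics` class, the `evaluate` function, and the label vectors are hypothetical and chosen for illustration; they are not the authors' code or data, and the counts do not reproduce the reported 93.75%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical labels for a 22-sample test split (1 = preterm, 0 = full-term).
y_true = [1] * 11 + [0] * 11
y_pred = [1] * 10 + [0] + [1] + [0] * 10  # one false negative, one false positive
result = evaluate(y_true, y_pred)
print(result)
```

With a test split this small, a single misclassified sample shifts accuracy by about 4.5 percentage points, so the metrics are best read together and not as one number.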