Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites
| Main Authors: | Yuxin Feng, Ying Ni, Wenkai Wang, Fen Guo, Liyu Wang, Fan Zhu, Luyao Zhang, Ying Feng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | BMC Pregnancy and Childbirth |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12884-025-07884-7 |
| Field | Value |
|---|---|
| author | Yuxin Feng, Ying Ni, Wenkai Wang, Fen Guo, Liyu Wang, Fan Zhu, Luyao Zhang, Ying Feng |
| collection | DOAJ |
| description | Background: Preterm birth, defined as delivery before 37 weeks of gestation, is a major cause of neonatal morbidity and mortality. DNA methylation changes at CpG sites have been associated with the risk of preterm birth. Objective: This study aimed to identify differentially methylated CpG sites in cord blood and to develop machine learning models based on these methylation changes to predict preterm birth risk. Methods: Methylome data from 110 neonatal cord blood samples in the GSE110828 dataset were analyzed to identify CpG sites that differed between preterm and full-term births; the samples were split into 88 for training and 22 for testing. Key CpG sites were selected using Lasso, Elastic Net, and Random Forest. Forty-five predictive models were constructed and evaluated for accuracy, precision, recall, and F1 score. Results: Sixty-six CpG sites differed significantly between the preterm and full-term groups. Four models, including Random Forest with Lasso feature selection and Gradient Boosting with Random Forest feature selection, achieved the best predictive performance, each with a validation accuracy of 93.75%. Conclusion: DNA methylation changes at CpG sites in cord blood are associated with preterm birth risk. CpG-based methylation models showed high predictive accuracy and hold promise for early clinical risk assessment. |
| format | Article |
| id | doaj-art-48f31da801af4d1694902b618d6fe8a1 |
| institution | Kabale University |
| issn | 1471-2393 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Pregnancy and Childbirth |
| affiliations | Yuxin Feng, Fen Guo, Liyu Wang, Fan Zhu, Luyao Zhang, Ying Feng: Department of Oncology, the Affiliated Suzhou Hospital of Nanjing Medical University; Ying Ni: Beijing Key Laboratory of Traditional Chinese Medicine Protection and Utilization, Faculty of Geographical Science, Beijing Normal University; Wenkai Wang: Shuguang Hospital, Shanghai University of Traditional Chinese Medicine |
| volume | 25 |
| title | Machine learning-based prediction of preterm birth risk using methylation changes in neonatal cord blood CpG sites |
| topic | Preterm birth; CpG methylation; Cord blood; Machine learning; Lasso; Random forest |
| url | https://doi.org/10.1186/s12884-025-07884-7 |
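The abstract describes an 88/22 train/test split of 110 samples (a test fraction of 0.2) and evaluation by accuracy, precision, recall, and F1 score. The minimal Python sketch below illustrates those two steps using only the standard library; it is not the authors' code. The helper names `stratified_split` and `binary_metrics`, stratifying the split by class, the random seed, and treating preterm as the positive class (label 1) are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), drawing the same fraction from each class.

    Hypothetical helper: stratification and the seed are illustrative
    assumptions, not details reported for this study.
    """
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        # Indices of all samples belonging to this class, shuffled reproducibly.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with `positive` (assumed: preterm = 1)
    as the positive class. Undefined ratios (zero denominators) are reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with hypothetical class counts of 40 preterm and 70 full-term labels (not figures from the study), `stratified_split(labels, 0.2)` returns 88 training and 22 test indices, 8 of them preterm. Stratifying keeps the class ratio similar in both sets, which matters when one class is much smaller than the other.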