Pregnancy probability prediction models based on 5 machine learning algorithms and comparison of their performance
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | Editorial Office of Journal of Army Medical University, 2025-06-01 |
| Series: | 陆军军医大学学报 |
| Subjects: | |
| Online Access: | https://aammt.tmmu.edu.cn/html/202502013.html |
| Summary: | Objective To construct 5 machine-learning models and compare their performance in predicting preconception outcomes from the pre-pregnancy socio-psycho-behavioral exposures of both spouses. Methods Based on the Chongqing Preconception Reproductive Health and Birth Outcome Cohort, which recruited volunteers at Chongqing Health Center for Women and Children between January 2019 and March 2022, 5 447 couples were enrolled and surveyed through interviewer-administered questionnaires on the demographic and socio-psycho-behavioral characteristics of both spouses (221 variables). After the inclusion and exclusion criteria were applied, 4 097 couples were included and randomly assigned to a training set (n=2 867 couples) and a validation set (n=1 230 couples) at a ratio of 7∶3. Feature analysis and collinearity screening were applied to select potential exposure factors. Because semen analysis is difficult to carry out in primary healthcare institutions, two feature sets were constructed, feature Set 1 including semen parameters and feature Set 2 excluding them, and each was applied to both the training set and the validation set. Five algorithms (Logistic Regression, Naive Bayes, Random Forest, Gradient Boosting Machine, and Support Vector Machine) were used to construct preconception outcome prediction models, and the hyperparameters of each model were optimized using random search combined with grid search. The predictive performance of the models was compared using precision, recall, F1 score, area under the receiver operating characteristic curve (AUC), and calibration curves. The optimal model was then selected and used to compare the predictive ability of the questionnaire data for fertility outcomes with and without semen parameters. Results A total of 24 variables were selected for feature Set 1 and 16 variables for feature Set 2. In feature Set 1, the gradient boosting machine performed best, with the highest AUC (0.651) and a better F1 score (0.61). The logistic regression model performed stably (AUC=0.647) and was suitable as the reference model. The random forest (AUC=0.641), Naive Bayes (AUC=0.641), and support vector machine (AUC=0.634) models performed slightly worse. With the gradient boosting machine, predictions from the feature sets with and without semen parameters were comparable: in feature Set 1, the validation-set AUC was 0.651 (95%CI: 0.629~0.681), the prediction accuracy was 0.63, the recall was 0.65, and the F1 score was 0.61; in feature Set 2, the validation-set AUC was 0.649 (95%CI: 0.624~0.663). Both calibration curves were close to the ideal curve. In feature Set 1, the features most negatively correlated with preconception outcomes were female age, male age, and no pregnancy within 1 year without contraception, while the features most positively correlated with preconception outcomes were female pregnancy history, total sperm vitality, and use of contraceptive measures before enrollment. Conclusion Among the 5 machine-learning algorithms applied to this cohort, the gradient boosting machine shows slightly better performance. A total of 24 factors from both spouses are associated with preconception outcomes, and the performance of the simplified model excluding semen parameters does not decline significantly. It is feasible to use machine-learning methods to predict human preconception outcomes from socio-psycho-behavioral questionnaires. |
| ISSN: | 2097-0927 |
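
The evaluation metrics named in the Summary (precision, recall, F1 score, AUC with a 95% CI) and the 7∶3 split can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the labels and scores are synthetic, all function names are hypothetical, and the percentile bootstrap used for the confidence interval is an assumption, since the record does not say how the reported intervals were computed.

```python
import random


def split_counts(n_total, train_fraction=0.7):
    """Return (n_train, n_validation) for a simple random split."""
    n_train = int(n_total * train_fraction)
    return n_train, n_total - n_train


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Rank-based AUC: chance that a positive outranks a negative (ties = 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic validation set sized like the one in the Summary (illustrative only).
n_train, n_valid = split_counts(4097)
rng = random.Random(42)
y_valid = [rng.randint(0, 1) for _ in range(n_valid)]
scores = [rng.gauss(0.55 if y else 0.45, 0.2) for y in y_valid]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("split:", n_train, n_valid)
print("precision, recall, F1:", precision_recall_f1(y_valid, y_pred))
print("AUC:", auc(y_valid, scores))
print("95% CI:", bootstrap_auc_ci(y_valid, scores))
```

In practice the scores would come from the fitted models (for example, predicted probabilities from a gradient boosting classifier) rather than from a random generator, and the same functions would be applied to each model's validation-set predictions to build the comparison described in the Summary.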