Predicting the time to get back to work using statistical models and machine learning approaches
Abstract Background Whether machine learning approaches are superior to classical statistical models for survival analyses, especially in the case of lack of proportionality, is unknown. Objectives To compare model performance and predictive accuracy of classic regressions and machine learning appro...
| Main Authors: | George Bouliotis, M. Underwood, R. Froud |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC 2024-11-01 |
| Series: | BMC Medical Research Methodology |
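| Subjects: | Machine Learning; Survival analysis; Statistical methods; Return to work; Socioeconomic deprivation |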
| Online Access: | https://doi.org/10.1186/s12874-024-02390-4 |
| _version_ | 1850064851379945472 |
|---|---|
| author | George Bouliotis, M. Underwood, R. Froud |
| author_facet | George Bouliotis, M. Underwood, R. Froud |
| author_sort | George Bouliotis |
| collection | DOAJ |
| description | Abstract Background Whether machine learning approaches are superior to classical statistical models for survival analyses, especially when the proportional hazards assumption does not hold, is unknown. Objectives To compare the model performance and predictive accuracy of classical regressions and machine learning approaches using data from the Inspiring Families programme. Methods The Inspiring Families programme aims to support members of families with complex issues to return to work. We explored predictors of time to return to work, comparing proportional hazards models (semi-parametric Cox and flexible parametric Parmar-Royston, both in Stata) against survival penalised regression with an Elastic Net penalty (scikit-survival), a (conditional) Survival Forest algorithm (pySurvival), and a (kernel) Survival Support Vector Machine (pySurvival). Results At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, with a low predictive power (concordance index between 0.51 and 0.61). The median time to finding a first job was about 254 days. The top five contributing variables were ‘family issues and additional barriers’, ‘restriction of hours’, ‘available CV’, ‘self-employment considered’ and ‘education’. Harrell’s concordance index ranged from 0.60 (Cox model) to 0.71 (Random Survival Forest), suggesting a better fit for the machine learning approaches. However, a comparison of predicted median times for a selected scenario showed only minor differences. Conclusion Implementing a series of survival models with and without a proportional hazards assumption provides useful insight as well as better interpretation of the coefficients affected by non-linearities. However, the better fit does not translate into substantially higher predictive power and accuracy for the machine learning approaches. Further tuning of the machine learning algorithms may provide improved results. |
| format | Article |
| id | doaj-art-fb67c0ad0cb545b18d9c5fcda8b8e2ca |
| institution | DOAJ |
| issn | 1471-2288 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | BMC |
| record_format | Article |
| series | BMC Medical Research Methodology |
| spelling | doaj-art-fb67c0ad0cb545b18d9c5fcda8b8e2ca; 2025-08-20T02:49:09Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2024-11-01; 24118; 10.1186/s12874-024-02390-4; Predicting the time to get back to work using statistical models and machine learning approaches; George Bouliotis (Warwick Clinical Trials Unit, University of Warwick), M. Underwood (Warwick Clinical Trials Unit, University of Warwick), R. Froud (Høyskolen Kristiania); https://doi.org/10.1186/s12874-024-02390-4; Machine Learning; Survival analysis; Statistical methods; Return to work; Socioeconomic deprivation |
| title | Predicting the time to get back to work using statistical models and machine learning approaches |
| topic | Machine Learning; Survival analysis; Statistical methods; Return to work; Socioeconomic deprivation |
| url | https://doi.org/10.1186/s12874-024-02390-4 |
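
The abstract compares models by Harrell's concordance index. As a reading aid, the following is a minimal sketch of how that index can be computed for right-censored data. It is not the authors' code, and the function name `harrell_c_index`, the toy data, and the decision to skip pairs with tied times are assumptions made for illustration. Here the "event" is a return to work, and a higher risk score means the model expects an earlier return.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell's C for right-censored data (hypothetical helper).

    times       -- observed follow-up time for each subject
    events      -- True/1 if the event was observed, False/0 if censored
    risk_scores -- model output; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # model ranks the earlier event as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented for illustration, not from the study)
times = [100, 254, 300, 400]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.2]
c_index = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported range of roughly 0.5 to 0.7 is described as low predictive power.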