Identifying most important predictors for suicidal thoughts and behaviours among healthcare workers active during the Spain COVID-19 pandemic: a machine-learning approach

Bibliographic Details
Main Authors: Itxaso Alayo, Oriol Pujol, Jordi Alonso, Montse Ferrer, Franco Amigo, Ana Portillo-Van Diest, Enric Aragonès, Andrés Aragon Peña, Ángel Asúnsolo Del Barco, Mireia Campos, Meritxell Espuga, Ana González-Pinto, Josep Maria Haro, Nieves López-Fresneña, Alma D. Martínez de Salázar, Juan D. Molina, Rafael M. Ortí-Lucas, Mara Parellada, José Maria Pelayo-Terán, Maria João Forjaz, Aurora Pérez-Zapata, José Ignacio Pijoan, Nieves Plana, Elena Polentinos-Castro, Maria Teresa Puig, Cristina Rius, Ferran Sanz, Cònsol Serra, Iratxe Urreta-Barallobre, Ronny Bruffaerts, Eduard Vieta, Víctor Pérez-Solá, Philippe Mortier, Gemma Vilagut
Format: Article
Language: English
Published: Cambridge University Press 2025-01-01
Series: Epidemiology and Psychiatric Sciences
Online Access: https://www.cambridge.org/core/product/identifier/S2045796025000198/type/journal_article
Description
Summary:

Aims: Studies conducted during the COVID-19 pandemic found high occurrence of suicidal thoughts and behaviours (STBs) among healthcare workers (HCWs). The current study aimed to (1) develop a machine learning-based prediction model for future STBs using data from a large prospective cohort of Spanish HCWs and (2) identify the most important variables in terms of contribution to the model’s predictive accuracy.

Methods: This is a prospective, multicentre cohort study of Spanish HCWs active during the COVID-19 pandemic. A total of 8,996 HCWs participated in the web-based baseline survey (May–July 2020) and 4,809 in the 4-month follow-up survey. A total of 219 predictor variables were derived from the baseline survey. The outcome variable was any STB at the 4-month follow-up. Variable selection was done using an L1 regularized linear Support Vector Classifier (SVC). A random forest model with 5-fold cross-validation was developed, in which the Synthetic Minority Oversampling Technique (SMOTE) and undersampling of the majority class balancing techniques were tested. The model was evaluated by the area under the Receiver Operating Characteristic (AUROC) curve and the area under the precision–recall curve. Shapley’s additive explanatory values (SHAP values) were used to evaluate the overall contribution of each variable to the prediction of future STBs. Results were obtained separately by gender.

Results: The prevalence of STBs in HCWs at the 4-month follow-up was 7.9% (women = 7.8%, men = 8.2%). Thirty-four variables were selected by the L1 regularized linear SVC. The best results were obtained without data balancing techniques: AUROC = 0.87 (0.86 for women and 0.87 for men) and area under the precision–recall curve = 0.50 (0.55 for women and 0.45 for men). Based on SHAP values, the most important baseline predictors for any STB at the 4-month follow-up were the presence of passive suicidal ideation, the number of days in the past 30 days with passive or active suicidal ideation, the number of days in the past 30 days with binge eating episodes, the number of panic attacks (women only) and the frequency of intrusive thoughts (men only).

Conclusions: Machine learning-based prediction models for STBs in HCWs during the COVID-19 pandemic trained on web-based survey data present high discrimination and classification capacity. Future clinical implementations of this model could enable the early detection of HCWs at the highest risk for developing adverse mental health outcomes.

Study registration: NCT04556565
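The modelling steps named in the summary (L1-regularized linear SVC for variable selection, a random forest with 5-fold cross-validation, and evaluation by AUROC and the area under the precision–recall curve) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code. The data are synthetic, generated only to mimic the stated shape (219 predictors, roughly 8% positives), and every hyperparameter (C, number of trees, random seeds), the standardisation step, and the choice to run selection inside each fold are assumptions made for illustration. SMOTE, undersampling, SHAP values and the gender-stratified analysis are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the survey data (assumption): 219 predictors,
# about 8% of samples in the positive class.
X, y = make_classification(
    n_samples=2000, n_features=219, n_informative=15,
    weights=[0.92], random_state=0,
)

# L1-penalised linear SVC drives many coefficients to zero;
# SelectFromModel keeps the predictors with non-zero coefficients.
selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000, random_state=0)
)

# Selection is placed inside the pipeline so it is refit on each
# training fold (an assumption; the summary does not specify this).
model = make_pipeline(
    StandardScaler(),
    selector,
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Stratified 5-fold cross-validation, collecting out-of-fold probabilities.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

auroc = roc_auc_score(y, proba)            # area under the ROC curve
auprc = average_precision_score(y, proba)  # approximates area under the PR curve

print(f"AUROC = {auroc:.2f}, AUPRC = {auprc:.2f}")
```

With an outcome prevalence near 8%, the area under the precision–recall curve is read against a chance level of about 0.08 rather than 0.5, which is why it is usually reported alongside AUROC for rare outcomes.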
ISSN: 2045-7960, 2045-7979