Statistical learning to identify and characterise neurodevelopmental outcomes at 2 years in babies born preterm: model development and validation using population-level data from England and Wales

Bibliographic Details
Main Authors: Sadia Haider, Athanasios Tsanas, G. David Batty, Rebecca M. Reynolds, Heather C. Whalley, Simon R. Cox, Riccardo E. Marioni, Cheryl Battersby, James P. Boardman
Format: Article
Language: English
Published: Elsevier 2025-07-01
Series: EBioMedicine
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2352396425002555
Description
Summary:
Background: Children born preterm face elevated risks of neurodevelopmental impairments across domains. Prior studies have relied on expert-imposed typologies within single domains. This study applies statistical learning to a national database to identify transdomain clusters and their maternal and neonatal predictors.
Methods: Latent class analysis (LCA) was used to derive transdomain clusters from parent-reported visual, auditory, neuromotor, and communication impairments in preterm-born children at two years corrected age, using data from the UK National Neonatal Research Database (N = 27,261). Replication was conducted in an independent sample from Wales (N = 975). Clusters were clinically validated against cerebral palsy diagnosis, the Bayley Scales of Infant and Toddler Development (3rd edition), and global neurodevelopmental delay. Random forest models were used to identify cluster-specific and shared predictors.
Findings: Four homogeneous clusters were derived (silhouette score = 0.71) and replicated in Wales with high balanced accuracy (93%): (1) typically developing (84.8%), (2) communication impairments (8.4%), (3) neuro-motor impairments (4.1%), and (4) multiple neuro-morbidity (2.7%). Clusters had high clinical validity and were distinguishable by shared and cluster-specific predictors. Neonatal brain injuries were most predictive of the neuro-motor and multiple neuro-morbidity clusters. Birthweight, gestational age, socio-economic deprivation, and sex were stronger predictors of the communication cluster than preterm co-morbidities.
Interpretation: This study provides the first evidence of the transdomain nature of neurodevelopmental impairments after preterm birth using LCA. The finding that socio-demographic and perinatal factors, rather than co-morbidities, increase the risk of communication impairment highlights the importance of environmental modification alongside clinical interventions. Applying data-driven approaches to routinely collected data may offer a cost-effective way to stratify at-risk children and inform targeted support strategies.
Funding: UKRI Medical Research Council.
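Illustrative sketch (not part of the catalogue record or the article): the abstract names latent class analysis and balanced accuracy without showing how either works. The Python below is a minimal toy version, assuming binary impairment indicators, only two latent classes, and simulated data. It is not the authors' code, model, or data. All function names (`simulate`, `e_step`, `m_step`, `fit_lca`, `balanced_accuracy`) and all simulation parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def simulate(n, seed=0):
    """Simulated toy data: 4 binary indicators, 2 hidden classes (assumed values)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        k = 1 if rng.random() < 0.2 else 0      # class 1 is the rarer class
        p = 0.8 if k else 0.05                  # chance that each indicator is 1
        rows.append([1 if rng.random() < p else 0 for _ in range(4)])
        labels.append(k)
    return rows, labels


def e_step(data, weights, probs):
    """Posterior class membership for every row, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for row in data:
        joint = []
        for w, item_p in zip(weights, probs):
            like = w
            for x, p in zip(row, item_p):
                like *= p if x else 1.0 - p     # indicators independent within class
            joint.append(like)
        total = sum(joint)
        loglik += math.log(total)
        resp.append([v / total for v in joint])
    return resp, loglik


def m_step(data, resp, n_classes):
    """Re-estimate class proportions and item-response probabilities."""
    n_items = len(data[0])
    weights, probs = [], []
    for k in range(n_classes):
        nk = max(sum(r[k] for r in resp), 1e-12)
        weights.append(nk / len(data))
        row_probs = []
        for j in range(n_items):
            est = sum(r[k] * row[j] for r, row in zip(resp, data)) / nk
            row_probs.append(min(max(est, 1e-6), 1 - 1e-6))  # keep away from 0 and 1
        probs.append(row_probs)
    return weights, probs


def fit_lca(data, n_classes, n_iter=100):
    """Fit a latent class model by EM from a simple deterministic start."""
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Start classes at evenly spaced probabilities so they are ordered by severity.
    probs = [[(k + 1) / (n_classes + 1)] * n_items for k in range(n_classes)]
    logliks = []
    for _ in range(n_iter):
        resp, loglik = e_step(data, weights, probs)
        logliks.append(loglik)
        weights, probs = m_step(data, resp, n_classes)
    resp, loglik = e_step(data, weights, probs)  # posteriors under final parameters
    logliks.append(loglik)
    return weights, probs, resp, logliks


def balanced_accuracy(truth, pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    recalls = []
    for c in sorted(set(truth)):
        idx = [i for i, t in enumerate(truth) if t == c]
        recalls.append(sum(pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


rows, truth = simulate(600)
weights, probs, resp, logliks = fit_lca(rows, n_classes=2)
# Assign each row to its most probable class.
assigned = [max(range(2), key=lambda k: r[k]) for r in resp]
print("class proportions:", [round(w, 3) for w in weights])
print("balanced accuracy vs simulated classes:",
      round(balanced_accuracy(truth, assigned), 3))
```

Balanced accuracy is used here only to compare the recovered classes with the known simulated classes. The study reports it for replication in an independent sample, which this toy example does not attempt to reproduce.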
ISSN: 2352-3964