Evaluating the generalisability of region-naïve machine learning algorithms for the identification of epilepsy in low-resource settings.

Bibliographic Details
Main Authors: Ioana Duta, Symon M Kariuki, Anthony K Ngugi, Angelina Kakooza Mwesige, Honorati Masanja, Daniel M Mwanga, Seth Owusu-Agyei, Ryan Wagner, J Helen Cross, Josemir W Sander, Charles R Newton, Arjune Sen, Gabriel Davis Jones
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-02-01
Series: PLOS Digital Health
Online Access:https://doi.org/10.1371/journal.pdig.0000491
collection DOAJ
description <h4>Objectives</h4>Approximately 80% of people with epilepsy live in low- and middle-income countries (LMICs), where limited resources and stigma hinder accurate diagnosis and treatment. Clinical machine learning (ML) models have demonstrated substantial promise in supporting the diagnostic process in LMICs by aiding in preliminary screening and detection of possible epilepsy cases without relying on specialised or trained personnel. How well these models generalise to naïve regions is, however, underexplored. Here, we use a novel approach to assess the suitability and applicability of such clinical tools to aid screening and diagnosis of active convulsive epilepsy in settings beyond their original training contexts.
<h4>Methods</h4>We sourced data from the Study of Epidemiology of Epilepsy in Demographic Sites dataset, which includes demographic information and clinical variables related to diagnosing epilepsy across five sub-Saharan African sites. For each site, we developed a region-specific (single-site) predictive model for epilepsy and assessed its performance at the other sites. We then iteratively added sites to a multi-site model and evaluated model performance on the omitted regions. Model performance and parameters were then compared across every permutation of sites. We used a leave-one-site-out cross-validation analysis to assess the impact of incorporating individual site data in the model.
<h4>Results</h4>Single-site clinical models performed well within their own regions but generally worse when evaluated in other regions (p < 0.05). Model weights and optimal thresholds varied markedly across sites. When the models were trained using data from an increasing number of sites, mean internal performance decreased while external performance improved.
<h4>Conclusions</h4>Clinical models for epilepsy diagnosis in LMICs demonstrate characteristic traits of ML models, such as limited generalisability and a trade-off between internal and external performance. The relationship between predictors and model outcomes also varies across sites, suggesting the need to update specific model aspects with local data before broader implementation. Variations are likely to be particular to the cultural context of diagnosis. We recommend developing models adapted to the cultures and contexts of their intended deployment and caution against deploying region- and culture-naïve models without thorough prior evaluation.
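The cross-site evaluation design described in the Methods (single-site models scored at every site, multi-site models built from growing subsets of sites, and leave-one-site-out validation) can be sketched in plain Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' code. The abstract does not name the model family, the performance metric, or how "optimal thresholds" were chosen. The sketch therefore assumes ROC AUC as the metric and Youden's J as the threshold criterion, and it uses a toy linear scorer (`fit_centroid_direction`) as a stand-in for the study's models. All function names are illustrative, and "internal" performance is computed in-sample for brevity, where a real analysis would use held-out data within each site.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], int]           # (predictor values, label: 1 = epilepsy, 0 = no epilepsy)
Model = Callable[[Sequence[float]], float]  # maps predictor values to a risk score
Sites = Dict[str, List[Row]]                # hypothetical per-site datasets


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Cut-off maximising Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = min(scores), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def fit_centroid_direction(rows: List[Row]) -> Model:
    """Toy stand-in model (hypothetical): weights = mean(positive rows) - mean(negative rows)."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    dim = len(rows[0][0])
    w = [sum(x[i] for x in pos) / len(pos) - sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def evaluate(model: Model, rows: List[Row]) -> float:
    """AUC of a fitted model on one site's rows."""
    return auc([model(x) for x, _ in rows], [y for _, y in rows])


def cross_site_matrix(sites: Sites, fit: Callable[[List[Row]], Model]) -> Dict[Tuple[str, str], float]:
    """Fit one single-site model per site and score it at every site.
    Diagonal (train == test) is internal performance; off-diagonal is external performance."""
    models = {name: fit(rows) for name, rows in sites.items()}
    return {(tr, te): evaluate(models[tr], sites[te]) for tr in sites for te in sites}


def multi_site_curve(sites: Sites, fit: Callable[[List[Row]], Model]) -> Dict[int, Tuple[float, float]]:
    """For k = 1..n-1 pooled training sites, average internal AUC (on each training site)
    and external AUC (on each omitted site) over every subset of k sites."""
    names = sorted(sites)
    curve: Dict[int, Tuple[float, float]] = {}
    for k in range(1, len(names)):
        internal: List[float] = []
        external: List[float] = []
        for train in combinations(names, k):
            model = fit([r for n in train for r in sites[n]])
            internal += [evaluate(model, sites[n]) for n in train]
            external += [evaluate(model, sites[n]) for n in names if n not in train]
        curve[k] = (sum(internal) / len(internal), sum(external) / len(external))
    return curve


def leave_one_site_out(sites: Sites, fit: Callable[[List[Row]], Model]) -> Dict[str, float]:
    """Train on all sites except one and report AUC on the held-out site."""
    out: Dict[str, float] = {}
    for held_out in sites:
        pooled = [r for n, rows in sites.items() if n != held_out for r in rows]
        out[held_out] = evaluate(fit(pooled), sites[held_out])
    return out
```

Because pooling is order-independent, the sketch enumerates subsets of sites with `itertools.combinations` rather than ordered permutations. The k = n - 1 step of `multi_site_curve` uses the same splits as `leave_one_site_out`. With two toy sites whose predictor-outcome relationship is deliberately reversed, each single-site model scores 1.0 internally and 0.0 externally, which is an extreme version of the internal/external gap the abstract reports.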
id doaj-art-cff7c000de1f4e7287cf7a616eb666db
institution OA Journals
issn 2767-3170