Increasing comprehensiveness and reducing workload in a systematic review of complex interventions using automated machine learning

Bibliographic Details
Main Authors: Olalekan A Uthman, Rachel Court, Jodie Enderby, Lena Al-Khudairy, Chidozie Nduka, Hema Mistry, GJ Melendez-Torres, Sian Taylor-Phillips, Aileen Clarke
Format: Article
Language: English
Published: NIHR Journals Library 2022-11-01
Series: Health Technology Assessment
Subjects: text classification, reducing workload, machine learning
Online Access: https://doi.org/10.3310/UDIR6682
_version_ 1849340397788069888
author Olalekan A Uthman
Rachel Court
Jodie Enderby
Lena Al-Khudairy
Chidozie Nduka
Hema Mistry
GJ Melendez-Torres
Sian Taylor-Phillips
Aileen Clarke
author_sort Olalekan A Uthman
collection DOAJ
description Background As part of our ongoing systematic review of complex interventions for the primary prevention of cardiovascular diseases, we developed and evaluated automated machine-learning classifiers for title and abstract screening. The aim was to develop a high-performing algorithm comparable to human screening. Methods We followed a three-phase process to develop and test an automated machine learning-based classifier for screening potential studies on interventions for primary prevention of cardiovascular disease. In the first phase, we labelled a total of 16,611 articles. In the second phase, we used the labelled articles to develop machine learning-based classifiers. In the third phase, we examined how well the classifiers labelled the papers. We evaluated five deep-learning models: parallel convolutional neural network (CNN), stacked CNN, parallel-stacked CNN, recurrent neural network (RNN) and CNN–RNN. The models were evaluated using recall, precision and work saved over sampling at no less than 95% recall. Results Of the 16,611 labelled articles, 676 (4.0%) were tagged as ‘relevant’ and 15,935 (96%) were tagged as ‘irrelevant’. The recall ranged from 51.9% to 96.6%. The precision ranged from 64.6% to 99.1%. The work saved over sampling ranged from 8.9% to as high as 92.1%. The best-performing model was parallel CNN, yielding 96.4% recall, 99.1% precision and a potential workload reduction of 89.9%. Future work and limitations We used words from the title and the abstract only. Further work is needed to assess how performance changes when additional features, such as full document text, are added. The approach may also not transfer to other complex systematic reviews on different topics. Conclusion Our study shows that machine learning has the potential to significantly aid the labour-intensive screening of abstracts in systematic reviews of complex interventions. Future research should concentrate on enhancing the classifier system and determining how it can be integrated into the systematic review workflow. Funding This project was funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme and will be published in Health Technology Assessment. See the NIHR Journals Library website for further project information.
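The three evaluation metrics named in the Methods (recall, precision and work saved over sampling) can be illustrated with a minimal Python sketch. The record does not state the formula the authors used for work saved over sampling; the sketch assumes the commonly used definition WSS = (TN + FN) / N − (1 − recall). The class name, function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion-matrix counts for a title/abstract screening classifier."""
    tp: int  # relevant articles correctly flagged as relevant
    fp: int  # irrelevant articles wrongly flagged as relevant
    tn: int  # irrelevant articles correctly excluded
    fn: int  # relevant articles wrongly excluded

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def recall(c: ScreeningCounts) -> float:
    """Share of truly relevant articles that the classifier retrieved."""
    return c.tp / (c.tp + c.fn)


def precision(c: ScreeningCounts) -> float:
    """Share of articles flagged as relevant that are truly relevant."""
    return c.tp / (c.tp + c.fp)


def work_saved_over_sampling(c: ScreeningCounts) -> float:
    """Assumed definition: WSS = (TN + FN) / N - (1 - recall).

    (TN + FN) / N is the fraction of articles a reviewer no longer reads;
    (1 - recall) is the saving random sampling would give at the same recall.
    """
    return (c.tn + c.fn) / c.total - (1.0 - recall(c))


# Hypothetical counts for illustration only (not taken from the study).
example = ScreeningCounts(tp=95, fp=5, tn=895, fn=5)
print(f"recall={recall(example):.3f}")
print(f"precision={precision(example):.3f}")
print(f"WSS={work_saved_over_sampling(example):.3f}")
```

Under this definition, reporting WSS "at no less than 95% recall" amounts to choosing the classifier threshold so that recall is at least 0.95 and then evaluating the expression at that operating point.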
format Article
id doaj-art-a0b261f9584e40ed81201d50b18be74f
institution Kabale University
issn 2046-4924
language English
publishDate 2022-11-01
publisher NIHR Journals Library
record_format Article
series Health Technology Assessment
spelling doaj-art-a0b261f9584e40ed81201d50b18be74f 2025-08-20T03:43:55Z eng NIHR Journals Library Health Technology Assessment 2046-4924 2022-11-01 2937 10.3310/UDIR6682 NIHR135482
Increasing comprehensiveness and reducing workload in a systematic review of complex interventions using automated machine learning
Olalekan A Uthman (0), Rachel Court (1), Jodie Enderby (2), Lena Al-Khudairy (3), Chidozie Nduka (4), Hema Mistry (5), GJ Melendez-Torres (6), Sian Taylor-Phillips (7), Aileen Clarke (8)
Affiliations, in author order:
0 to 5: Warwick Medical School, University of Warwick, Coventry, UK
6: Peninsula Technology Assessment Group (PenTAG), College of Medicine and Health, University of Exeter, Exeter, UK
7 to 8: Warwick Medical School, University of Warwick, Coventry, UK
https://doi.org/10.3310/UDIR6682
text classification
reducing workload
machine learning
title Increasing comprehensiveness and reducing workload in a systematic review of complex interventions using automated machine learning
topic text classification
reducing workload
machine learning
url https://doi.org/10.3310/UDIR6682