Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study


Bibliographic Details
Main Authors: Jayden Sercombe, Zachary Bryant, Jack Wilson
Format: Article
Language: English
Published: JMIR Publications 2025-08-01
Series: JMIR Formative Research
Online Access: https://formative.jmir.org/2025/1/e68666
author Jayden Sercombe
Zachary Bryant
Jack Wilson
author_facet Jayden Sercombe
Zachary Bryant
Jack Wilson
author_sort Jayden Sercombe
collection DOAJ
description Abstract Background: Systematic reviews are essential for synthesizing research in health sciences; however, they are resource-intensive and prone to human error. The data extraction phase, in which key details of studies are identified and recorded in a systematic manner, may benefit from automation. Recent advancements in artificial intelligence, specifically large language models (LLMs) such as ChatGPT, may streamline this process. Objective: This study aimed to develop and evaluate a custom Generative Pre-trained Transformer (GPT), named Systematic Review Extractor Pro. Methods: OpenAI’s GPT Builder was used to create a GPT tailored to extract information from academic manuscripts. The Role, Instruction, Steps, End goal, and Narrowing (RISEN) framework informed the prompt engineering for the GPT. A sample of 20 studies from two distinct systematic reviews was used to evaluate the GPT’s extraction performance. Agreement rates between the GPT outputs and human reviewers were calculated for each study subsection. Results: The mean time for human data extraction was 36 minutes per study, compared with 26.6 seconds for GPT generation followed by 13 minutes of human review. The GPT demonstrated high overall agreement with human reviewers, achieving 91.45% for review 1 and 89.31% for review 2. It was particularly accurate in extracting study characteristics (review 1: 95.25%; review 2: 90.83%) and participant characteristics (review 1: 95.03%; review 2: 90.00%), with lower performance in more complex areas such as methodological characteristics (87.07%) and statistical results (77.50%). The GPT correctly extracted data in 14 instances (3.25%) in review 1 and 4 instances (1.16%) in review 2 in which the human reviewer was incorrect.
Conclusions: The custom GPT substantially reduced extraction time and can extract data with high accuracy, particularly for participant and study characteristics. This tool may offer a viable option for researchers seeking to reduce resource demands during the extraction phase, although more research is needed to evaluate test-retest reliability, performance across broader review types, and accuracy in extracting statistical data. The tool developed in the current study has been made open access.
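The per-subsection agreement rates reported in the abstract amount to the share of extracted fields on which the GPT output matches the human reviewer. A minimal sketch of that kind of calculation follows; the field names and values are hypothetical illustrations, not data from the study, and the paper's actual scoring procedure may differ:

```python
def agreement_rate(gpt_fields: dict, human_fields: dict) -> float:
    """Percentage of fields where the GPT extraction matches the human reviewer.

    The human reviewer's fields define the set of items being compared.
    """
    keys = human_fields.keys()
    matches = sum(1 for k in keys if gpt_fields.get(k) == human_fields[k])
    return 100.0 * matches / len(keys)


# Hypothetical extraction of participant characteristics from one study.
human = {"n_participants": "120", "design": "RCT", "mean_age": "34.2"}
gpt = {"n_participants": "120", "design": "RCT", "mean_age": "34.5"}

print(round(agreement_rate(gpt, human), 2))  # prints 66.67 (2 of 3 fields match)
```

Averaging such per-study rates within each subsection (study characteristics, participant characteristics, and so on) would yield summary figures of the kind reported, e.g. 95.25% for study characteristics in review 1.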
format Article
id doaj-art-e9c5861da492486fba8b2f78392e8704
institution DOAJ
issn 2561-326X
language English
publishDate 2025-08-01
publisher JMIR Publications
record_format Article
series JMIR Formative Research
spelling doaj-art-e9c5861da492486fba8b2f78392e8704 2025-08-20T03:07:10Z eng JMIR Publications JMIR Formative Research 2561-326X 2025-08-01 vol 9 e68666 doi:10.2196/68666 Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study Jayden Sercombe (http://orcid.org/0000-0002-9051-0340) Zachary Bryant (http://orcid.org/0000-0002-2115-1516) Jack Wilson (http://orcid.org/0000-0002-2732-1731) https://formative.jmir.org/2025/1/e68666
spellingShingle Jayden Sercombe
Zachary Bryant
Jack Wilson
Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
JMIR Formative Research
title Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
title_full Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
title_fullStr Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
title_full_unstemmed Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
title_short Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study
title_sort evaluating a customized version of chatgpt for systematic review data extraction in health research development and usability study
url https://formative.jmir.org/2025/1/e68666
work_keys_str_mv AT jaydensercombe evaluatingacustomizedversionofchatgptforsystematicreviewdataextractioninhealthresearchdevelopmentandusabilitystudy
AT zacharybryant evaluatingacustomizedversionofchatgptforsystematicreviewdataextractioninhealthresearchdevelopmentandusabilitystudy
AT jackwilson evaluatingacustomizedversionofchatgptforsystematicreviewdataextractioninhealthresearchdevelopmentandusabilitystudy