A comparative study of screening performance between abstrackr and GPT models: Systematic review and contextual analysis

Bibliographic Details
Main Authors: Sheyang Xu, Zhiheng Zhao, Xingling Liu, Xiang-long Meng
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Medical Informatics and Decision Making
Online Access: https://doi.org/10.1186/s12911-025-03138-w
Description
Summary: Abstract

Background: Systematic reviews (SRs) and rapid reviews (RRs) are critical methodologies for synthesizing existing research evidence. However, the growing volume of literature has made study screening one of the most challenging steps in conducting systematic reviews.

Methods: This systematic review aimed to compare the performance of Abstrackr and GPT models (including GPT-3.5 and GPT-4) in literature screening for systematic reviews. We identified relevant studies through comprehensive searches in PubMed, Cochrane Library, and Web of Science, focusing on those that provided key performance metrics such as recall, precision, specificity, and F1 score.

Results: GPT models demonstrated superior performance compared to Abstrackr in precision (0.51 vs. 0.21), specificity (0.84 vs. 0.71), and F1 score (0.52 vs. 0.31), reflecting higher overall efficiency and a better balance in screening. This makes GPT models particularly effective in reducing false positives during fine-screening tasks.

Conclusion: Abstrackr and GPT models each offer distinct advantages in literature screening. Abstrackr is more suitable for the initial screening phases, whereas GPT models excel in fine-screening tasks. To optimize the efficiency and accuracy of systematic reviews, future screening tools could integrate the strengths of both models, potentially leading to the development of hybrid systems tailored to different stages of the screening process.
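
For readers unfamiliar with the metrics named in the summary, the following is a minimal sketch of how recall, precision, specificity, and F1 score are conventionally computed from a screening tool's include/exclude decisions. It uses the standard textbook definitions and is not taken from the article. The `ScreeningCounts` class, the `screening_metrics` function, and the counts in the example are hypothetical and chosen purely for illustration; they are not data from the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion-matrix counts for one screening run (hypothetical container)."""
    tp: int  # relevant records the tool marked "include"
    fp: int  # irrelevant records the tool marked "include"
    fn: int  # relevant records the tool marked "exclude"
    tn: int  # irrelevant records the tool marked "exclude"


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def screening_metrics(c: ScreeningCounts) -> dict:
    """Compute the four metrics using their standard definitions."""
    recall = _ratio(c.tp, c.tp + c.fn)        # share of relevant records found
    precision = _ratio(c.tp, c.tp + c.fp)     # share of "include" calls that were right
    specificity = _ratio(c.tn, c.tn + c.fp)   # share of irrelevant records excluded
    f1 = _ratio(2 * precision * recall, precision + recall)  # harmonic mean
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only (assumed, not from the article):
example = screening_metrics(ScreeningCounts(tp=40, fp=60, fn=10, tn=890))
```

Under these definitions, fewer false positives raise both precision and specificity, which is consistent with the summary's remark that higher values on these metrics correspond to fewer false positives during fine screening. Figures pooled across several studies, such as those quoted in the summary, generally cannot be recombined with these formulas because each study has its own counts.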
ISSN:1472-6947