GenECG: a synthetic image-based ECG dataset to augment artificial intelligence-enhanced algorithm development

Bibliographic Details
Main Authors: Neil Bodagh, Steven Niederer, Mark O’Neill, Rachel Burns, Darwon Rashid, Steven E Williams, Miguel O Bernabeu, Ali Gharaviri, Vinush Vigneswaran, Magda Klis, Irum Kotadia, Malihe Javidi, Kyaw Soe Tun, Adam Barton
Format: Article
Language: English
Published: BMJ Publishing Group 2025-05-01
Series: BMJ Health & Care Informatics
Online Access: https://informatics.bmj.com/content/32/1/e101335.full
Description
Summary:
Objectives: An image-based ECG dataset incorporating visual imperfections common to paper-based ECGs, which are typically scanned or photographed into electronic health records, could facilitate clinically useful artificial intelligence (AI)-ECG algorithm development. This study aimed to create a high-fidelity, synthetic image-based ECG dataset.
Methods: ECG images were recreated from the PTB-XL database, a signal-based dataset, and image manipulation techniques were applied to mimic imperfections associated with ECGs in real-world settings. Clinical Turing tests were conducted to evaluate the fidelity of the synthetic images, and the performance of current AI-ECG algorithms was assessed using synthetic images containing visual imperfections.
Results: GenECG, an image-based dataset containing 21 799 ECGs with visual imperfections encountered in routine clinical care paired with imperfection-free images, was created. Turing tests confirmed the realism of the images: expert observer accuracy of discrimination between real-world and synthetic ECGs fell from 63.9% (95% CI 58.0% to 69.8%) to 53.3% (95% CI 48.6% to 58.1%) over three rounds of testing, indicating that observers could not distinguish between synthetic and real ECGs. The performance of pre-existing algorithms on synthetic (area under the curve (AUC) 0.592, 95% CI 0.421 to 0.763) and real-world (AUC 0.647, 95% CI 0.520 to 0.774) ECG images containing imperfections was limited. Algorithm fine-tuning with GenECG data improved real-world ECG classification accuracy (AUC 0.821, 95% CI 0.730 to 0.913), demonstrating its potential to augment image-based algorithm development.
Discussion/conclusion: GenECG is the first synthetic image-based ECG dataset to pass a clinical Turing test. The dataset will enable image-based AI-ECG algorithm development, ensuring utility in low-resource areas, prehospital settings and hospital environments where signal data are unavailable.
ISSN: 2632-1009
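
The Results in the summary treat observer accuracy close to 50% as evidence that real and synthetic ECGs could not be told apart. For a two-way real-versus-synthetic judgement, that reading holds when the 95% CI for accuracy includes the chance level of 50%, as the reported final-round interval (48.6% to 58.1%) does and the first-round interval (58.0% to 69.8%) does not. The sketch below illustrates the check with a normal-approximation (Wald) interval. The record does not state how the authors computed their intervals, so the method, the function names and the counts used here are assumptions for illustration only, not study data.

```python
import math


def wald_interval(correct, total, z=1.96):
    """Return (accuracy, lower, upper) for a normal-approximation (Wald) interval.

    Assumption: this is an illustrative method; the source does not say
    which interval method the study used.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to the valid range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def includes_chance(lower, upper, chance=0.5):
    """True when the interval contains the chance level for a binary judgement."""
    return lower <= chance <= upper


# Hypothetical counts, NOT taken from the study: 53 correct calls out of 100.
acc, lo, hi = wald_interval(correct=53, total=100)
print(f"accuracy={acc:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
print("consistent with chance:", includes_chance(lo, hi))
```

With these hypothetical counts the interval spans 50%, so the observers' performance would be statistically indistinguishable from guessing. A Wald interval is the simplest choice; for small samples or accuracies near 0 or 1, a Wilson interval would usually be preferred.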