A cost-effective approach to counterbalance the scarcity of medical datasets

Bibliographic Details
Main Authors: Bernardo Magnini, Saeed Farzi, Pietro Ferrazzi, Soumitra Ghosh, Alberto Lavelli, Giulia Mezzanotte, Manuela Speranza
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-05-01
Series: Frontiers in Disaster and Emergency Medicine
Online Access: https://www.frontiersin.org/articles/10.3389/femer.2025.1558200/full
Description
Summary: This paper presents an innovative methodology for addressing the critical issue of data scarcity in clinical research, specifically within emergency departments. Inspired by recent advances in the generative abilities of Large Language Models (LLMs), we devised an automated, LLM-based approach to extend an existing, publicly available English dataset to new languages. We constructed a pipeline of automated components that first converts the existing annotated dataset from its complex standard format into a simpler inline-annotated format, then uses LLMs to generate inline annotations in the target language, and finally converts the generated target-language inline annotations back into the dataset's standard format; manual validation is envisaged to correct erroneous and missing annotations. By automating the translation and annotation-transfer process, the proposed method significantly reduces the resource-intensive work of collecting and manually annotating data, and thus represents a crucial step toward bridging the gap between the needs of clinical research and the availability of high-quality data.
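The format-conversion steps of the pipeline described in the summary can be illustrated with a minimal Python sketch. It assumes that the standard format stores standoff annotations as character offsets plus a label, and that the inline format wraps each annotated span in XML-like tags such as <SYMPTOM>...</SYMPTOM>. The Span type, the tag syntax, the SYMPTOM label and the label-count heuristic in needs_review are all hypothetical choices made for illustration; they are not the authors' formats or code. The LLM step, which would turn the tagged English text into tagged target-language text, is not shown.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical standoff annotation: character offsets plus a label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


# Matches a well-formed inline annotation such as <SYMPTOM>chest pain</SYMPTOM>.
TAG = re.compile(r"<([A-Z_]+)>(.*?)</\1>", re.DOTALL)
# Any leftover opening or closing tag after parsing indicates malformed output.
STRAY_TAG = re.compile(r"</?[A-Z_]+>")


def to_inline(text: str, spans: list[Span]) -> str:
    """Standard (standoff) -> inline: wrap each non-overlapping span in tags."""
    out, pos = [], 0
    for s in sorted(spans, key=lambda s: s.start):
        out.append(text[pos:s.start])
        out.append(f"<{s.label}>{text[s.start:s.end]}</{s.label}>")
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


def from_inline(tagged: str) -> tuple[str, list[Span]]:
    """Inline -> standard: strip tags and recompute offsets in the plain text."""
    plain, spans, pos, length = [], [], 0, 0
    for m in TAG.finditer(tagged):
        before = tagged[pos:m.start()]
        plain.append(before)
        length += len(before)
        content = m.group(2)
        spans.append(Span(length, length + len(content), m.group(1)))
        plain.append(content)
        length += len(content)
        pos = m.end()
    plain.append(tagged[pos:])
    return "".join(plain), spans


def needs_review(source_spans: list[Span], tagged_target: str) -> bool:
    """Hypothetical heuristic: flag a generated sentence for manual validation
    if it contains malformed tags or its label counts differ from the source."""
    plain, target_spans = from_inline(tagged_target)
    if STRAY_TAG.search(plain):
        return True
    return Counter(s.label for s in source_spans) != Counter(s.label for s in target_spans)
```

Round-tripping an English sentence through to_inline and from_inline recovers the original text and offsets. For a target-language sentence produced by the model, from_inline yields offsets in the new text, while output with unbalanced tags or missing annotations is routed to the manual validation step rather than being accepted silently.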
ISSN: 2813-7302