ATCodeR: a dictionary-based R-tool to standardize medication free-text

Bibliographic Details
Main Authors: Isabel Schnorr, Stefanie Andreas, Linnea Schumann, Svenja Hahn, Jörg Janne Vehreschild, Daniel Maier
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-97150-9
Description
Summary: Over the past decades, oncology treatment paradigms have developed significantly. Yet, the often unstructured nature of substance-related documentation in medical records presents a time-consuming challenge for analyzing treatment patterns and outcomes. To advance oncological research further, clinical data science must offer solutions that facilitate research and analysis with real-world data (RWD). The present contribution introduces a user-friendly R-tool designed to transform free-text medication entries into the structured Anatomical Therapeutic Chemical (ATC) Classification System by applying a dictionary-based approach. The resulting output is a structured data frame containing columns for antineoplastic medication, other medications, and supplementary information. For accuracy validation, 561 data entries from an evaluation data set were reviewed, consisting of 935 tokens. 88.5% of these tokens were successfully transformed into their respective ATC codes. Additional information was extracted from 129 data entries (23%), while 23 entries (4.1%) presented no usable information. All tokens underwent a manual review; 8.9% (84 tokens) failed transformations. This approach improves the standardization and analysis of systemic anti-cancer treatment data in German-speaking regions by optimizing efficiency while maintaining relevant accuracy.
ISSN: 2045-2322
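
Illustration: The dictionary-based approach described in the summary can be pictured with a minimal sketch. It is written in Python for illustration only and is not the ATCodeR implementation, which is an R tool whose internals are not shown in this record. The dictionary contents, the tokenization rule, the function name `standardize`, and the output keys are all assumptions; the ATC codes are standard WHO codes for the listed substances, and the L01 prefix marks antineoplastic agents.

```python
import re

# Hypothetical mini-dictionary (assumption): lowercase substance name -> ATC code.
# A real dictionary would also cover brand names, abbreviations, and misspellings.
ATC_DICTIONARY = {
    "cisplatin": "L01XA01",
    "carboplatin": "L01XA02",
    "paclitaxel": "L01CD01",
    "docetaxel": "L01CD02",
    "ondansetron": "A04AA01",
    "dexamethason": "H02AB02",   # German spelling
    "dexamethasone": "H02AB02",
}


def standardize(entry):
    """Map one free-text medication entry to ATC codes (illustrative only).

    Returns a dict with three lists that loosely mirror the columns named in
    the summary: antineoplastic medication, other medication, and
    supplementary information (here: every token without a dictionary match).
    """
    # Assumed tokenization: lowercase, then split on anything that is not a
    # digit, a basic Latin letter, or a German umlaut/eszett.
    tokens = [t for t in re.split(r"[^0-9a-zäöüß]+", entry.lower()) if t]

    result = {"antineoplastic": [], "other": [], "additional_info": []}
    for token in tokens:
        code = ATC_DICTIONARY.get(token)
        if code is None:
            result["additional_info"].append(token)
        elif code.startswith("L01"):  # ATC group L01: antineoplastic agents
            result["antineoplastic"].append(code)
        else:
            result["other"].append(code)
    return result


example = standardize("Cisplatin + Paclitaxel, Ondansetron 8mg")
print(example)
```

For the entry above, the two chemotherapy agents land in the antineoplastic list, ondansetron in the other-medication list, and the unmatched token `8mg` is kept as supplementary information. An exact-match lookup like this one would miss typos and combined regimen names, which is one reason a manual review of failed tokens, as reported in the summary, remains useful.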