Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study

Background: Machine learning methods for clinical assistance require a large number of annotations from trained experts to achieve optimal performance. Previous work in natural language processing has shown that it is possible to automatically extract annotations from the free-text reports associate...

Bibliographic Details
Main Authors: Alice Schiavone, Lea Marie Pehrson, Silvia Ingala, Rasmus Bonnevie, Marco Fraccaro, Dana Li, Michael Bachmann Nielsen, Desmond Elliott
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: AI
Subjects: AI for healthcare; natural language processing; radiology report classification
Online Access: https://www.mdpi.com/2673-2688/6/2/37
collection DOAJ
description Background: Machine learning methods for clinical assistance require a large number of annotations from trained experts to achieve optimal performance. Previous work in natural language processing has shown that it is possible to automatically extract annotations from the free-text reports associated with chest X-rays. Methods: This study investigated techniques to extract 49 labels in a hierarchical tree structure from chest X-ray reports written in Danish. The labels were extracted from approximately 550,000 reports by performing multi-class, multi-label classification using a method based on pattern-matching rules, a classic approach in the literature for solving this task. The performance of this method was compared to that of open-source large language models that were pre-trained on Danish data and fine-tuned for classification. Results: Methods developed for English were also applicable to Danish and achieved similar performance (a weighted F1 score of 0.778 on 49 findings). A small set of expert annotations was sufficient to achieve competitive results, even with an unbalanced dataset. Conclusions: Natural language processing techniques provide a promising alternative to human expert annotation when annotations of chest X-ray reports are needed. Large language models can outperform traditional pattern-matching methods.
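The abstract describes multi-label classification with pattern-matching rules over a hierarchical label tree, evaluated with a weighted F1 score. The following is a minimal sketch of that general idea, not the authors' implementation: the label names, the Danish patterns, the negation cue list, and the helper names `label_report` and `weighted_f1` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical two-level label tree (child -> parent). The study's actual
# 49-label hierarchy is not reproduced here.
PARENT = {
    "pneumothorax": "pleural_abnormality",
    "pleural_effusion": "pleural_abnormality",
}

# Hypothetical Danish patterns and negation cues, for illustration only.
RULES = {
    "pneumothorax": re.compile(r"pneumothorax", re.IGNORECASE),
    "pleural_effusion": re.compile(r"pleura(ekssudat|væske)", re.IGNORECASE),
}
NEGATION = re.compile(r"\b(ingen|ikke|uden)\b", re.IGNORECASE)


def label_report(text):
    """Return the set of labels whose pattern matches without a preceding negation."""
    labels = set()
    for sentence in re.split(r"[.!?]\s*", text):
        for label, pattern in RULES.items():
            match = pattern.search(sentence)
            # Skip mentions preceded by a negation cue in the same sentence.
            if match and not NEGATION.search(sentence[: match.start()]):
                labels.add(label)
    # A positive child label also marks every ancestor in the tree as positive.
    for label in list(labels):
        while label in PARENT:
            label = PARENT[label]
            labels.add(label)
    return labels


def weighted_f1(gold, pred):
    """Per-label F1 averaged with weights proportional to gold label support."""
    support = {}
    for labels in gold:
        for label in labels:
            support[label] = support.get(label, 0) + 1
    total = sum(support.values())
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += (n / total) * f1
    return score
```

For example, `label_report("Ingen pneumothorax. Venstresidig pleuravæske.")` skips the negated pneumothorax mention and returns the effusion label together with its parent. Weighting by support means frequent findings dominate the score, which matters for an unbalanced dataset such as the one the abstract mentions.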
id doaj-art-4ebb87a9db0a494196e201ac83905224
institution DOAJ
issn 2673-2688
spelling doaj-art-4ebb87a9db0a494196e201ac83905224
Record updated: 2025-08-20T02:44:40Z
Language: eng
Publisher: MDPI AG
Journal: AI (ISSN 2673-2688)
Published: 2025-02-01
Volume 6, Issue 2, Article 37
DOI: 10.3390/ai6020037
Author affiliations:
Alice Schiavone: Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
Lea Marie Pehrson: Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
Silvia Ingala: Department of Diagnostic Radiology, Copenhagen University Hospital Rigshospitalet, 2100 Copenhagen, Denmark
Rasmus Bonnevie: Unumed Aps, 1055 Copenhagen, Denmark
Marco Fraccaro: Unumed Aps, 1055 Copenhagen, Denmark
Dana Li: Department of Diagnostic Radiology, Copenhagen University Hospital Rigshospitalet, 2100 Copenhagen, Denmark
Michael Bachmann Nielsen: Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
Desmond Elliott: Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
topic AI for healthcare
natural language processing
radiology report classification