Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study
Background: Machine learning methods for clinical assistance require a large number of annotations from trained experts to achieve optimal performance. Previous work in natural language processing has shown that it is possible to automatically extract annotations from the free-text reports associate...
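| Main Authors: | Alice Schiavone, Lea Marie Pehrson, Silvia Ingala, Rasmus Bonnevie, Marco Fraccaro, Dana Li, Michael Bachmann Nielsen, Desmond Elliott |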
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | AI |
| Subjects: | AI for healthcare; natural language processing; radiology report classification |
| Online Access: | https://www.mdpi.com/2673-2688/6/2/37 |
| _version_ | 1850081643860066304 |
|---|---|
| author | Alice Schiavone Lea Marie Pehrson Silvia Ingala Rasmus Bonnevie Marco Fraccaro Dana Li Michael Bachmann Nielsen Desmond Elliott |
| collection | DOAJ |
| description | Background: Machine learning methods for clinical assistance require a large number of annotations from trained experts to achieve optimal performance. Previous work in natural language processing has shown that it is possible to automatically extract annotations from the free-text reports associated with chest X-rays. Methods: This study investigated techniques to extract 49 labels in a hierarchical tree structure from chest X-ray reports written in Danish. The labels were extracted from approximately 550,000 reports by performing multi-class, multi-label classification using a method based on pattern-matching rules, a classic approach in the literature for solving this task. The performance of this method was compared to that of open-source large language models that were pre-trained on Danish data and fine-tuned for classification. Results: Methods developed for English were also applicable to Danish and achieved similar performance (a weighted F1 score of 0.778 on 49 findings). A small set of expert annotations was sufficient to achieve competitive results, even with an unbalanced dataset. Conclusions: Natural language processing techniques provide a promising alternative to human expert annotation when annotations of chest X-ray reports are needed. Large language models can outperform traditional pattern-matching methods. |
| format | Article |
| id | doaj-art-4ebb87a9db0a494196e201ac83905224 |
| institution | DOAJ |
| issn | 2673-2688 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | AI |
| spelling | Record: doaj-art-4ebb87a9db0a494196e201ac83905224; 2025-08-20T02:44:40Z; eng; MDPI AG; AI; ISSN 2673-2688; 2025-02-01; vol. 6, no. 2, art. 37; DOI 10.3390/ai6020037. Authors and affiliations: Alice Schiavone (Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark); Lea Marie Pehrson (Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark); Silvia Ingala (Department of Diagnostic Radiology, Copenhagen University Hospital Rigshospitalet, 2100 Copenhagen, Denmark); Rasmus Bonnevie (Unumed Aps, 1055 Copenhagen, Denmark); Marco Fraccaro (Unumed Aps, 1055 Copenhagen, Denmark); Dana Li (Department of Diagnostic Radiology, Copenhagen University Hospital Rigshospitalet, 2100 Copenhagen, Denmark); Michael Bachmann Nielsen (Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark); Desmond Elliott (Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark). Keywords: AI for healthcare; natural language processing; radiology report classification. URL: https://www.mdpi.com/2673-2688/6/2/37 |
| title | Effective Machine Learning Techniques for Non-English Radiology Report Classification: A Danish Case Study |
| topic | AI for healthcare natural language processing radiology report classification |
| url | https://www.mdpi.com/2673-2688/6/2/37 |