Towards Early Maternal Morbidity Risk Identification by Concept Extraction from Clinical Notes in Spanish Using Fine-Tuned Transformer-Based Models
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Applied System Innovation |
| Online Access: | https://www.mdpi.com/2571-5577/8/3/78 |
| Summary: | Early detection of morbidities that complicate pregnancy improves health outcomes in low- and middle-income countries. Automated review of electronic health records (EHRs) can help identify such morbidity risks. However, Spanish-language corpora for training domain-specific models are scarce, and no models are specialized in maternal EHRs. This study aims to develop a fine-tuned model that detects clinical concepts, using a purpose-built corpus of text extracted from maternal EHRs in Spanish. We created a corpus with 13,998 annotations from 200 clinical notes in Spanish associated with EHRs obtained from a reference institution for high obstetric risk in Colombia. Using the Beginning–Inside–Outside tagging scheme, we fine-tuned five different transformer-based models to classify between 16 classes associated with eight entities. The best model achieved a macro F1 score of 0.55 ± 0.03. The best-performing entities were signs and symptoms, and negations, with exact F1 scores of 0.714 and 0.726, respectively. Lower scores were associated with classes that had fewer observations. Even though our dataset is smaller and more diverse in entity types than other datasets in Spanish, our results are comparable to those of other state-of-the-art named entity recognition models fine-tuned for Spanish and the biomedical domain. This work introduces the first fine-tuning of a model for named entity recognition specifically designed for maternal EHRs. Our results can serve as a basis for developing new models to extract concepts in the maternal–fetal domain and help healthcare providers detect morbidities that complicate pregnancy early. |
| ISSN: | 2571-5577 |
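
The Beginning–Inside–Outside scheme mentioned in the summary can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names, the example sentence, and the entity labels `NEGATION` and `SIGN_SYMPTOM` are hypothetical, chosen only to show how eight entity types yield 16 `B-`/`I-` classes (with `O` marking tokens outside any entity).

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token-level spans (start, end_exclusive, label) into BIO tags."""
    tags = ["O"] * len(tokens)  # every token starts outside any entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the entity
    return tags


def bio_label_set(entity_types: List[str]) -> List[str]:
    """Build the B-/I- classes for a list of entity types (O is not included)."""
    return [f"{prefix}-{entity}" for entity in entity_types for prefix in ("B", "I")]


# Hypothetical example sentence and annotations (not taken from the corpus).
tokens = ["Paciente", "niega", "cefalea", "intensa", "."]
spans = [(1, 2, "NEGATION"), (2, 4, "SIGN_SYMPTOM")]
bio_tags = spans_to_bio(tokens, spans)

# Eight placeholder entity types give 16 B-/I- classes.
placeholder_entities = [f"ENTITY_{i}" for i in range(8)]
num_classes = len(bio_label_set(placeholder_entities))

if __name__ == "__main__":
    for token, tag in zip(tokens, bio_tags):
        print(f"{token}\t{tag}")
    print(num_classes)
```

A token classifier fine-tuned on such tags predicts one label per token; exact-match F1 then counts an entity as correct only when both its boundaries and its type agree with the annotation.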