Data Fusion of Medical Records and Clinical Data to Enhance Tuberculosis Diagnosis in Resource-Limited Settings
Tuberculosis (TB) is an infectious disease that has been declared a global emergency by the World Health Organization and remains one of the top ten causes of death worldwide. TB diagnosis is particularly challenging in developing countries, where limited infrastructure for detection and treatment c...
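| Main Authors: | Alvaro D. Orjuela-Cañón, Andrés F. Romero-Gómez, Andres L. Jutinico, Carlos E. Awad, Erika Vergara, Maria A. Palencia |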
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | artificial intelligence, tuberculosis diagnosis, data fusion |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5423 |
| _version_ | 1849327753833218048 |
|---|---|
| author | Alvaro D. Orjuela-Cañón Andrés F. Romero-Gómez Andres L. Jutinico Carlos E. Awad Erika Vergara Maria A. Palencia |
| author_sort | Alvaro D. Orjuela-Cañón |
| collection | DOAJ |
| description | Tuberculosis (TB) is an infectious disease that has been declared a global emergency by the World Health Organization and remains one of the top ten causes of death worldwide. TB diagnosis is particularly challenging in developing countries, where limited infrastructure for detection and treatment complicates efforts to control the disease. These resource constraints are especially critical in remote areas with few mechanisms for timely diagnosis, which is essential for effective patient management. Artificial intelligence (AI) has emerged as a valuable tool in supporting health professionals by enhancing diagnostic processes. This paper explores the use of natural language processing (NLP) techniques and machine learning (ML) models to facilitate TB diagnosis in settings where robust data infrastructure is unavailable. Two distinct data sources were analyzed: text extracted from electronic medical records (EMRs) and patient clinical data (CD). Four different ML-based approaches were implemented: two models using each data source independently and two data fusion models combining both sources. The relevance of these strategies was assessed in collaboration with physicians to ensure their practical applicability in clinical decision-making. The results of the data fusion models were compared to determine which source provided more valuable diagnostic information. The best-performing model, which relied solely on CD, achieved a sensitivity of 73%, outperforming smear microscopy, which typically ranges from 40% to 60%. These findings underscore the importance of analyzing physicians’ reports and assessing the availability of such information alongside structured clinical data. This approach is particularly beneficial in resource-limited settings, where access to comprehensive clinical data may be restricted. |
| format | Article |
| id | doaj-art-a11ba723e70c4dbcbebfe2e113512c09 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-a11ba723e70c4dbcbebfe2e113512c09; 2025-08-20T03:47:48Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-05-01; vol. 15, no. 10, art. 5423; 10.3390/app15105423; Data Fusion of Medical Records and Clinical Data to Enhance Tuberculosis Diagnosis in Resource-Limited Settings; Alvaro D. Orjuela-Cañón (School of Medicine and Health Sciences, Universidad del Rosario, Bogota 111221, Colombia); Andrés F. Romero-Gómez (Fundación Santa Fe de Bogotá, Bogota 110111, Colombia); Andres L. Jutinico (Biomedical Engineering, Universidad Antonio Nariño, Bogota 110311, Colombia); Carlos E. Awad (Subred Integrada de Servicios de Salud Centro Oriente, Bogota 111711, Colombia); Erika Vergara (Hospital Universitario Nacional, Bogota 111321, Colombia); Maria A. Palencia (Subred Integrada de Servicios de Salud Centro Oriente, Bogota 111711, Colombia); https://www.mdpi.com/2076-3417/15/10/5423; artificial intelligence; tuberculosis diagnosis; data fusion |
| title | Data Fusion of Medical Records and Clinical Data to Enhance Tuberculosis Diagnosis in Resource-Limited Settings |
| topic | artificial intelligence tuberculosis diagnosis data fusion |
| url | https://www.mdpi.com/2076-3417/15/10/5423 |