Data Fusion of Medical Records and Clinical Data to Enhance Tuberculosis Diagnosis in Resource-Limited Settings

Bibliographic Details
Main Authors: Alvaro D. Orjuela-Cañón, Andrés F. Romero-Gómez, Andres L. Jutinico, Carlos E. Awad, Erika Vergara, Maria A. Palencia
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Subjects: artificial intelligence; tuberculosis diagnosis; data fusion
Online Access: https://www.mdpi.com/2076-3417/15/10/5423
Collection: DOAJ
Description: Tuberculosis (TB) is an infectious disease that has been declared a global emergency by the World Health Organization and remains one of the top ten causes of death worldwide. TB diagnosis is particularly challenging in developing countries, where limited infrastructure for detection and treatment complicates efforts to control the disease. These resource constraints are especially critical in remote areas with few mechanisms for timely diagnosis, which is essential for effective patient management. Artificial intelligence (AI) has emerged as a valuable tool in supporting health professionals by enhancing diagnostic processes. This paper explores the use of natural language processing (NLP) techniques and machine learning (ML) models to facilitate TB diagnosis in settings where robust data infrastructure is unavailable. Two distinct data sources were analyzed: text extracted from electronic medical records (EMRs) and patient clinical data (CD). Four ML-based approaches were implemented: two models that each use a single data source and two data fusion models that combine both sources. The relevance of these strategies was assessed in collaboration with physicians to ensure their practical applicability in clinical decision-making. The results of the data fusion models were compared to determine which source provided more valuable diagnostic information.
The best-performing model, which relied solely on CD, achieved a sensitivity of 73%, outperforming smear microscopy, whose sensitivity typically ranges from 40% to 60%. These findings underscore the importance of analyzing physicians’ reports and assessing the availability of such information alongside structured clinical data. This approach is particularly beneficial in resource-limited settings, where access to comprehensive clinical data may be restricted.
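The abstract does not state how the two sources were combined or which classifiers were used. As a hedged illustration only, the Python sketch below shows two common fusion strategies that a comparison like this might involve: early fusion, which concatenates EMR-text features and clinical-data (CD) features before a single classifier, and late fusion, which averages the probabilities of two per-source classifiers. It also computes sensitivity as TP / (TP + FN), the metric behind the 73% figure. None of this is the authors' method: the keyword vocabulary, the bag-of-words featurizer, the hand-set logistic weights, the 0.5 decision threshold, and the toy patient records are all hypothetical.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical logistic-regression scorer: (weights, bias). Weights are hand-set
# for illustration only; they are not taken from the paper.
Model = Tuple[List[float], float]


def text_features(note: str, vocabulary: Sequence[str]) -> List[float]:
    """Bag-of-words counts of (hypothetical) keywords in a free-text EMR note."""
    tokens = re.findall(r"[^\W\d_]+", note.lower())  # letter-only tokens
    return [float(tokens.count(term)) for term in vocabulary]


def predict_proba(features: Sequence[float], model: Model) -> float:
    """Logistic model: P(TB) = sigmoid(bias + w . x)."""
    weights, bias = model
    if len(weights) != len(features):
        raise ValueError("feature/weight length mismatch")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion(emr: Sequence[float], cd: Sequence[float], model: Model) -> float:
    """Early fusion: concatenate EMR-text and CD features, then score with one model."""
    return predict_proba(list(emr) + list(cd), model)


def late_fusion(emr: Sequence[float], cd: Sequence[float],
                emr_model: Model, cd_model: Model) -> float:
    """Late fusion: score each source with its own model, then average the probabilities."""
    return (predict_proba(emr, emr_model) + predict_proba(cd, cd_model)) / 2.0


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity (recall) = TP / (TP + FN); returns 0.0 when there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    vocab = ("cough", "fever")  # hypothetical EMR keywords
    # Toy records: (EMR note, CD features, true TB label). Invented values.
    patients = [
        ("Productive cough for three weeks and night fever.", [1.0, 1.0], 1),
        ("Persistent cough, weight loss.", [1.0, 0.0], 1),
        ("Fatigue and fever.", [0.0, 1.0], 1),
        ("Routine check-up, no complaints.", [0.0, 0.0], 0),
    ]
    emr_model: Model = ([1.5, 1.0], -1.0)
    cd_model: Model = ([2.0, 1.0], -1.5)
    fused_model: Model = ([1.0, 0.5, 1.5, 1.0], -1.5)

    y_true = [label for _, _, label in patients]
    strategies = {
        "early fusion": lambda e, c: early_fusion(e, c, fused_model),
        "late fusion": lambda e, c: late_fusion(e, c, emr_model, cd_model),
    }
    for name, proba in strategies.items():
        # Threshold each fused probability at 0.5 to get a binary TB prediction.
        y_pred = [int(proba(text_features(note, vocab), cd) >= 0.5)
                  for note, cd, _ in patients]
        print(f"{name}: sensitivity = {sensitivity(y_true, y_pred):.2f}")
```

In practice the weights would be learned from labeled records rather than set by hand, and a plain keyword count ignores negation (a note saying "no cough" still counts "cough"); both simplifications only keep the sketch dependency-free.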
ISSN: 2076-3417
DOI: 10.3390/app15105423
Citation: Applied Sciences 15(10), 5423
Affiliations:
Alvaro D. Orjuela-Cañón: School of Medicine and Health Sciences, Universidad del Rosario, Bogota 111221, Colombia
Andrés F. Romero-Gómez: Fundación Santa Fe de Bogotá, Bogota 110111, Colombia
Andres L. Jutinico: Biomedical Engineering, Universidad Antonio Nariño, Bogota 110311, Colombia
Carlos E. Awad: Subred Integrada de Servicios de Salud Centro Oriente, Bogota 111711, Colombia
Erika Vergara: Hospital Universitario Nacional, Bogota 111321, Colombia
Maria A. Palencia: Subred Integrada de Servicios de Salud Centro Oriente, Bogota 111711, Colombia