Determination of lung cancer exhaled breath biomarkers using machine learning-a new analysis framework

Bibliographic Details
Main Authors: Tlotlo Cassandra Setlhare, Atlang Gild Mpolokang, Emmanuel Flahaut, George Chimowa
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-11365-4
collection DOAJ
description Abstract Exhaled breath samples of lung cancer (LC) patients, tuberculosis (TB) patients and asymptomatic controls (C) were analyzed using gas chromatography-mass spectrometry (GC-MS). Ten volatile organic compounds (VOCs) were identified as possible biomarkers after confounders were statistically eliminated to enhance biomarker specificity. The diagnostic potential of these possible biomarkers was evaluated using multiple machine learning models, and the models' performance in classifying patients and controls was compared. Partial least squares-discriminant analysis (PLS-DA) emerged as the best-performing model for separating lung cancer from controls, with a recall (sensitivity) of 82%, precision of 90%, accuracy of 80% and F1-score of 86%. To further validate this model, TB data were introduced as a confounding disease, and the model achieved precision, recall, accuracy and F1-score of 88% each in distinguishing lung cancer from TB. These findings address inter-disease variability and underscore the reliability of the reported VOCs as potential biomarkers of lung cancer. This study establishes a new framework integrating machine learning and confounder elimination for biomarker confirmation.
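The F1-scores quoted in the abstract can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that arithmetic only, not the authors' analysis code. The function name `f1_score` and the variable names are illustrative assumptions. The input values are the rounded percentages from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (illustrative variable names).
lc_vs_controls = f1_score(precision=0.90, recall=0.82)
lc_vs_tb = f1_score(precision=0.88, recall=0.88)

print(f"LC vs controls: F1 = {lc_vs_controls:.2f}")
print(f"LC vs TB:       F1 = {lc_vs_tb:.2f}")
```

With the rounded inputs, both results agree with the 86% and 88% F1-scores stated in the abstract.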
id doaj-art-d19bd997fede435fbe0591f1a49bf8d3
institution Kabale University
issn 2045-2322
spelling doaj-art-d19bd997fede435fbe0591f1a49bf8d3, 2025-08-20T04:02:46Z, eng
Scientific Reports (Nature Portfolio), ISSN 2045-2322, 2025-07-01, vol. 15, no. 1, pp. 1-12, https://doi.org/10.1038/s41598-025-11365-4
Author affiliations:
Tlotlo Cassandra Setlhare: Department of Physics and Astronomy, Botswana International University of Science and Technology
Atlang Gild Mpolokang: Department of Physics and Astronomy, Botswana International University of Science and Technology
Emmanuel Flahaut: CIRIMAT, Toulouse INP, CNR, Université de Toulouse
George Chimowa: Department of Physics and Astronomy, Botswana International University of Science and Technology