Classification of NSCLC subtypes using lung microbiome from resected tissue based on machine learning methods

Bibliographic Details
Main Authors: Pragya Kashyap, Kalbhavi Vadhi Raj, Jyoti Sharma, Naveen Dutt, Pankaj Yadav
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Systems Biology and Applications
Online Access: https://doi.org/10.1038/s41540-025-00491-4
Collection: DOAJ
Abstract: Classification of adenocarcinoma (AC) and squamous cell carcinoma (SCC) poses significant challenges for cytopathologists, often necessitating clinical tests and biopsies that delay treatment initiation. To address this, we developed a machine learning-based approach that uses the resected lung-tissue microbiome of AC and SCC patients for subtype classification. Differentially enriched taxa were identified using LEfSe, revealing ten potential microbial markers. Linear discriminant analysis (LDA) was subsequently applied to enhance inter-class separability. Next, six supervised classification algorithms were benchmarked: logistic regression, naïve Bayes, random forest, extreme gradient boosting (XGBoost), k-nearest neighbors, and a deep neural network. Notably, XGBoost outperformed the other five algorithms on LDA-transformed features, with an accuracy of 76.25%, an AUROC (area under the receiver operating characteristic curve) of 0.81, 69% specificity, and 76% sensitivity. Validation on an independent dataset confirmed its robustness, with an AUROC of 0.71 and minimal false positives and negatives. This study is the first to classify AC and SCC subtypes using the lung-tissue microbiome.
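The four figures quoted in the abstract (accuracy, sensitivity, specificity, AUROC) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the labels and scores are made-up toy values, not data from the study, the encoding of SCC as the positive class is arbitrary, and the helper names (`confusion_counts`, `auroc`, `summarize`) are hypothetical rather than taken from the authors' code.

```python
# Toy illustration of the evaluation metrics named in the abstract.
# All data and function names here are hypothetical, not from the study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(y_true, scores, threshold=0.5):
    """Threshold the scores, then compute the four summary metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "auroc": auroc(y_true, scores),
    }


# Assumed encoding: 1 = SCC, 0 = AC (arbitrary choice for this sketch).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = summarize(y_true, scores)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a model can report an AUROC higher than its accuracy.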
Institution: Kabale University
ISSN: 2056-7189
Author Affiliations:
Pragya Kashyap: Department of Bioscience & Bioengineering, Indian Institute of Technology
Kalbhavi Vadhi Raj: Department of Electrical Engineering, Indian Institute of Technology
Jyoti Sharma: Department of Bioscience & Bioengineering, Indian Institute of Technology
Naveen Dutt: Department of Pulmonary Medicine, All India Institute of Medical Sciences
Pankaj Yadav: Department of Bioscience & Bioengineering, Indian Institute of Technology