Volatile Organic Compounds for the Prediction of Lung Cancer by Using Ensembled Machine Learning Model and Feature Selection

Bibliographic Details
Main Authors: Divya Khanna, Arun Kumar, Shahid Ahmad Bhat
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10833635/
collection DOAJ
description Because lung cancer is a leading cause of death, the development of reliable biomarkers is critically important. In the present study, volatile organic compounds (VOCs) are considered as biomarkers to predict lung cancer. VOCs from seven different sources (breath, blood, urine, cell line, pleural fluid, cancer tissue, and lung tissue) are targeted to enhance the reliability of prediction. The study focuses on feature selection and model fusion. Five built-in machine learning models and one proposed ensemble model have been utilised to investigate the different types of VOCs. The ensemble model is designed to combine multiple individual models, using optimal feature sets, to achieve better performance than the individual models alone; this reasoning led to the design of an ensemble model for predicting breath VOCs. The AvNNet model outperforms the four other models for blood, cancer tissue, cell line, and urine VOCs, achieving accuracies of 70%, 80%, 70%, and 90%, respectively, on the validation dataset. The Blackboost model achieved 90% accuracy on the validation dataset for lung tissue VOCs, and the random forest model achieved 90% accuracy on the validation dataset for pleural fluid VOCs. Compared with the individual models, the proposed ensemble model predicts breath VOCs more effectively, achieving 100% accuracy on the validation dataset.
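The record does not state how the proposed ensemble fuses its member models. As a generic illustration only, and not the authors' method, the sketch below uses hard majority voting, one common way to combine classifiers: each model casts one label per sample and the most frequent label wins. The function names (`majority_vote`, `accuracy`), the three models' outputs, and the `cancer`/`control` labels are hypothetical toy data, not results from the study.

```python
from collections import Counter
from collections.abc import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Fuse per-model labels by hard majority voting (generic sketch).

    predictions[m][i] is the label that model m assigns to sample i.
    Ties go to the tied label encountered first in model order,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must label the same samples")
    # zip(*predictions) yields one tuple of votes (one per model) per sample.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label sequences")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy labels (not data from the study).
truth = ["cancer", "control", "cancer", "control", "cancer"]
model_a = ["cancer", "control", "cancer", "cancer", "control"]
model_b = ["cancer", "cancer", "control", "control", "cancer"]
model_c = ["control", "control", "cancer", "control", "cancer"]

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, truth))
# model_a 0.6
# model_b 0.6
# model_c 0.8
# ensemble 1.0
```

With these toy labels each hypothetical model is right on three or four of five samples, while the vote is right on all five because the models' errors fall on different samples. Voting helps only when member errors are not strongly correlated.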
id doaj-art-b3265cd832f24eacad38e1a0625d736e
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2025.3527027
volume 13
pages 9809-9820
affiliation Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
Centre for Artificial Intelligence, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India
LUT Business School, LUT University, Lappeenranta, Finland
orcid https://orcid.org/0000-0002-6791-5913
topic VOCs
lung cancer
biomarkers
machine learning models
ensemble model
ensemble feature selection approach