Volatile Organic Compounds for the Prediction of Lung Cancer by Using Ensembled Machine Learning Model and Feature Selection
Main Authors: | Divya Khanna, Arun Kumar, Shahid Ahmad Bhat |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | VOCs, lung cancer, biomarkers, machine learning models, ensemble model, ensemble feature selection approach |
Online Access: | https://ieeexplore.ieee.org/document/10833635/ |
author | Divya Khanna, Arun Kumar, Shahid Ahmad Bhat |
---|---|
collection | DOAJ |
description | The development of biomarkers is critically important, as lung cancer is a leading cause of death. In the present study, volatile organic compounds (VOCs) are considered as biomarkers for predicting lung cancer. VOCs from seven different sources (breath, blood, urine, cell lines, pleural fluid, cancer tissue, and lung tissue) are targeted to improve prediction reliability. The study focuses on feature selection and model fusion. Five built-in machine learning models and one proposed ensemble model are used to investigate the different types of VOCs. The ensemble model is designed to combine multiple individual models, using optimal feature sets, so that it performs better than any single model; this reasoning led to an ensemble model for predicting breath VOCs. Compared with the other four models, the AvNNet model performs best in predicting blood, cancer tissue, cell line, and urine VOCs, achieving accuracies of 70%, 80%, 70%, and 90%, respectively, on the validation dataset. The Blackboost model achieves 90% accuracy on the validation dataset in predicting lung tissue VOCs, and the random forest model achieves 90% validation accuracy in predicting pleural fluid VOCs. Compared with the individual models, the proposed ensemble model predicts breath VOCs more effectively, achieving 100% accuracy on the validation dataset. |
format | Article |
id | doaj-art-b3265cd832f24eacad38e1a0625d736e |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | IEEE Access, vol. 13, pp. 9809-9820, 2025-01-01; ISSN 2169-3536; DOI 10.1109/ACCESS.2025.3527027; IEEE Xplore document 10833635. Affiliations: Divya Khanna, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India; Arun Kumar, Centre for Artificial Intelligence, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India; Shahid Ahmad Bhat (ORCID https://orcid.org/0000-0002-6791-5913), LUT Business School, LUT University, Lappeenranta, Finland. |
title | Volatile Organic Compounds for the Prediction of Lung Cancer by Using Ensembled Machine Learning Model and Feature Selection |
topic | VOCs, lung cancer, biomarkers, machine learning models, ensemble model, ensemble feature selection approach |
url | https://ieeexplore.ieee.org/document/10833635/ |
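The abstract says only that the proposed ensemble combines several individual models using optimal feature sets, and the keywords mention an ensemble feature selection approach; the record does not say how features or votes are combined. The sketch below assumes the simplest versions of both ideas: keep a feature if enough selection methods pick it, then combine model predictions by hard majority voting. The function names, the feature names (`voc_a`, ...), and the toy labels and predictions are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def ensemble_feature_selection(
    selected_per_method: Sequence[Sequence[str]], min_votes: int
) -> list[str]:
    """Keep features chosen by at least `min_votes` selection methods.

    Each inner sequence is the feature subset picked by one (hypothetical)
    selection method. Features keep their first-seen order.
    """
    votes = Counter(
        feature
        for subset in selected_per_method
        for feature in dict.fromkeys(subset)  # count each method at most once per feature
    )
    return [feature for feature, n in votes.items() if n >= min_votes]


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> list[Hashable]:
    """Hard voting over one prediction list per model (same sample order).

    Ties go to the label that appears first among the models' votes.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all models must predict the same number of samples")
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical feature subsets from three selection methods.
    features = ensemble_feature_selection(
        [["voc_a", "voc_b", "voc_c"], ["voc_b", "voc_c", "voc_d"], ["voc_c", "voc_a", "voc_e"]],
        min_votes=2,
    )
    print(features)  # ['voc_a', 'voc_b', 'voc_c']

    y_true = [1, 0, 1, 1, 0]  # toy labels: 1 = lung cancer, 0 = control
    model_preds = [
        [0, 0, 1, 1, 0],  # hypothetical model 1 (wrong on sample 0)
        [1, 0, 0, 1, 0],  # hypothetical model 2 (wrong on sample 2)
        [1, 0, 1, 1, 1],  # hypothetical model 3 (wrong on sample 4)
    ]
    print([accuracy(y_true, p) for p in model_preds])  # [0.8, 0.8, 0.8]
    print(accuracy(y_true, majority_vote(model_preds)))  # 1.0
```

In this toy case the vote corrects every individual error only because each model is wrong on a different sample; if the models made the same mistakes, majority voting could not fix them.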