A blood test-based machine learning model for predicting lung cancer risk

Background: The goal of early detection is individual cancer prediction. For lung cancer (LC), age and smoking history are the primary criteria for annual low-dose CT screening, leaving other populations at risk of being overlooked. Machine learning (ML) is a promising method to identify complex patte...

Bibliographic Details
Main Authors: Lihi Schwartz, Naor Matania, Matanel Levi, Teddy Lazebnik, Shiri Kushnir, Noga Yosef, Assaf Hoogi, Dekel Shlomi
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Medicine
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2025.1577451/full
collection DOAJ
description Background: The goal of early detection is individual cancer prediction. For lung cancer (LC), age and smoking history are the primary criteria for annual low-dose CT screening, leaving other populations at risk of being overlooked. Machine learning (ML) is a promising method to identify complex patterns in the data that can reveal personalized disease predictors.
Methods: An ML-based model was used on blood test data collected before the diagnosis of LC, and sociodemographic factors such as age and gender among LC patients and controls were incorporated to predict the risk for future LC diagnosis.
Results: In addition to age and gender, we identified 22 blood tests that contributed to the model. For the entire study population, the ML model predicted LC with an accuracy of 71.2%, a sensitivity of 63%, and a positive predictive value of 67.2%. Higher accuracy was found among women than men (71.8% vs. 70.8%) and among never smokers than smokers (73.6% vs. 70.1%). Age was the most significant contributor (13.6%), followed by red blood cell distribution (5.1%), creatinine (5%), gender (3.6%), and mean corpuscular hemoglobin (3.3%). A majority of the blood tests made a highly variable contribution to the complex ML model; however, some tests, such as red cell distribution width, mean corpuscular hemoglobin, prothrombin time, hematocrit, urea, and calcium, contributed slightly more to a dichotomous prediction.
Conclusion: Blood tests can be used in the proposed ML model to predict LC. More studies are needed in basic science fields to identify possible explanations between specific blood results and LC prediction.
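The accuracy, sensitivity, and positive predictive value quoted in the abstract are standard confusion-matrix metrics. The following is a minimal Python sketch of how such figures are typically computed. The class name, function names, and all counts are hypothetical illustrations; they are not taken from the study, and the record does not state how the authors computed their metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing binary predictions with true LC status."""
    tp: int  # predicted LC, actually LC
    fp: int  # predicted LC, actually control
    tn: int  # predicted control, actually control
    fn: int  # predicted control, actually LC


def accuracy(c: ConfusionCounts) -> float:
    # Share of all subjects classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    # Share of true LC cases that the model flags.
    return c.tp / (c.tp + c.fn)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Share of flagged subjects who truly have LC.
    return c.tp / (c.tp + c.fp)


# Hypothetical counts for illustration only (NOT data from the article).
counts = ConfusionCounts(tp=60, fp=30, tn=70, fn=40)

print(f"accuracy    = {accuracy(counts):.3f}")
print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"PPV         = {positive_predictive_value(counts):.3f}")
```

Note that the positive predictive value, unlike sensitivity, depends on the ratio of cases to controls in the evaluated population, so a figure obtained in a case-control sample does not carry over directly to a screening population with lower LC prevalence.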
id doaj-art-7b12e467644a445aad1bf92464c997a9
institution OA Journals
issn 2296-858X
spelling doaj-art-7b12e467644a445aad1bf92464c997a9; 2025-08-20T02:35:49Z; eng; Frontiers Media S.A.; Frontiers in Medicine; ISSN 2296-858X; 2025-06-01; volume 12; DOI 10.3389/fmed.2025.1577451; article 1577451
affiliations Lihi Schwartz: Fliner Clinic, Department of Family Medicine, Dan-Petah-Tiqwa District, Clalit Health Services Community Division, Petah Tiqwa, Israel
Naor Matania: Department of Computer Science, Bar Ilan University, Ramat Gan, Israel
Matanel Levi: Adelson School of Medicine, Ariel University, Ariel, Israel
Teddy Lazebnik: Department of Mathematics, Ariel University, Ariel, Israel; Department of Cancer Biology, Cancer Institute, University College London, London, United Kingdom
Shiri Kushnir: Research Authority, Rabin Medical Center, Beilinson Campus, Petah Tiqwa, Israel
Noga Yosef: Research Unit, Dan-Petah-Tiqwa District, Clalit Health Services Community Division, Ramat Gan, Israel
Assaf Hoogi: The School of Computer Science and The Data Science and Artificial Intelligence Research Center, Ariel University, Ariel, Israel
Dekel Shlomi: Adelson School of Medicine, Ariel University, Ariel, Israel; Pulmonary Clinic, Dan-Petah-Tiqwa District, Clalit Health Services Community Division, Ramat Gan, Israel
topic lung cancer
artificial intelligence
machine learning
blood test
prediction model