NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images

Abstract Low-dose computed tomography (LDCT) is the most effective tool for early detection of lung cancer. With advancements in artificial intelligence, various Computer-Aided Diagnosis (CAD) systems are now used in clinical practice. For radiologists dealing with a huge volume of CT scans, C...

Bibliographic Details
Main Authors: Kun-Hui Chen, Yi-Hui Lin, Shawn Wu, Nai-Wen Shih, Hsing-Chen Meng, Yen-Yu Lin, Chun-Rong Huang, Jing-Wen Huang
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05742-x
author Kun-Hui Chen
Yi-Hui Lin
Shawn Wu
Nai-Wen Shih
Hsing-Chen Meng
Yen-Yu Lin
Chun-Rong Huang
Jing-Wen Huang
collection DOAJ
description Abstract Low-dose computed tomography (LDCT) is the most effective tool for early detection of lung cancer. With advancements in artificial intelligence, various Computer-Aided Diagnosis (CAD) systems are now used in clinical practice. For radiologists dealing with a huge volume of CT scans, CAD systems are helpful. However, the development of these systems depends on precisely annotated datasets, which are currently limited. Although several lung imaging datasets exist, there are only a few publicly available datasets with segmentation annotations on LDCT images. To address this problem, we developed a dataset based on NLST LDCT images with pixel-level annotations of lung lesions. The dataset includes LDCT scans from 605 patients and 715 annotated lesions, including 662 lung tumors and 53 lung nodules. Lesion volumes range from 0.03 cm³ to 372.21 cm³, with 500 lesions smaller than 5 cm³, mostly located in the right upper lung. A 2D U-Net model trained on the dataset achieved an IoU of 0.95 on the training dataset. This dataset enhances the diversity and usability of lung cancer annotation resources.
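The two quantities quoted in the description, intersection over union (IoU) and lesion volume in cm³, can both be derived from a binary segmentation mask. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the record does not say whether IoU was computed per slice or per volume, and the function names `iou` and `lesion_volume_cm3`, as well as the `spacing_mm` parameter, are hypothetical.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def lesion_volume_cm3(mask, spacing_mm):
    """Volume of a binary lesion mask, given voxel spacing in millimetres."""
    voxel_mm3 = float(np.prod(spacing_mm))  # volume of one voxel in mm^3
    n_voxels = int(np.asarray(mask, dtype=bool).sum())
    return n_voxels * voxel_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3


if __name__ == "__main__":
    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print(iou(pred, target))  # one shared pixel out of two in the union

    mask = np.ones((10, 10, 10), dtype=bool)
    print(lesion_volume_cm3(mask, (2.0, 0.5, 0.5)))
```

In practice the voxel spacing would be read from the CT image header rather than passed in by hand.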
format Article
id doaj-art-2e1d2e8800164be79ffcbf3885b83164
institution Kabale University
issn 2052-4463
language English
publishDate 2025-08-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj-art-2e1d2e8800164be79ffcbf3885b83164
2025-08-24T11:07:19Z
eng
Nature Portfolio, Scientific Data, ISSN 2052-4463, 2025-08-01, volume 12, issue 1, pages 1-12
10.1038/s41597-025-05742-x
NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images
Kun-Hui Chen: Department of Orthopedic Surgery, Taichung Veterans General Hospital
Yi-Hui Lin: Department of Radiation Oncology, Pingtung Veterans General Hospital
Shawn Wu: Department of Diagnostic Imaging, SY Research Institute
Nai-Wen Shih: Department of Radiation Oncology, Pingtung Veterans General Hospital
Hsing-Chen Meng: Graduate Degree Program of AI, National Yang Ming Chiao Tung University
Yen-Yu Lin: Department of Computer Science, National Yang Ming Chiao Tung University
Chun-Rong Huang: Department of Computer Science, National Yang Ming Chiao Tung University
Jing-Wen Huang: Department of Radiation Oncology, Taichung Veterans General Hospital
https://doi.org/10.1038/s41597-025-05742-x
title NLSTseg: A Pixel-level Lung Cancer Dataset Based on NLST LDCT Images
url https://doi.org/10.1038/s41597-025-05742-x