NLP for computational insights into nutritional impacts on colorectal cancer care

Colorectal cancer (CRC) is one of the most prominent cancers globally, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older adults, thus limiting their usef...

Bibliographic Details
Main Authors: Shengnan Gong, Xiaohong Jin, Yujie Guo, Jie Yu
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: SLAS Technology
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2472630325000536
_version_ 1850178989574848512
author Shengnan Gong
Xiaohong Jin
Yujie Guo
Jie Yu
collection DOAJ
description Colorectal cancer (CRC) is one of the most prominent cancers globally, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older adults, which limits their usefulness in more diverse populations. The role of nutrition in CRC prevention and management is also gaining significant attention, yet computational approaches to analyzing the impact of diet on CRC remain underdeveloped. This research introduces the Nutritional Impact on CRC Prediction Framework (NICRP-Framework), which combines Natural Language Processing (NLP) techniques with Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLMs) to provide insights into the role of diet in CRC care across diverse populations. The colorectal cancer dietary and lifestyle dataset, encompassing >1000 participants, is collected from multiple regions and sources. It includes both structured and unstructured data, among them textual descriptions of food ingredients. These descriptions are standardized through stop word removal, lowercasing, and punctuation elimination. Relevant terms are then extracted and visualized in a word cloud. The dataset also contains an imbalanced binary CRC outcome, which is rebalanced using random oversampling. ATSO-LLMs are employed to analyze the processed dietary data, identifying key nutritional factors and predicting CRC and non-CRC phenotypes from dietary patterns. The results show that combining NLP-derived features with ATSO-LLMs significantly improves prediction performance, reaching an accuracy of 98.4%, a sensitivity of 97.6%, a specificity of 96.9%, and an F1-score of 96.2%, with minimal misclassification rates. This framework represents a transformative advancement in life science by offering a new, data-driven approach to understanding the nutritional determinants of CRC, empowering healthcare professionals to make more precise predictions and to tailor dietary interventions for diverse populations.
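The abstract names three generic steps that can be illustrated without the ATSO-LLM model itself: text standardization, random oversampling, and the four reported evaluation metrics. The following is a minimal sketch under stated assumptions, not the authors' implementation. The stop-word list, the function names, and the sample inputs are all hypothetical, and the record does not say which tools or parameters the study used.

```python
import random
import string
from collections import Counter

# Hypothetical stop-word list for illustration; the record does not give the one used.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "or"}


def normalize(text):
    """Lowercase, strip punctuation, and drop stop words; returns a token list."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_frequencies(descriptions):
    """Count terms across descriptions (the kind of input a word cloud is drawn from)."""
    counts = Counter()
    for description in descriptions:
        counts.update(normalize(description))
    return counts


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 for labels coded 1 (CRC) and 0 (non-CRC)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

In a real evaluation, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the test set; the record does not state how the study handled this.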
format Article
id doaj-art-9e8fc7e67544469fb50467d3df86ea85
institution OA Journals
issn 2472-6303
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series SLAS Technology
spelling doaj-art-9e8fc7e67544469fb50467d3df86ea85
2025-08-20T02:18:36Z
eng
Elsevier
SLAS Technology
2472-6303
2025-06-01
32
100295
10.1016/j.slast.2025.100295
NLP for computational insights into nutritional impacts on colorectal cancer care
Shengnan Gong (0): The Affiliated Hospital of Nantong University, Nantong University, Nantong, Jiangsu 226001, PR China
Xiaohong Jin (1): The Affiliated Hospital of Nantong University, Nantong, Jiangsu 226001, PR China
Yujie Guo (2): Nantong University, Nantong, Jiangsu 226001, PR China; Corresponding author.
Jie Yu (3): Nantong University, Nantong, Jiangsu 226001, PR China
http://www.sciencedirect.com/science/article/pii/S2472630325000536
Natural language processing (NLP)
Colorectal cancer (CRC)
Nutritional impact CRC prediction framework (NICRP-framework)
Dietary Patterns
Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM)
title NLP for computational insights into nutritional impacts on colorectal cancer care
topic Natural language processing (NLP)
Colorectal cancer (CRC)
Nutritional impact CRC prediction framework (NICRP-framework)
Dietary Patterns
Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM)
url http://www.sciencedirect.com/science/article/pii/S2472630325000536