NLP for computational insights into nutritional impacts on colorectal cancer care
Colorectal cancer (CRC) is one of the most prominent cancers globally, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older persons, thus limiting their usef...
| Main Authors: | Shengnan Gong, Xiaohong Jin, Yujie Guo, Jie Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | SLAS Technology |
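| Subjects: | Natural language processing (NLP); Colorectal cancer (CRC); Nutritional impact CRC prediction framework (NICRP-framework); Dietary Patterns; Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM) |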
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2472630325000536 |
| _version_ | 1850178989574848512 |
|---|---|
| author | Shengnan Gong; Xiaohong Jin; Yujie Guo; Jie Yu |
| collection | DOAJ |
| description | Colorectal cancer (CRC) is one of the most prominent cancers globally, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older persons, thus limiting their usefulness in more diverse populations. Additionally, the role of nutrition in CRC prevention and management is gaining significant attention, although computational approaches to analyzing the impact of diet on CRC remain underdeveloped. This research introduces the Nutritional Impact on CRC Prediction Framework (NICRP-Framework), which combines Natural Language Processing (NLP) techniques with Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLMs) to present important insights into the role of diet in CRC care across diverse populations. The colorectal cancer dietary and lifestyle dataset, encompassing >1000 participants, is collected from multiple regions and sources. The dataset includes structured and unstructured data, including textual descriptions of food ingredients. These descriptions are processed using standardization techniques, such as stop word removal, lowercasing, and punctuation elimination. Relevant terms are then extracted and visualized in a word cloud. The dataset also contains an imbalanced binary CRC outcome, which is rebalanced using random oversampling. ATSO-LLMs are employed to analyze the processed dietary data, identifying key nutritional factors and forecasting CRC and non-CRC phenotypes based on dietary patterns. The results show that combining NLP-derived features with ATSO-LLMs significantly enhances prediction accuracy (98.4 %), sensitivity (97.6 %), specificity (96.9 %), and F1-Score (96.2 %), with minimal misclassification rates. This framework represents a transformative advancement in life science by offering a new, data-driven approach to understanding the nutritional determinants of CRC, empowering healthcare professionals to make more precise predictions and adapted dietary interventions for diverse populations. |
| format | Article |
| id | doaj-art-9e8fc7e67544469fb50467d3df86ea85 |
| institution | OA Journals |
| issn | 2472-6303 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Elsevier |
| record_format | Article |
| series | SLAS Technology |
| spelling | doaj-art-9e8fc7e67544469fb50467d3df86ea85; 2025-08-20T02:18:36Z; eng; Elsevier; SLAS Technology; 2472-6303; 2025-06-01; 32; 100295; 10.1016/j.slast.2025.100295; NLP for computational insights into nutritional impacts on colorectal cancer care; Shengnan Gong (The Affiliated Hospital of Nantong University, Nantong University, Nantong, Jiangsu 226001, PR China); Xiaohong Jin (The Affiliated Hospital of Nantong University, Nantong, Jiangsu 226001, PR China); Yujie Guo (Nantong University, Nantong, Jiangsu 226001, PR China, corresponding author); Jie Yu (Nantong University, Nantong, Jiangsu 226001, PR China); http://www.sciencedirect.com/science/article/pii/S2472630325000536; Natural language processing (NLP); Colorectal cancer (CRC); Nutritional impact CRC prediction framework (NICRP-framework); Dietary Patterns; Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM) |
| title | NLP for computational insights into nutritional impacts on colorectal cancer care |
| topic | Natural language processing (NLP); Colorectal cancer (CRC); Nutritional impact CRC prediction framework (NICRP-framework); Dietary Patterns; Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM) |
| url | http://www.sciencedirect.com/science/article/pii/S2472630325000536 |
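
The description above names three generic steps: text standardization (lowercasing, punctuation elimination, stop word removal), random oversampling of an imbalanced binary outcome, and evaluation by accuracy, sensitivity, specificity and F1-Score. The following is a minimal Python sketch of those steps only, not the authors' implementation. The stop-word list, the function names (`standardize`, `random_oversample`, `binary_metrics`) and the sample inputs are all hypothetical, and the ATSO-LLM model itself is not reproduced here.

```python
import random
import string

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "or"}


def standardize(text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 for labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    print(standardize("Whole-grain Bread, with Olive Oil!"))
    rows, labels = random_oversample(["a", "b", "c", "d"], [0, 0, 0, 1])
    print(sorted(labels))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Oversampling by duplication is normally applied to the training split only, so that copies of the same record do not appear in both training and evaluation data; the record does not state how the authors handled this.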