Automated Pathologic TN Classification Prediction and Rationale Generation From Lung Cancer Surgical Pathology Reports Using a Large Language Model Fine-Tuned With Chain-of-Thought: Algorithm Development and Validation Study

Background: Traditional rule-based natural language processing approaches in electronic health record systems are effective but are often time-consuming and prone to errors when handling unstructured data. This is primarily due to the substantial manual effort required to parse...

Bibliographic Details
Main Authors: Sanghwan Kim, Sowon Jang, Borham Kim, Leonard Sunwoo, Seok Kim, Jin-Haeng Chung, Sejin Nam, Hyeongmin Cho, Donghyoung Lee, Keehyuck Lee, Sooyoung Yoo
Format: Article
Language: English
Published: JMIR Publications 2024-12-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2024/1/e67056
collection DOAJ
description Background: Traditional rule-based natural language processing approaches in electronic health record systems are effective but are often time-consuming and prone to errors when handling unstructured data. This is primarily due to the substantial manual effort required to parse and extract information from diverse types of documentation. Recent advancements in large language model (LLM) technology have made it possible to automatically interpret medical context and support pathologic staging. However, existing LLMs encounter challenges in rapidly adapting to specialized guideline updates. In this study, we fine-tuned an LLM specifically for lung cancer pathologic staging, enabling it to incorporate the latest guidelines for pathologic TN classification.
Objective: This study aims to evaluate the performance of fine-tuned generative language models in automatically inferring pathologic TN classifications and extracting their rationale from lung cancer surgical pathology reports. By addressing the inefficiencies and extensive parsing efforts associated with rule-based methods, this approach seeks to enable rapid and accurate reclassification aligned with the latest cancer staging guidelines.
Methods: We conducted a comparative performance evaluation of 6 open-source LLMs for automated TN classification and rationale generation, using 3216 deidentified lung cancer surgical pathology reports based on the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 8th edition, collected from a tertiary hospital. The dataset was preprocessed by segmenting each report according to lesion location and morphological diagnosis. Performance was assessed using exact match ratio (EMR) and semantic match ratio (SMR) as evaluation metrics, which measure classification accuracy and the contextual alignment of the generated rationales, respectively.
Results: Among the 6 models, the Orca2_13b model achieved the highest performance with an EMR of 0.934 and an SMR of 0.864. The Orca2_7b model also demonstrated strong performance, recording an EMR of 0.914 and an SMR of 0.854. In contrast, the Llama2_7b model achieved an EMR of 0.864 and an SMR of 0.771, while the Llama2_13b model showed an EMR of 0.762 and an SMR of 0.690. The Mistral_7b and Llama3_8b models, on the other hand, showed lower performance, with EMRs of 0.572 and 0.489, and SMRs of 0.377 and 0.456, respectively. Overall, the Orca2 models consistently outperformed the others in both TN stage classification and rationale generation.
Conclusions: The generative language model approach presented in this study has the potential to enhance and automate TN classification in complex cancer staging, supporting both clinical practice and oncology data curation. With additional fine-tuning based on cancer-specific guidelines, this approach can be effectively adapted to other cancer types.
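The abstract names exact match ratio (EMR) as the classification accuracy metric but does not spell out its computation. The following is a minimal sketch under an assumption: EMR is taken here to be the fraction of reports whose predicted TN label equals the reference label exactly, after removing whitespace and ignoring case. The function name, the normalization step, and the sample labels are hypothetical illustrations, not the authors' implementation or data. SMR is not sketched because the abstract gives no rule for judging semantic agreement of rationales.

```python
from typing import Sequence


def normalize_label(label: str) -> str:
    """Remove all whitespace and lowercase a TN label (assumed normalization)."""
    return "".join(label.split()).lower()


def exact_match_ratio(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the reference label.

    Assumption: a prediction counts as a match only if the whole TN string
    agrees with the reference after normalization.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one label pair is required")
    matches = sum(
        normalize_label(p) == normalize_label(r)
        for p, r in zip(predicted, reference)
    )
    return matches / len(reference)


# Hypothetical labels for illustration only; not taken from the study.
predicted = ["T1bN0", "T2a N1", "T3N2", "T1cN0"]
reference = ["T1bN0", "T2aN1", "T2bN2", "T1cN0"]

emr = exact_match_ratio(predicted, reference)
print(f"EMR = {emr:.3f}")
```

In this toy input, three of the four predictions agree with the reference, so the ratio is 3/4. A stricter or looser definition (for example, scoring T and N separately) would give different numbers, which is why the definition should be checked against the full paper before comparing results.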
format Article
id doaj-art-c95511eb0e4147e48b60df2af80d5f0c
institution OA Journals
issn 2291-9694
language English
publishDate 2024-12-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj-art-c95511eb0e4147e48b60df2af80d5f0c
2025-08-20T01:57:04Z
eng
JMIR Publications
JMIR Medical Informatics
2291-9694
2024-12-01
12
e67056
10.2196/67056
Automated Pathologic TN Classification Prediction and Rationale Generation From Lung Cancer Surgical Pathology Reports Using a Large Language Model Fine-Tuned With Chain-of-Thought: Algorithm Development and Validation Study
Sanghwan Kim https://orcid.org/0009-0007-6500-2225
Sowon Jang https://orcid.org/0000-0003-3320-1557
Borham Kim https://orcid.org/0000-0002-6914-5721
Leonard Sunwoo https://orcid.org/0000-0003-0374-8658
Seok Kim https://orcid.org/0000-0003-4996-8613
Jin-Haeng Chung https://orcid.org/0000-0002-6527-3814
Sejin Nam https://orcid.org/0000-0003-4786-2241
Hyeongmin Cho https://orcid.org/0000-0001-5430-8295
Donghyoung Lee https://orcid.org/0009-0002-2932-794X
Keehyuck Lee https://orcid.org/0000-0001-6906-4887
Sooyoung Yoo https://orcid.org/0000-0001-8620-4925
https://medinform.jmir.org/2024/1/e67056