Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology

Bibliographic Details
Main Authors:	Khanisyah Erza Gumilar, Birama R. Indraprasta, Ach Salman Faridzi, Bagus M. Wibowo, Aditya Herlambang, Eccita Rahestyningtyas, Budi Irawan, Zulkarnain Tambunan, Ahmad Fadhli Bustomi, Bagus Ngurah Brahmantara, Zih-Ying Yu, Yu-Cheng Hsu, Herlangga Pramuditya, Very Great E. Putra, Hari Nugroho, Pungky Mulawardhana, Brahmana A. Tjokroprawiro, Tri Hedianto, Ibrahim H. Ibrahim, Jingshan Huang, Dongqi Li, Chien-Hsing Lu, Jer-Yen Yang, Li-Na Liao, Ming Tan
Format:	Article
Language:	English
Published:	Elsevier 2024-12-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	Gynecologic cancer; Large Language Models; Accuracy; Consistency; Artificial intelligence
Online Access:	http://www.sciencedirect.com/science/article/pii/S2001037024003702

collection	DOAJ
description	Objective: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases. Background: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework. Study design: We assessed three prominent LLMs—ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot—evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence. Results: GemAdv demonstrated superior accuracy (81.87 %) compared to both CG-4 (61.60 %) and Copilot (70.67 %) across all difficulty levels. GemAdv consistently provided correct answers more frequently (>60 % every day during the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of the answers provided, which are crucial aspects of clinical decision-making. Conclusion: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.
format	Article
id	doaj-art-e70d57f57c814739bf47e3e669ca2d34
institution	OA Journals
issn	2001-0370
language	English
publishDate	2024-12-01
publisher	Elsevier
record_format	Article
series	Computational and Structural Biotechnology Journal
doi	10.1016/j.csbj.2024.10.050
volume	23
pages	4019-4026
affiliation	Khanisyah Erza Gumilar: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia; Correspondence to: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga, Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Surabaya, Jawa Timur 60115, Indonesia.
affiliation	Birama R. Indraprasta: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Ach Salman Faridzi: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Bagus M. Wibowo: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Aditya Herlambang: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Eccita Rahestyningtyas: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Budi Irawan: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Zulkarnain Tambunan: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Ahmad Fadhli Bustomi: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Bagus Ngurah Brahmantara: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Zih-Ying Yu: Department of Public Health, China Medical University, Taichung, Taiwan
affiliation	Yu-Cheng Hsu: Department of Public Health, China Medical University, Taichung, Taiwan; School of Chinese Medicine, China Medical University, Taichung, Taiwan
affiliation	Herlangga Pramuditya: Department of Obstetrics and Gynecology, Dr. Ramelan Naval Hospital, Surabaya, Indonesia
affiliation	Very Great E. Putra: Department of Obstetrics and Gynecology, Dr. Kariadi Central General Hospital, Semarang, Indonesia
affiliation	Hari Nugroho: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Pungky Mulawardhana: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Brahmana A. Tjokroprawiro: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
affiliation	Tri Hedianto: Faculty of Medicine and Health, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
affiliation	Ibrahim H. Ibrahim: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
affiliation	Jingshan Huang: School of Computing, College of Medicine, University of South Alabama, Mobile, AL, USA
affiliation	Dongqi Li: School of Information and Computer Sciences, School of Social and Behavioral Sciences, University of California, Irvine, CA, USA
affiliation	Chien-Hsing Lu: Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, Taichung, Taiwan
affiliation	Jer-Yen Yang: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Correspondence to: Graduate Institute of Biomedical Science, China Medical University, No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
affiliation	Li-Na Liao: Department of Public Health, China Medical University, Taichung, Taiwan; Correspondence to: Department of Public Health, China Medical University, No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
affiliation	Ming Tan: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Institute of Biochemistry and Molecular Biology and Research Center for Cancer Biology, China Medical University, Taichung, Taiwan; Correspondence to: Institute of Biochemistry and Molecular Biology, Graduate Institute of Biomedical Sciences, China Medical University (Taiwan), No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
title	Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology
topic	Gynecologic cancer; Large Language Models; Accuracy; Consistency; Artificial intelligence
url	http://www.sciencedirect.com/science/article/pii/S2001037024003702
