Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology

Objective: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases. Background: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely a...

Bibliographic Details
Main Authors: Khanisyah Erza Gumilar, Birama R. Indraprasta, Ach Salman Faridzi, Bagus M. Wibowo, Aditya Herlambang, Eccita Rahestyningtyas, Budi Irawan, Zulkarnain Tambunan, Ahmad Fadhli Bustomi, Bagus Ngurah Brahmantara, Zih-Ying Yu, Yu-Cheng Hsu, Herlangga Pramuditya, Very Great E. Putra, Hari Nugroho, Pungky Mulawardhana, Brahmana A. Tjokroprawiro, Tri Hedianto, Ibrahim H. Ibrahim, Jingshan Huang, Dongqi Li, Chien-Hsing Lu, Jer-Yen Yang, Li-Na Liao, Ming Tan
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Computational and Structural Biotechnology Journal
Subjects: Gynecologic cancer; Large Language Models; Accuracy; Consistency; Artificial intelligence
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037024003702
author Khanisyah Erza Gumilar
Birama R. Indraprasta
Ach Salman Faridzi
Bagus M. Wibowo
Aditya Herlambang
Eccita Rahestyningtyas
Budi Irawan
Zulkarnain Tambunan
Ahmad Fadhli Bustomi
Bagus Ngurah Brahmantara
Zih-Ying Yu
Yu-Cheng Hsu
Herlangga Pramuditya
Very Great E. Putra
Hari Nugroho
Pungky Mulawardhana
Brahmana A. Tjokroprawiro
Tri Hedianto
Ibrahim H. Ibrahim
Jingshan Huang
Dongqi Li
Chien-Hsing Lu
Jer-Yen Yang
Li-Na Liao
Ming Tan
author_sort Khanisyah Erza Gumilar
collection DOAJ
description Objective: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases. Background: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework. Study design: We assessed three prominent LLMs—ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot—evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence. Results: GemAdv demonstrated superior accuracy (81.87 %) compared to both CG-4 (61.60 %) and Copilot (70.67 %) across all difficulty levels. GemAdv consistently provided correct answers more frequently (>60 % every day during the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of the answers provided, which are crucial aspects of clinical decision-making. Conclusion: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.
format Article
id doaj-art-e70d57f57c814739bf47e3e669ca2d34
institution OA Journals
issn 2001-0370
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj-art-e70d57f57c814739bf47e3e669ca2d34
Timestamp: 2025-08-20T02:14:57Z
Language: eng
Publisher: Elsevier
Journal: Computational and Structural Biotechnology Journal
ISSN: 2001-0370
Date: 2024-12-01
Volume: 23
Pages: 4019-4026
DOI: 10.1016/j.csbj.2024.10.050
Title: Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology
Authors and affiliations:
Khanisyah Erza Gumilar (0): Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia; Correspondence to: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga, Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Surabaya, Jawa Timur 60115, Indonesia.
Birama R. Indraprasta (1): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Ach Salman Faridzi (2): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Bagus M. Wibowo (3): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Aditya Herlambang (4): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Eccita Rahestyningtyas (5): Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Budi Irawan (6): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Zulkarnain Tambunan (7): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Ahmad Fadhli Bustomi (8): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Bagus Ngurah Brahmantara (9): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Zih-Ying Yu (10): Department of Public Health, China Medical University, Taichung, Taiwan
Yu-Cheng Hsu (11): Department of Public Health, China Medical University, Taichung, Taiwan; School of Chinese Medicine, China Medical University, Taichung, Taiwan
Herlangga Pramuditya (12): Department of Obstetrics and Gynecology, Dr. Ramelan Naval Hospital, Surabaya, Indonesia
Very Great E. Putra (13): Department of Obstetrics and Gynecology, Dr. Kariadi Central General Hospital, Semarang, Indonesia
Hari Nugroho (14): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Pungky Mulawardhana (15): Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Brahmana A. Tjokroprawiro (16): Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital - Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Tri Hedianto (17): Faculty of Medicine and Health, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Ibrahim H. Ibrahim (18): Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
Jingshan Huang (19): School of Computing, College of Medicine, University of South Alabama, Mobile, AL, USA
Dongqi Li (20): School of Information and Computer Sciences, School of Social and Behavioral Sciences, University of California, Irvine, CA, USA
Chien-Hsing Lu (21): Department of Obstetrics and Gynecology, Taichung Veteran General Hospital, Taichung, Taiwan
Jer-Yen Yang (22): Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Correspondence to: Graduate Institute of Biomedical Science, China Medical University, No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
Li-Na Liao (23): Department of Public Health, China Medical University, Taichung, Taiwan; Correspondence to: Department of Public Health, China Medical University, No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
Ming Tan (24): Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Institute of Biochemistry and Molecular Biology and Research Center for Cancer Biology, China Medical University, Taichung, Taiwan; Correspondence to: Institute of Biochemistry and Molecular Biology, Graduate Institute of Biomedical Sciences, China Medical University (Taiwan), No. 100, Section 1, Jingmao Road, Beitun District, Taichung City 406040, Taiwan.
Abstract: Objective: This study investigated the ability of Large Language Models (LLMs) to provide accurate and consistent answers by focusing on their performance in complex gynecologic cancer cases. Background: LLMs are advancing rapidly and require a thorough evaluation to ensure that they can be safely and effectively used in clinical decision-making. Such evaluations are essential for confirming LLM reliability and accuracy in supporting medical professionals in casework. Study design: We assessed three prominent LLMs—ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot—evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence. Results: GemAdv demonstrated superior accuracy (81.87 %) compared to both CG-4 (61.60 %) and Copilot (70.67 %) across all difficulty levels. GemAdv consistently provided correct answers more frequently (>60 % every day during the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of the answers provided, which are crucial aspects of clinical decision-making. Conclusion: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.
URL: http://www.sciencedirect.com/science/article/pii/S2001037024003702
Keywords: Gynecologic cancer; Large Language Models; Accuracy; Consistency; Artificial intelligence
title Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology
topic Gynecologic cancer
Large Language Models
Accuracy
Consistency
Artificial intelligence
url http://www.sciencedirect.com/science/article/pii/S2001037024003702