Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study
Abstract. Background: Breast ultrasound is essential for evaluating breast nodules, with the Breast Imaging Reporting and Data System (BI-RADS) providing standardized classification. However, interobserver variability among radiologists can affect diagnostic accuracy. Large language...
| Main Authors: | Su Miaojiao, Liang Xia, Zeng Xian Tao, Hong Zhi Liang, Cheng Sheng, Wu Songsong |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-06-01 |
| Series: | JMIR Medical Informatics |
| Online Access: | https://medinform.jmir.org/2025/1/e70924 |
| _version_ | 1849688124116959232 |
|---|---|
| author | Su Miaojiao Liang Xia Zeng Xian Tao Hong Zhi Liang Cheng Sheng Wu Songsong |
| collection | DOAJ |
| description | Abstract. Background: Breast ultrasound is essential for evaluating breast nodules, with the Breast Imaging Reporting and Data System (BI-RADS) providing standardized classification. However, interobserver variability among radiologists can affect diagnostic accuracy. Large language models (LLMs) like ChatGPT-4 have shown potential in medical imaging interpretation. This study explores the feasibility of using ChatGPT-4 to improve BI-RADS classification consistency and malignancy prediction compared with radiologists. Objective: This study aims to evaluate the feasibility of using LLMs, particularly ChatGPT-4, to assess the consistency and diagnostic accuracy of standardized breast ultrasound imaging reports, using pathology as the reference standard. Methods: This retrospective study analyzed breast nodule ultrasound data from 671 female patients (mean age 45.82, SD 9.20 years; range 26‐75 years) who underwent biopsy or surgical excision at our hospital between June 2019 and June 2024. ChatGPT-4 was used to interpret BI-RADS classifications and predict benign versus malignant nodules. The study compared the model’s performance with that of two senior radiologists (≥15 years of experience) and two junior radiologists (<5 years of experience) using key diagnostic metrics, including accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, P... Results: ChatGPT-4 achieved an overall BI-RADS classification accuracy of 96.87%, outperforming junior radiologists (617/671, 91.95% and 604/671, 90.01%, P... Conclusions: Integrating ChatGPT-4 into an image-to-text–LLM workflow improves BI-RADS classification accuracy and supports radiologists in breast ultrasound diagnostics. These results demonstrate its potential as a decision-support tool to enhance diagnostic consistency and reduce variability. |
| format | Article |
| id | doaj-art-e30849d533a245199f8ddfe9fa480d24 |
| institution | DOAJ |
| issn | 2291-9694 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Medical Informatics |
| spelling | doaj-art-e30849d533a245199f8ddfe9fa480d24; 2025-08-20T03:22:07Z; eng; JMIR Publications; JMIR Medical Informatics; ISSN 2291-9694; 2025-06-01; vol. 13, e70924; DOI 10.2196/70924; Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study; Su Miaojiao (http://orcid.org/0009-0000-5251-1003), Liang Xia (http://orcid.org/0009-0005-2451-7535), Zeng Xian Tao (http://orcid.org/0009-0000-7875-7079), Hong Zhi Liang (http://orcid.org/0000-0001-8995-9995), Cheng Sheng (http://orcid.org/0000-0002-4761-6203), Wu Songsong (http://orcid.org/0009-0006-7590-0259); https://medinform.jmir.org/2025/1/e70924 |
| title | Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study |
| url | https://medinform.jmir.org/2025/1/e70924 |
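The abstract reports accuracy as correct classifications over 671 patients and lists sensitivity and specificity among its metrics. As a minimal sketch of how such figures are conventionally computed (not the authors' analysis code), the snippet below reproduces the two junior-radiologist accuracies from the counts given in the abstract. The `ConfusionCounts` class, the function names, and the `example` counts used for sensitivity and specificity are hypothetical illustrations; the record does not report a confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary counts against pathology, with malignant as the positive class."""
    tp: int  # malignant nodules called malignant
    fp: int  # benign nodules called malignant
    tn: int  # benign nodules called benign
    fn: int  # malignant nodules called benign


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly classified cases, as a percentage."""
    return 100.0 * correct / total


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate: detected malignancies over all malignancies."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate: correctly cleared benign cases over all benign cases."""
    return c.tn / (c.tn + c.fp)


TOTAL_PATIENTS = 671  # from the abstract

# Counts reported in the abstract for the two junior radiologists.
junior_1 = round(accuracy_percent(617, TOTAL_PATIENTS), 2)
junior_2 = round(accuracy_percent(604, TOTAL_PATIENTS), 2)

# ASSUMPTION: made-up counts, used only to show the two formulas.
example = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)

if __name__ == "__main__":
    print(f"Junior radiologist 1 accuracy: {junior_1}%")
    print(f"Junior radiologist 2 accuracy: {junior_2}%")
    print(f"Example sensitivity: {sensitivity(example):.2f}")
    print(f"Example specificity: {specificity(example):.2f}")
```

The rounded accuracies match the 91.95% and 90.01% quoted in the abstract, which confirms that those percentages are plain correct-over-total ratios.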