Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations
Abstract: Digital health technologies hold significant potential for reducing global healthcare disparities. Large language models (LLMs) offer new opportunities to enhance access to culturally specific healthcare, including traditional Chinese medicine (TCM). This study evaluated the diagnostic and...
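| Main Authors: | Yu Liu, Yishan Yuan, Keming Yan, Yuanyuan Li, Valeria Sacca, Sierra Hodges, Mattia Cannistra, Pauline Jeong, Jiani Wu, Jian Kong |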
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | npj Digital Medicine |
| Online Access: | https://doi.org/10.1038/s41746-025-01845-2 |
| _version_ | 1849342088610578432 |
|---|---|
| author | Yu Liu; Yishan Yuan; Keming Yan; Yuanyuan Li; Valeria Sacca; Sierra Hodges; Mattia Cannistra; Pauline Jeong; Jiani Wu; Jian Kong |
| author_sort | Yu Liu |
| collection | DOAJ |
| description | Abstract: Digital health technologies hold significant potential for reducing global healthcare disparities. Large language models (LLMs) offer new opportunities to enhance access to culturally specific healthcare, including traditional Chinese medicine (TCM). This study evaluated the diagnostic and treatment performance of seven publicly available LLMs using a real-world acupuncture case, comparing their outputs with three professional acupuncturists across five domains: Western diagnosis, TCM diagnosis, acupoint selection, needling technique, and herbal medicine. Twenty-eight expert evaluators from China, South Korea, and the United States assessed the responses using a multilingual survey. LLMs performed comparably to acupuncturists in Western diagnosis and showed variable performance in TCM-specific tasks. GPT-4o, Qwen 2.5 Max, and Doubao 1.5 Pro demonstrated the highest alignment with expert evaluations, particularly in TCM diagnosis and acupoint selection. These findings highlight the potential of general-purpose LLMs to support culturally grounded medical decision-making and reduce access barriers in TCM care systems. |
| format | Article |
| id | doaj-art-950d30ea6e0e4b05a0562a6766f33d23 |
| institution | Kabale University |
| issn | 2398-6352 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | npj Digital Medicine |
| spelling | doaj-art-950d30ea6e0e4b05a0562a6766f33d23; 2025-08-20T03:43:30Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-07-01; volume 8, issue 1, pages 1-12; 10.1038/s41746-025-01845-2; Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations; Yu Liu (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Yishan Yuan (Beijing University of Chinese Medicine); Keming Yan (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Yuanyuan Li (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Valeria Sacca (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Sierra Hodges (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Mattia Cannistra (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Pauline Jeong (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Jiani Wu (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Jian Kong (Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School); Abstract: Digital health technologies hold significant potential for reducing global healthcare disparities. Large language models (LLMs) offer new opportunities to enhance access to culturally specific healthcare, including traditional Chinese medicine (TCM). This study evaluated the diagnostic and treatment performance of seven publicly available LLMs using a real-world acupuncture case, comparing their outputs with three professional acupuncturists across five domains: Western diagnosis, TCM diagnosis, acupoint selection, needling technique, and herbal medicine. Twenty-eight expert evaluators from China, South Korea, and the United States assessed the responses using a multilingual survey. LLMs performed comparably to acupuncturists in Western diagnosis and showed variable performance in TCM-specific tasks. GPT-4o, Qwen 2.5 Max, and Doubao 1.5 Pro demonstrated the highest alignment with expert evaluations, particularly in TCM diagnosis and acupoint selection. These findings highlight the potential of general-purpose LLMs to support culturally grounded medical decision-making and reduce access barriers in TCM care systems.; https://doi.org/10.1038/s41746-025-01845-2 |
| title | Evaluating the role of large language models in traditional Chinese medicine diagnosis and treatment recommendations |
| url | https://doi.org/10.1038/s41746-025-01845-2 |