TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine
Abstract This paper presents a large publicly available benchmark dataset (TCMEval-SDT) for the thought process involved in syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical te...
| Main Authors: | Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-04772-9 |
| _version_ | 1849772482594078720 |
|---|---|
| author | Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu |
| author_facet | Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu |
| author_sort | Zhe Wang |
| collection | DOAJ |
| description | Abstract This paper presents a large publicly available benchmark dataset (TCMEval-SDT) for the thought process involved in syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical texts, and medical records from hospitals, with metadata adhering to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Each case has been annotated and curated by TCM experts and includes medical record ID, clinical data, explanatory summary, TCM syndrome, clinical information, and TCM pathogenesis, to support algorithms or models in emulating the diagnostic process of TCM clinicians. To provide a comprehensive description of the TCM syndrome diagnosis process, we summarize the diagnosis into four steps: (1) clinical information extraction, (2) TCM pathogenesis reasoning, (3) TCM syndrome reasoning, and (4) explanatory summary. We have also established validation criteria to evaluate their ability in TCM clinical diagnosis using this dataset. To facilitate research and evaluation in syndrome diagnosis of TCM, the TCMEval-SDT dataset is made publicly available under the CC-BY 4.0 license. |
| format | Article |
| id | doaj-art-2d35bc38b2aa4f0e844e9d84d5cb7238 |
| institution | DOAJ |
| issn | 2052-4463 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Data |
| spelling | doaj-art-2d35bc38b2aa4f0e844e9d84d5cb7238; 2025-08-20T03:02:18Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2025-03-01; 121110; 10.1038/s41597-025-04772-9; TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine; Zhe Wang (0), Meng Hao (1), Suyuan Peng (2), Yuyan Huang (3), Yiwei Lu (4), Keyu Yao (5), Xiaolin Yang (6), Yan Zhu (7); (0) Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College; (1) Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences; (2) Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences; (3) Institute of Basic Theory for Chinese Medicine, China Academy of Chinese Medical Sciences; (4) Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences; (5) Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences; (6) Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College; (7) Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences; Abstract This paper presents a large publicly available benchmark dataset (TCMEval-SDT) for the thought process involved in syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical texts, and medical records from hospitals, with metadata adhering to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Each case has been annotated and curated by TCM experts and includes medical record ID, clinical data, explanatory summary, TCM syndrome, clinical information, and TCM pathogenesis, to support algorithms or models in emulating the diagnostic process of TCM clinicians. To provide a comprehensive description of the TCM syndrome diagnosis process, we summarize the diagnosis into four steps: (1) clinical information extraction, (2) TCM pathogenesis reasoning, (3) TCM syndrome reasoning, and (4) explanatory summary. We have also established validation criteria to evaluate their ability in TCM clinical diagnosis using this dataset. To facilitate research and evaluation in syndrome diagnosis of TCM, the TCMEval-SDT dataset is made publicly available under the CC-BY 4.0 license.; https://doi.org/10.1038/s41597-025-04772-9 |
| spellingShingle | Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu; TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine; Scientific Data |
| title | TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine |
| title_full | TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine |
| title_fullStr | TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine |
| title_full_unstemmed | TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine |
| title_short | TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine |
| title_sort | tcmeval sdt a benchmark dataset for syndrome differentiation thought of traditional chinese medicine |
| url | https://doi.org/10.1038/s41597-025-04772-9 |
| work_keys_str_mv | AT zhewang tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT menghao tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT suyuanpeng tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT yuyanhuang tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT yiweilu tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT keyuyao tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT xiaolinyang tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine AT yanzhu tcmevalsdtabenchmarkdatasetforsyndromedifferentiationthoughtoftraditionalchinesemedicine |