TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine

Bibliographic Details
Main Authors: Zhe Wang, Meng Hao, Suyuan Peng, Yuyan Huang, Yiwei Lu, Keyu Yao, Xiaolin Yang, Yan Zhu
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04772-9
ISSN: 2052-4463
Collection: DOAJ
Affiliations:
Zhe Wang, Xiaolin Yang: Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College
Meng Hao, Suyuan Peng, Yiwei Lu, Keyu Yao, Yan Zhu: Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences
Yuyan Huang: Institute of Basic Theory for Chinese Medicine, China Academy of Chinese Medical Sciences

Abstract: This paper presents a large, publicly available benchmark dataset (TCMEval-SDT) for the thought process involved in syndrome differentiation in traditional Chinese medicine (TCM). The dataset consists of 300 TCM syndrome diagnosis cases sourced from the internet, classical Chinese medical texts, and hospital medical records, with metadata adhering to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Each case has been annotated and curated by TCM experts and includes a medical record ID, clinical data, an explanatory summary, the TCM syndrome, clinical information, and the TCM pathogenesis, to support algorithms or models in emulating the diagnostic process of TCM clinicians. To describe the TCM syndrome diagnosis process comprehensively, we divide it into four steps: (1) clinical information extraction, (2) TCM pathogenesis reasoning, (3) TCM syndrome reasoning, and (4) explanatory summary. We have also established validation criteria for using this dataset to evaluate the TCM clinical diagnostic ability of such algorithms and models. To facilitate research and evaluation in TCM syndrome diagnosis, the TCMEval-SDT dataset is publicly available under the CC-BY 4.0 license.
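The abstract lists the fields annotated for each case and the four steps of the diagnostic process. The Python sketch that follows shows one way a consumer of the dataset might hold a case in memory and line up each step with its expert-annotated answer. It is a minimal sketch under stated assumptions: the class `TCMCase`, its field names and types, the `STEPS` mapping and the `reference_answers` helper are all hypothetical and do not describe the layout of the released files or the paper's validation criteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory layout of one case. Field names and types are
# illustrative assumptions, NOT the schema of the released TCMEval-SDT files.
@dataclass
class TCMCase:
    medical_record_id: str
    clinical_data: str  # free-text case description given to a model
    clinical_information: List[str] = field(default_factory=list)  # step 1 answer
    pathogenesis: str = ""          # step 2 answer
    syndrome: str = ""              # step 3 answer
    explanatory_summary: str = ""   # step 4 answer

# The four diagnostic steps named in the abstract, in order, each paired with
# the (hypothetical) attribute holding its expert annotation.
# Python dicts preserve insertion order, so iteration follows steps 1 to 4.
STEPS: Dict[str, str] = {
    "clinical information extraction": "clinical_information",
    "TCM pathogenesis reasoning": "pathogenesis",
    "TCM syndrome reasoning": "syndrome",
    "explanatory summary": "explanatory_summary",
}

def reference_answers(case: TCMCase) -> Dict[str, object]:
    """Return the expert annotation for each step, keyed by step name, in step order."""
    return {step: getattr(case, attr) for step, attr in STEPS.items()}
```

Keeping the step-to-field mapping in one ordered dictionary lets a model's outputs for steps 1 to 4 be compared against the annotations in the same order; the comparison metric itself is left open here, since the abstract does not specify it.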