MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Ensuring the general efficacy and benefit of medical Large Language Models (LLMs) for human beings before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we in...
| Main Authors: | Mianxin Liu, Weiguo Hu, Jinru Ding, Jie Xu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2024-12-01 |
| Series: | Big Data Mining and Analytics |
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020044 |
Similar Items
- A Perspective on Quality Evaluation for AI-Generated Videos
  by: Zhichao Zhang, et al.
  Published: (2025-07-01)
- Benchmarking spatial transcriptomics technologies with the multi-sample SpatialBenchVisium dataset
  by: Mei R. M. Du, et al.
  Published: (2025-03-01)
- BenchMake: turn any scientific data set into a reproducible benchmark
  by: A S Barnard
  Published: (2025-01-01)
- LLM4Mat-bench: benchmarking large language models for materials property prediction
  by: Andre Niyongabo Rubungo, et al.
  Published: (2025-01-01)
- A Comprehensive Cross-Model Framework for Benchmarking the Performance of Quantum Hamiltonian Simulations
  by: Avimita Chatterjee, et al.
  Published: (2025-01-01)