MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Ensuring the general efficacy and benefit of medical Large Language Models (LLMs) for human beings before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we in...
| Main Authors: | Mianxin Liu, Weiguo Hu, Jinru Ding, Jie Xu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2024-12-01 |
| Series: | Big Data Mining and Analytics |
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020044 |
Similar Items
- A Perspective on Quality Evaluation for AI-Generated Videos
  by: Zhichao Zhang, et al.
  Published: (2025-07-01)
- Benchmarking spatial transcriptomics technologies with the multi-sample SpatialBenchVisium dataset
  by: Mei R. M. Du, et al.
  Published: (2025-03-01)
- BenchMake: turn any scientific data set into a reproducible benchmark
  by: A S Barnard
  Published: (2025-01-01)
- LLM4Mat-bench: benchmarking large language models for materials property prediction
  by: Andre Niyongabo Rubungo, et al.
  Published: (2025-01-01)
- A Comprehensive Cross-Model Framework for Benchmarking the Performance of Quantum Hamiltonian Simulations
  by: Avimita Chatterjee, et al.
  Published: (2025-01-01)