Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics

Abstract: Rigorous evaluation of generalist medical artificial intelligence (GMAI) is imperative to ensure their utility and safety before implementation in health care. Current evaluation strategies rely heavily on benchmarks, which can suffer from issues with data contamination and cannot...

Bibliographic Details
Main Authors:	Luning Sun, Christopher Gibbons, José Hernández-Orallo, Xiting Wang, Liming Jiang, David Stillwell, Fang Luo, Xing Xie
Format:	Article
Language:	English
Published:	JMIR Publications 2025-05-01
Series:	Journal of Medical Internet Research
Online Access:	https://www.jmir.org/2025/1/e70901

Similar Items

Era of Generalist Conversational Artificial Intelligence to Support Public Health Communications
by: Emre Sezgin, et al.
Published: (2025-01-01)

Lumbar puncture for the generalist
by: J.M. Boon, et al.
Published: (2004-03-01)

The Social Generalist's Dilemma
by: Melvyn L. Fein
Published: (2016-03-01)

The Specialist’s Paradox: Generalist AI May Better Organize Medical Knowledge
by: Carlo Galli, et al.
Published: (2025-07-01)

Managing dyspepsia as a generalist
by: Min Yi Martin Soo, et al.
Published: (2025-07-01)

Paediatric palliative care for the generalist
by: Julia F. Ambler, et al.
Published: (2023-04-01)

A perspective for adapting generalist AI to specialized medical AI applications and their challenges
by: Zifeng Wang, et al.
Published: (2025-07-01)

Intelligent optimization algorithm based on benchmarking
by: Anshi XIE
Published: (2018-07-01)

Theatre and emergency services rendered by generalist medical practitioners in district hospitals in the Western Cape
by: M.R. de Villiers, et al.
Published: (2003-07-01)

Towards generalist foundation model for radiology by leveraging web-scale 2D&3D medical data
by: Chaoyi Wu, et al.
Published: (2025-08-01)

Generalist medical foundation model improves prostate cancer segmentation from multimodal MRI images
by: Yuhan Zhang, et al.
Published: (2025-06-01)

Development and psychometric evaluation of the artificial intelligence attitude scale for nurses
by: Tuğba Öztürk Yıldırım, et al.
Published: (2025-04-01)

A Generalist Approach in Social Work Education in Turkey
by: Işıl Bulut
Published: (2003-01-01)

A study on the growth of generalist iterated entire functions
by: Ratan Kumar Dutta
Published: (2020-12-01)

Artificial intelligence in coronary angiography: benchmarking the diagnostic accuracy of ChatGPT-4o against interventional cardiologists
by: John Michael Hoppe, et al.
Published: (2025-07-01)

Individual Specialization in a Generalist Apex Predator: The Leopard Seal
by: Emily S. Sperou, et al.
Published: (2025-06-01)

Stability analysis in generalist predator-prey dynamics with predator harvesting
by: S. Vijaya, et al.
Published: (2024-12-01)

Psychometric assessment of the Persian translated version of the "medical artificial intelligence readiness scale for medical students".
by: Nasrin Khajeali, et al.
Published: (2025-01-01)

Retinal revelations: Seeing beyond the eye with artificial intelligence
by: John Davis Akkara
Published: (2024-12-01)

Artificial intelligence is transforming the study of proteins: Structures and beyond
by: Haiyan Liu, et al.
Published: (2025-04-01)

Beyond human-in-the-loop: Sensemaking between artificial intelligence and human intelligence collaboration
by: Xinyue Hao, et al.
Published: (2025-12-01)

Light-driven phenotypic plasticity in the depth-generalist coral, Pavona varians.
by: Claire J Lewis, et al.
Published: (2025-01-01)

Consequences of “zombie-making” and generalist fungal pathogens on carpenter ant microbiota
by: Sophia Vermeulen, et al.
Published: (2025-01-01)

Patients’ perspectives of epilepsy care by specialists and generalists: qualitative evidence synthesis
by: Charlotte L Cotterill, et al.
Published: (2024-12-01)

Medical triage as an AI ethics benchmark
by: Nathalie Maria Kirch, et al.
Published: (2025-08-01)

Artificial Intelligence in Medical Education
by: Rajendra B. Nerli, et al.
Published: (2025-04-01)

Artificial intelligence in medical imaging
by: Bin Huang, et al.
Published: (2024-12-01)

Different artificial lighting spectra changes the mating behavior of the generalist predator Orius insidiosus (Say), and photoperiod extension promotes its development
by: Morgane L. Canovas, et al.
Published: (2025-08-01)

Artificial intelligence accelerates the identification of nature-derived potent LOXL2 inhibitors
by: Xiaowei Jia, et al.
Published: (2025-03-01)

Profile and generalist physician knowledge about neurology in emergency department: headache management
by: Maren de MORAES E SILVA, et al.

Beyond benchmarks: Exploring views of success among adult basic education learners
by: Shannon Frey
Published: (2024-08-01)

Thermal sensitivity and niche plasticity of generalist and specialist leaf-endophytic bacteria in Mangrove Kandelia obovata
by: Rajapakshalage Thashikala Nethmini, et al.
Published: (2025-01-01)

Artificial Intelligence in Revolutionizing Kidney Care and Beyond: Kid-AI Revolution
by: Kounaina Khan, et al.
Published: (2024-01-01)

Multimodal Artificial Intelligence in Medical Diagnostics
by: Bassem Jandoubi, et al.
Published: (2025-07-01)

Role of Artificial Intelligence in Medical Microbiology
by: Shazia Khan, et al.
Published: (2025-05-01)

The General Attitudes towards Artificial Intelligence Scale (GAAIS): validation and psychometric properties analysis in the Italian context
by: Lavinia Cicero, et al.
Published: (2025-07-01)

Psychometric properties and Turkish adaptation of the artificial intelligence attitude scale (AIAS-4): evidence for construct validity
by: Seydi Ahmet Satici, et al.
Published: (2025-03-01)

APPLICATION OF TECHNOLOGY BENCHMARKING (CORPORATE INTELLIGENCE) IN THE MANAGEMENT OF ORGANIZATIONS
by: Alexander L. Barannikov, et al.
Published: (2016-08-01)

Flower Constancy in the Generalist Pollinator Ceratina flavipes (Hymenoptera: Apidae): An Evaluation by Pollen Analysis
by: Midori Kobayashi-Kidokoro, et al.
Published: (2010-01-01)

Multimodal GNSS-R self-supervised learning as a generalist Earth surface monitor
by: Daixin Zhao, et al.
Published: (2025-08-01)