Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation

Background/Objectives: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM’s capability and trustworthiness...

Bibliographic Details
Main Authors: Sami Barrit, Nathan Torcida, Aurelien Mazeraud, Sebastien Boulogne, Jeanne Benoit, Timothée Carette, Thibault Carron, Bertil Delsaut, Eva Diab, Hugo Kermorvant, Adil Maarouf, Sofia Maldonado Slootjes, Sylvain Redon, Alexis Robin, Sofiene Hadidane, Vincent Harlay, Vito Tota, Tanguy Madec, Alexandre Niset, Mejdeddine Al Barajraji, Joseph R. Madsen, Salim El Hadwe, Nicolas Massager, Stanislas Lagarde, Romain Carron
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Brain Sciences
Subjects: artificial intelligence, large language models, neurological diagnosis, clinical decision support
Online Access: https://www.mdpi.com/2076-3425/15/4/347
_version_ 1850183211207884800
author Sami Barrit
Nathan Torcida
Aurelien Mazeraud
Sebastien Boulogne
Jeanne Benoit
Timothée Carette
Thibault Carron
Bertil Delsaut
Eva Diab
Hugo Kermorvant
Adil Maarouf
Sofia Maldonado Slootjes
Sylvain Redon
Alexis Robin
Sofiene Hadidane
Vincent Harlay
Vito Tota
Tanguy Madec
Alexandre Niset
Mejdeddine Al Barajraji
Joseph R. Madsen
Salim El Hadwe
Nicolas Massager
Stanislas Lagarde
Romain Carron
author_sort Sami Barrit
collection DOAJ
description Background/Objectives: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM’s capability and trustworthiness in complex neurological diagnosis, comparing its performance to neurologists in simulated clinical settings. Methods: We deployed GPT-4 Turbo (OpenAI, San Francisco, CA, US) through Neura (Sciense, New York, NY, US), an AI infrastructure with a dual-database architecture integrating “long-term memory” and “short-term memory” components on a curated neurological corpus. Five representative clinical scenarios were presented to 13 neurologists and the AI system. Participants formulated differential diagnoses based on initial presentations, followed by definitive diagnoses after receiving conclusive clinical information. Two senior academic neurologists blindly evaluated all responses, while an independent investigator assessed the verifiability of AI-generated information. Results: AI achieved a significantly higher normalized score (86.17%) compared to neurologists (55.11%, p < 0.001). For differential diagnosis questions, AI scored 85% versus 46.15% for neurologists, and for final diagnosis, 88.24% versus 70.93%. AI obtained 15 maximum scores in its 20 evaluations and responded in under 30 s compared to neurologists’ average of 9 min. All AI-provided references were classified as relevant with no hallucinatory content detected. Conclusions: A specialized LLM demonstrated superior diagnostic performance compared to practicing neurologists across complex clinical challenges. This indicates that appropriately harnessed LLMs with curated knowledge bases can achieve domain-specific relevance in complex clinical disciplines, suggesting potential for AI as a time-efficient asset in clinical practice.
format Article
id doaj-art-bd3887af8fda424b9eb20bd3282c0388
institution OA Journals
issn 2076-3425
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Brain Sciences
spelling doaj-art-bd3887af8fda424b9eb20bd3282c0388
2025-08-20T02:17:25Z
eng
MDPI AG
Brain Sciences
2076-3425
2025-03-01
Volume 15, Issue 4, Article 347
10.3390/brainsci15040347
Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation
Authors (indexed 0 to 24 in the record): Sami Barrit, Nathan Torcida, Aurelien Mazeraud, Sebastien Boulogne, Jeanne Benoit, Timothée Carette, Thibault Carron, Bertil Delsaut, Eva Diab, Hugo Kermorvant, Adil Maarouf, Sofia Maldonado Slootjes, Sylvain Redon, Alexis Robin, Sofiene Hadidane, Vincent Harlay, Vito Tota, Tanguy Madec, Alexandre Niset, Mejdeddine Al Barajraji, Joseph R. Madsen, Salim El Hadwe, Nicolas Massager, Stanislas Lagarde, Romain Carron
Affiliations (in record order):
Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium
Sciense, New York, NY 10013, USA
Anesthésie-Réanimation, GHU Paris, Pôle Neuro, 75014 Paris, France
Neurophysiology and Epileptology, Universite de Lyon, 69007 Lyon, France
Neurology, CHU de Nice, Université Côte d’Azur, UMR2CA, 06000 Nice, France
Neurology, Université Catholique de Louvain, Clinique Saint-Pierre Ottignies, 1348 Louvain-la-Neuve, Belgium
LIP6, CNRS, Sorbonne Université, 75005 Paris, France
Neurology, Université Libre de Bruxelles, 1050 Brussels, Belgium
Clinical Neurophysiology, CHU Amiens Picardie, CHIMERE UR 7516 UPJV, 80054 Amiens, France
Neurophy Lab, Université Libre de Bruxelles, 1050 Brussels, Belgium
Neurology, La Timone Hospital, AP-HM, 13385 Marseille, France
Department of Neurology, Universitair Ziekenhuis Brussel (UZ Brussel), 1090 Brussels, Belgium
Evaluation and Treatment of Pain, FHU INOVPAIN, La Timone Hospital, AP-HM, 13385 Marseille, France
Neurology, CHU Grenoble, 38700 Grenoble, France
Cabinets de Neurologie d’Allauch et Plan de Cuques, 13190 Allauch, France
Neuro-Oncology, AMU, La Timone Hospital, AP-HM, 13005 Marseille, France
Neurology, CHU Helora, 7000 Mons, Belgium
Neurology, Hospital of Noumea, 98800 Nouméa, France
Sciense, New York, NY 10013, USA
Sciense, New York, NY 10013, USA
Neurodynamics Laboratory, Department of Neurosurgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium
Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium
AMU, INSERM, Institut Neuroscience des Systèmes (INS), 13005 Marseille, France
Sciense, New York, NY 10013, USA
https://www.mdpi.com/2076-3425/15/4/347
artificial intelligence
large language models
neurological diagnosis
clinical decision support
title Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation
title_sort specialized large language model outperforms neurologists at complex diagnosis in blinded case based evaluation
topic artificial intelligence
large language models
neurological diagnosis
clinical decision support
url https://www.mdpi.com/2076-3425/15/4/347