Establishing vocabulary tests as a benchmark for evaluating large language models.
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect th...
| Main Authors: | Gonzalo Martínez, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, Marc Brysbaert |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2024-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0308259 |
Similar Items
- Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
  by: Pedro Reviriego, et al.
  Published: (2024-12-01)
- Applying Mixed-Effects Models in Research on Second Language Acquisition: A Tutorial for Beginners
  by: Marc Brysbaert
  Published: (2025-01-01)
- Benchmarking Large Language Models for News Summarization
  by: Tianyi Zhang, et al.
  Published: (2024-02-01)
- Benchmarking of Large Language Models for the Dental Admission Test
  by: Yu Hou, et al.
  Published: (2025-01-01)
- Benchmarking large language models for biomedical natural language processing applications and recommendations
  by: Qingyu Chen, et al.
  Published: (2025-04-01)