Establishing vocabulary tests as a benchmark for evaluating large language models.
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect th...
| Main Authors: | Gonzalo Martínez, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, Marc Brysbaert |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2024-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0308259 |
Similar Items
- Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
  by: Pedro Reviriego, et al.
  Published: (2024-12-01)
- Applying Mixed-Effects Models in Research on Second Language Acquisition: A Tutorial for Beginners
  by: Marc Brysbaert
  Published: (2025-01-01)
- Benchmarking Large Language Models for News Summarization
  by: Tianyi Zhang, et al.
  Published: (2024-02-01)
- Benchmarking of Large Language Models for the Dental Admission Test
  by: Yu Hou, et al.
  Published: (2025-01-01)
- Benchmarking large language models for biomedical natural language processing applications and recommendations
  by: Qingyu Chen, et al.
  Published: (2025-04-01)