Establishing vocabulary tests as a benchmark for evaluating large language models.