Analyzing evaluation methods for large language models in the medical field: a scoping review