Evaluating search engines and large language models for answering health questions
| Main Authors: | Marcos Fernández-Pichel, Juan C. Pichel, David E. Losada |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | npj Digital Medicine |
| Online Access: | https://doi.org/10.1038/s41746-025-01546-w |
| author | Marcos Fernández-Pichel; Juan C. Pichel; David E. Losada |
| author_facet | Marcos Fernández-Pichel; Juan C. Pichel; David E. Losada |
| author_sort | Marcos Fernández-Pichel |
| collection | DOAJ |
| description | Abstract Search engines (SEs) have traditionally been primary tools for information seeking, but the new large language models (LLMs) are emerging as powerful alternatives, particularly for question-answering tasks. This study compares the performance of four popular SEs, seven LLMs, and retrieval-augmented (RAG) variants in answering 150 health-related questions from the TREC Health Misinformation (HM) Track. Results reveal SEs correctly answer 50–70% of questions, often hindered by many retrieval results not responding to the health question. LLMs deliver higher accuracy, correctly answering about 80% of questions, though their performance is sensitive to input prompts. RAG methods significantly enhance smaller LLMs’ effectiveness, improving accuracy by up to 30% by integrating retrieval evidence. |
| format | Article |
| id | doaj-art-65b18ff1bea74f37bd945bd920a4a120 |
| institution | DOAJ |
| issn | 2398-6352 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | npj Digital Medicine |
| spelling | doaj-art-65b18ff1bea74f37bd945bd920a4a120; 2025-08-20T03:01:38Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-03-01; 81115; 10.1038/s41746-025-01546-w; Evaluating search engines and large language models for answering health questions; Marcos Fernández-Pichel [0]; Juan C. Pichel [1]; David E. Losada [2]; [0] Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela; [1] Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela; [2] Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela; Abstract Search engines (SEs) have traditionally been primary tools for information seeking, but the new large language models (LLMs) are emerging as powerful alternatives, particularly for question-answering tasks. This study compares the performance of four popular SEs, seven LLMs, and retrieval-augmented (RAG) variants in answering 150 health-related questions from the TREC Health Misinformation (HM) Track. Results reveal SEs correctly answer 50–70% of questions, often hindered by many retrieval results not responding to the health question. LLMs deliver higher accuracy, correctly answering about 80% of questions, though their performance is sensitive to input prompts. RAG methods significantly enhance smaller LLMs’ effectiveness, improving accuracy by up to 30% by integrating retrieval evidence.; https://doi.org/10.1038/s41746-025-01546-w |
| spellingShingle | Marcos Fernández-Pichel; Juan C. Pichel; David E. Losada; Evaluating search engines and large language models for answering health questions; npj Digital Medicine |
| title | Evaluating search engines and large language models for answering health questions |
| url | https://doi.org/10.1038/s41746-025-01546-w |