ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search
Large Language Models (LLMs) are effective at modeling the syntactic and semantic content of text, making them a strong choice for conversational query rewriting. While previous approaches proposed custom NLP models requiring significant engineering effort, our approach is...
Saved in:
Main Authors: | Guido Rocchietti, Cosimo Rulli, Franco Maria Nardini, Cristina Ioana Muntean, Raffaele Perego, Ophir Frieder |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Conversational search; query rewriting; large language models; instruction-tuned LLMs; fine-tuning |
Online Access: | https://ieeexplore.ieee.org/document/10839752/ |
_version_ | 1832584010178494464 |
---|---|
author | Guido Rocchietti Cosimo Rulli Franco Maria Nardini Cristina Ioana Muntean Raffaele Perego Ophir Frieder |
author_facet | Guido Rocchietti Cosimo Rulli Franco Maria Nardini Cristina Ioana Muntean Raffaele Perego Ophir Frieder |
author_sort | Guido Rocchietti |
collection | DOAJ |
description | Large Language Models (LLMs) are effective at modeling the syntactic and semantic content of text, making them a strong choice for conversational query rewriting. While previous approaches proposed custom NLP models requiring significant engineering effort, our approach is conceptually simpler. Not only do we improve effectiveness over the current state of the art, but we also address cost and efficiency. We explore pre-trained LLMs fine-tuned to generate high-quality rewrites of user queries, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches (zero-, one-, and few-shot) with ChatGPT (i.e., <monospace>gpt-3.5-turbo</monospace>) and observe that higher-quality rewrites lead to improved retrieval. We then fine-tune smaller open LLMs on the query rewriting task. Our results demonstrate that our fine-tuned models, including the smallest with 780 million parameters, achieve better retrieval performance than <monospace>gpt-3.5-turbo</monospace>. To fine-tune the selected models, we use the QReCC dataset, which is specifically designed for query rewriting. For evaluation, we use the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites produced by both <monospace>gpt-3.5-turbo</monospace> and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques. |
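To make the prompting study in the abstract concrete, the following is a minimal, hypothetical sketch of one-shot conversational query rewriting with <monospace>gpt-3.5-turbo</monospace> via the OpenAI Python SDK (v1.x). It is not the authors' code: the instruction wording, the in-context example, and the helper names are illustrative assumptions.

```python
# A minimal sketch (not the article's implementation) of one-shot conversational
# query rewriting with gpt-3.5-turbo using the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Rewrite the user's last utterance as a fully self-contained search query, "
    "resolving pronouns and ellipsis from the conversation history. "
    "Return only the rewritten query."
)

# One in-context example (one-shot); zero-shot drops it, few-shot adds more.
# The example turns below are invented for illustration.
EXAMPLE_HISTORY = [
    "What is throat cancer?",
    "Throat cancer is a type of cancer that develops in the throat...",
]
EXAMPLE_UTTERANCE = "Is it treatable?"
EXAMPLE_REWRITE = "Is throat cancer treatable?"


def format_turns(turns: list[str], current: str) -> str:
    """Serialize the conversation history plus the current utterance."""
    lines = [f"Turn {i + 1}: {t}" for i, t in enumerate(turns)]
    lines.append(f"Current utterance: {current}")
    return "\n".join(lines)


def rewrite(history: list[str], utterance: str) -> str:
    """Return a context-independent rewrite of `utterance`."""
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": format_turns(EXAMPLE_HISTORY, EXAMPLE_UTTERANCE)},
        {"role": "assistant", "content": EXAMPLE_REWRITE},
        {"role": "user", "content": format_turns(history, utterance)},
    ]
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0,  # deterministic rewrites are preferable for retrieval
    )
    return resp.choices[0].message.content.strip()


if __name__ == "__main__":
    print(rewrite(
        ["Tell me about the Bronze Age collapse.",
         "The Bronze Age collapse was a period of societal breakdown..."],
        "What caused it?",
    ))
```

In the setting the article describes, the fine-tuned open models would replace this API call with a locally hosted model trained on QReCC, and the rewritten query would then be passed to the retrieval stage evaluated on TREC CAsT.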
format | Article |
id | doaj-art-ffbc47f4af534a7ea6696b56680bf11f |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-ffbc47f4af534a7ea6696b56680bf11f2025-01-28T00:01:32ZengIEEEIEEE Access2169-35362025-01-0113152531527110.1109/ACCESS.2025.352974110839752ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational SearchGuido Rocchietti0https://orcid.org/0009-0004-9704-0662Cosimo Rulli1Franco Maria Nardini2https://orcid.org/0000-0003-3183-334XCristina Ioana Muntean3https://orcid.org/0000-0001-5265-1831Raffaele Perego4https://orcid.org/0000-0001-7189-4724Ophir Frieder5https://orcid.org/0000-0001-5076-8171ISTI-CNR, Pisa, ItalyISTI-CNR, Pisa, ItalyISTI-CNR, Pisa, ItalyISTI-CNR, Pisa, ItalyISTI-CNR, Pisa, ItalyGeorgetown University, Washington, DC, USALarge Language Models (LLMs) are effective in modeling text syntactic and semantic content, making them a strong choice to perform conversational query rewriting. While previous approaches proposed NLP-based custom models, requiring significant engineering effort, our approach is straightforward and conceptually simpler. Not only do we improve effectiveness over the current state-of-the-art, but we also curate the cost and efficiency aspects. We explore the use of pre-trained LLMs fine-tuned to generate quality user query rewrites, aiming to reduce computational costs while maintaining or improving retrieval effectiveness. As a first contribution, we study various prompting approaches — including zero, one, and few-shot methods — with ChatGPT (e.g., <monospace>gpt-3.5-turbo</monospace>). We observe an increase in the quality of rewrites leading to improved retrieval. We then fine-tuned smaller open LLMs on the query rewriting task. Our results demonstrate that our fine-tuned models, including the smallest with 780 million parameters, achieve better performance during the retrieval phase than <monospace>gpt-3.5-turbo</monospace>. To fine-tune the selected models, we used the QReCC dataset, which is specifically designed for query rewriting tasks. For evaluation, we used the TREC CAsT datasets to assess the retrieval effectiveness of the rewrites of both <monospace>gpt-3.5-turbo</monospace> and our fine-tuned models. Our findings show that fine-tuning LLMs on conversational query rewriting datasets can be more effective than relying on generic instruction-tuned models or traditional query reformulation techniques.https://ieeexplore.ieee.org/document/10839752/Conversational searchquery rewritinglarge language modelsinstruction-tuned LLMsfine-tuning |
spellingShingle | Guido Rocchietti Cosimo Rulli Franco Maria Nardini Cristina Ioana Muntean Raffaele Perego Ophir Frieder ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search IEEE Access Conversational search query rewriting large language models instruction-tuned LLMs fine-tuning |
title | ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search |
title_full | ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search |
title_fullStr | ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search |
title_full_unstemmed | ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search |
title_short | ChatGPT Versus Modest Large Language Models: An Extensive Study on Benefits and Drawbacks for Conversational Search |
title_sort | chatgpt versus modest large language models an extensive study on benefits and drawbacks for conversational search |
topic | Conversational search query rewriting large language models instruction-tuned LLMs fine-tuning |
url | https://ieeexplore.ieee.org/document/10839752/ |
work_keys_str_mv | AT guidorocchietti chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch AT cosimorulli chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch AT francomarianardini chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch AT cristinaioanamuntean chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch AT raffaeleperego chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch AT ophirfrieder chatgptversusmodestlargelanguagemodelsanextensivestudyonbenefitsanddrawbacksforconversationalsearch |