Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions.

Bibliographic Details
Main Authors: Fynn Bachmann, Daan van der Weijden, Lucien Heitz, Cristina Sarasua, Abraham Bernstein
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0322690
collection DOAJ
description Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers. Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science. One limitation, however, is their dependency on data to train the model for question selection. Often, such training data (i.e., user interactions) are unavailable a priori. To address this problem, we (i) test whether Large Language Models (LLMs) can accurately generate such interaction data and (ii) explore whether these synthetic data can be used to pre-train the statistical model of an adaptive political survey. To evaluate this approach, we utilise existing data from the Swiss Voting Advice Application (VAA) Smartvote in two ways: First, we compare the distribution of LLM-generated synthetic data to the real distribution to assess its similarity. Second, we compare the performance of an adaptive questionnaire that is randomly initialised with one pre-trained on synthetic data to assess the suitability of these data for training. We benchmark these results against an "oracle" questionnaire with perfect prior knowledge. We find that an off-the-shelf LLM (GPT-4) accurately generates answers to the Smartvote questionnaire from the perspective of different Swiss parties. Furthermore, we demonstrate that initialising the statistical model with synthetic data can (i) significantly reduce the error in predicting user responses and (ii) increase the candidate recommendation accuracy of the VAA. Our work emphasises the considerable potential of LLMs to create training data to improve the data collection process in adaptive questionnaires in LLM-affine areas such as political surveys.
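The cold-start idea in the description can be illustrated with a toy sketch. It is not the authors' model: the 0-100 answer scale, the mean-based predictor, and every data value below are assumptions made up for illustration, and the sketch covers only the effect of pre-training on synthetic answers, not adaptive question selection.

```python
# Toy sketch only: the data, the 0-100 answer scale, and the mean-based
# predictor are assumptions for illustration, not the paper's model.
from statistics import mean

MIDPOINT = 50.0  # assumed neutral answer on an assumed 0-100 agreement scale


def fit_question_means(interactions):
    """Return the mean answer per question over a list of answer vectors.

    With no interactions (the cold start), return None so that
    predictions fall back to the scale midpoint.
    """
    if not interactions:
        return None
    n_questions = len(interactions[0])
    return [mean(row[q] for row in interactions) for q in range(n_questions)]


def predict(question_means, n_questions):
    """Predict every answer: learned means if available, else the midpoint."""
    if question_means is None:
        return [MIDPOINT] * n_questions
    return list(question_means)


def mean_abs_error(predicted, actual):
    """Mean absolute difference between predicted and actual answers."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical stand-in for LLM-generated answers written from party viewpoints.
synthetic_answers = [
    [100, 75, 0, 25],
    [75, 100, 25, 0],
    [100, 100, 0, 0],
]
# Hypothetical real respondent, unseen during pre-training.
real_user = [100, 75, 25, 0]

cold = predict(fit_question_means([]), len(real_user))
warm = predict(fit_question_means(synthetic_answers), len(real_user))

cold_error = mean_abs_error(cold, real_user)
warm_error = mean_abs_error(warm, real_user)
print(f"cold start MAE: {cold_error:.1f}, pre-trained MAE: {warm_error:.1f}")
```

In this made-up example the pre-trained predictor has the lower error only because the synthetic answers happen to resemble the respondent; whether that holds for real data is the empirical question the article evaluates.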
id doaj-art-e448a9f660004e91811675fffb8e8e00
institution Kabale University
issn 1932-6203
spelling doaj-art-e448a9f660004e91811675fffb8e8e00; 2025-08-20T03:48:26Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2025-01-01; 20; 5; e0322690; 10.1371/journal.pone.0322690; https://doi.org/10.1371/journal.pone.0322690