LLM-Based Doppelgänger Models: Leveraging Synthetic Data for Human-Like Responses in Survey Simulations

This study explores whether large language models (LLMs) can learn a person’s opinions from their speech and act based on that knowledge. It also proposes the potential for utilizing such trained models in survey research. Traditional survey research collects information through standardi...

Bibliographic Details
Main Authors: Suhyun Cho, Jaeyun Kim, Jang Hyun Kim
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: LLM; survey research; NLP; NLU; synthetic data
Online Access: https://ieeexplore.ieee.org/document/10758652/
_version_ 1850263626560045056
author Suhyun Cho
Jaeyun Kim
Jang Hyun Kim
collection DOAJ
description This study explores whether large language models (LLMs) can learn a person’s opinions from their speech and act based on that knowledge. It also proposes the potential for utilizing such trained models in survey research. Traditional survey research collects information through standardized questions. However, surveys require repeated administration with new participants each time, which involves significant costs and time. With the recent advancements in LLMs, artificial intelligence (AI) has shown remarkable capabilities, often surpassing humans in tasks that require natural language understanding (NLU) and natural language generation (NLG). Despite this, research on whether AI can replicate human thought processes in tasks such as text interpretation or question-answering remains insufficient. This study proposes a Surveyed LLM, specialized for survey tasks, and a Doppelganger LLM that mimics human thought processes. It tests to what extent the Doppelganger model can replicate human judgment. Furthermore, it suggests the possibility of mimicking not only group distributions but also individual opinions.
format Article
id doaj-art-7a7b731ceaa34c1f85eb35e00635e38b
institution OA Journals
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-7a7b731ceaa34c1f85eb35e00635e38b
2025-08-20T01:54:55Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
12
178917
178927
10.1109/ACCESS.2024.3502219
10758652
LLM-Based Doppelgänger Models: Leveraging Synthetic Data for Human-Like Responses in Survey Simulations
Suhyun Cho
0
https://orcid.org/0000-0002-9410-8017
Jaeyun Kim
1
Jang Hyun Kim
2
https://orcid.org/0000-0001-7750-2664
Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea
AI Model Development, Dareesoft, Seongnam-si, Republic of Korea
Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea
https://ieeexplore.ieee.org/document/10758652/
LLM
survey research
NLP
NLU
synthetic data
title LLM-Based Doppelgänger Models: Leveraging Synthetic Data for Human-Like Responses in Survey Simulations
topic LLM
survey research
NLP
NLU
synthetic data
url https://ieeexplore.ieee.org/document/10758652/