LLM-Based Doppelgänger Models: Leveraging Synthetic Data for Human-Like Responses in Survey Simulations

This study explores whether large language models (LLMs) can learn a person’s opinions from their speech and act based on that knowledge. It also proposes the potential for utilizing such trained models in survey research. Traditional survey research collects information through standardi...

Bibliographic Details
Main Authors:	Suhyun Cho, Jaeyun Kim, Jang Hyun Kim
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	LLM; survey research; NLP; NLU; synthetic data
Online Access:	https://ieeexplore.ieee.org/document/10758652/

id	doaj-art-7a7b731ceaa34c1f85eb35e00635e38b
collection	DOAJ
institution	OA Journals
record_format	Article
issn	2169-3536
volume	12
pages	178917-178927
doi	10.1109/ACCESS.2024.3502219
article_number	10758652
author_details	Suhyun Cho (https://orcid.org/0000-0002-9410-8017), Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea; Jaeyun Kim, AI Model Development, Dareesoft, Seongnam-si, Republic of Korea; Jang Hyun Kim (https://orcid.org/0000-0001-7750-2664), Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea
description	This study explores whether large language models (LLMs) can learn a person’s opinions from their speech and act based on that knowledge. It also proposes the potential for utilizing such trained models in survey research. Traditional survey research collects information through standardized questions. However, surveys require repeated administration with new participants each time, which involves significant costs and time. With the recent advancements in LLMs, artificial intelligence (AI) has shown remarkable capabilities, often surpassing humans in tasks that require natural language understanding (NLU) and natural language generation (NLG). Despite this, research on whether AI can replicate human thought processes in tasks such as text interpretation or question-answering remains insufficient. This study proposes a Surveyed LLM, specialized for survey tasks, and a Doppelganger LLM that mimics human thought processes. It tests to what extent the Doppelganger model can replicate human judgment. Furthermore, it suggests the possibility of mimicking not only group distributions but also individual opinions.

Similar Items

LLM-Based Doppelgänger Models: Leveraging Synthetic Data for Human-Like Responses in Survey Simulations