Ideology and Policy Preferences in Synthetic Data: The Potential of LLMs for Public Opinion Analysis

Bibliographic Details
Main Authors: Keyeun Lee, Jaehyuk Park, Suh-hee Choi, Changkeun Lee
Format: Article
Language: English
Published: Cogitatio 2025-05-01
Series: Media and Communication
Online Access: https://www.cogitatiopress.com/mediaandcommunication/article/view/9677

Abstract
This study investigates whether large language models (LLMs) can meaningfully extend or generate synthetic public opinion survey data on labor policy issues in South Korea. Unlike prior work on people’s general sociocultural values or on specific political topics such as voting intentions, our research examines policy preferences on tangible social and economic issues, offering deeper insights for news media and data analysts. We consider two applications. First, we explore whether LLMs can use existing survey data to predict public sentiment on emerging or rapidly evolving issues. Second, we assess how well LLMs generate synthetic datasets that resemble real-world survey distributions. Our findings reveal that while LLMs capture demographic and ideological traits with reasonable accuracy, they tend to overemphasize ideological orientation on politically charged topics. This bias is more pronounced in fully synthetic data and raises concerns about perpetuating societal stereotypes. Despite these challenges, LLMs hold promise for enhancing data-driven journalism and policy research, particularly in polarized societies. We call for further study of how LLM-based predictions align with human responses in diverse sociopolitical settings, alongside improved tools and guidelines to mitigate embedded biases.

ISSN: 2183-2439
DOI: 10.17645/mac.9677
Affiliations: Keyeun Lee (Department of Communication, Seoul National University, South Korea); Jaehyuk Park (KDI School of Public Policy and Management, South Korea); Suh-hee Choi (Department of Geography, Kyung Hee University, South Korea); Changkeun Lee (KDI School of Public Policy and Management, South Korea)
Keywords: ai-generated text; chatgpt; large language models; news media; policy preferences; public opinions
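
The abstract describes checking whether synthetic survey data resemble real-world response distributions, and reports that LLMs tend to overemphasize ideological orientation. A minimal sketch of how such a comparison could be quantified is shown here, under stated assumptions: a single 5-point Likert item, two hypothetical ideology labels (`progressive`, `conservative`), total variation distance as the distributional metric, and the gap in mean response between ideology groups as a rough indicator of ideological emphasis. The metric choice, labels, and function names are illustrative assumptions, not the authors' method, and the toy numbers are invented for the example, not results from the study.

```python
from collections import Counter

# Assumption: a 5-point Likert item, 1 = strongly oppose ... 5 = strongly support.
SCALE = range(1, 6)


def response_distribution(scores):
    """Share of respondents choosing each point on SCALE."""
    counts = Counter(scores)
    total = len(scores)
    return {point: counts[point] / total for point in SCALE}


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions over SCALE.

    0 means identical distributions; 1 means no overlap at all.
    """
    return 0.5 * sum(abs(p[point] - q[point]) for point in SCALE)


def mean(values):
    return sum(values) / len(values)


def ideological_gap(rows, left="progressive", right="conservative"):
    """Mean response of the `left` group minus that of the `right` group.

    `rows` is a list of (ideology, score) pairs; the group labels are hypothetical.
    """
    left_scores = [score for ideology, score in rows if ideology == left]
    right_scores = [score for ideology, score in rows if ideology == right]
    return mean(left_scores) - mean(right_scores)


# Toy (ideology, score) pairs for one survey item; NOT data from the study.
real = [("progressive", 4), ("progressive", 3), ("progressive", 5),
        ("conservative", 2), ("conservative", 3), ("conservative", 4)]
synthetic = [("progressive", 5), ("progressive", 5), ("progressive", 4),
             ("conservative", 1), ("conservative", 2), ("conservative", 1)]

# Compare the overall response distributions, ignoring ideology.
tvd = total_variation_distance(
    response_distribution([score for _, score in real]),
    response_distribution([score for _, score in synthetic]),
)
print(f"TVD between real and synthetic: {tvd:.3f}")
print(f"Ideological gap, real: {ideological_gap(real):.3f}")
print(f"Ideological gap, synthetic: {ideological_gap(synthetic):.3f}")
```

On these toy rows the script prints a distance of 0.500 and group gaps of 1.000 (real) versus 3.333 (synthetic). A synthetic gap much wider than the real one would be one way to surface the kind of ideological overemphasis the abstract reports; a real analysis would also need sampling weights, more items, and uncertainty estimates.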