Ideology and Policy Preferences in Synthetic Data: The Potential of LLMs for Public Opinion Analysis
This study investigates whether large language models (LLMs) can meaningfully extend or generate synthetic public opinion survey data on labor policy issues in South Korea. Unlike prior work conducted on people’s general sociocultural values or specific political topics such as voting intentions, ou...
| Main Authors: | Keyeun Lee, Jaehyuk Park, Suh-hee Choi, Changkeun Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Cogitatio, 2025-05-01 |
| Series: | Media and Communication |
| Subjects: | ai-generated text; chatgpt; large language models; news media; policy preferences; public opinions |
| Online Access: | https://www.cogitatiopress.com/mediaandcommunication/article/view/9677 |
| _version_ | 1849253916229763072 |
|---|---|
| author | Keyeun Lee, Jaehyuk Park, Suh-hee Choi, Changkeun Lee |
| author_sort | Keyeun Lee |
| collection | DOAJ |
| description | This study investigates whether large language models (LLMs) can meaningfully extend or generate synthetic public opinion survey data on labor policy issues in South Korea. Unlike prior work conducted on people’s general sociocultural values or specific political topics such as voting intentions, our research examines policy preferences on tangible social and economic topics, offering deeper insights for news media and data analysts. In two key applications, we first explore whether LLMs can predict public sentiment on emerging or rapidly evolving issues using existing survey data. We then assess how LLMs generate synthetic datasets resembling real-world survey distributions. Our findings reveal that while LLMs capture demographic and ideological traits with reasonable accuracy, they tend to overemphasize ideological orientation for politically charged topics—a bias that is more pronounced in fully synthetic data, raising concerns about perpetuating societal stereotypes. Despite these challenges, LLMs hold promise for enhancing data-driven journalism and policy research, particularly in polarized societies. We call for further study into how LLM-based predictions align with human responses in diverse sociopolitical settings, alongside improved tools and guidelines to mitigate embedded biases. |
| format | Article |
| id | doaj-art-684ca7d58e5542c5a5b6a8364fa8bd1a |
| institution | Kabale University |
| issn | 2183-2439 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Cogitatio |
| record_format | Article |
| series | Media and Communication |
| spelling | doaj-art-684ca7d58e5542c5a5b6a8364fa8bd1a; 2025-08-20T03:56:10Z; eng; Cogitatio; Media and Communication; 2183-2439; 2025-05-01; 13; 0; 10.17645/mac.9677; 4172; Ideology and Policy Preferences in Synthetic Data: The Potential of LLMs for Public Opinion Analysis; Keyeun Lee (Department of Communication, Seoul National University, South Korea); Jaehyuk Park (KDI School of Public Policy and Management, South Korea); Suh-hee Choi (Department of Geography, Kyung Hee University, South Korea); Changkeun Lee (KDI School of Public Policy and Management, South Korea); https://www.cogitatiopress.com/mediaandcommunication/article/view/9677; ai-generated text; chatgpt; large language models; news media; policy preferences; public opinions |
| title | Ideology and Policy Preferences in Synthetic Data: The Potential of LLMs for Public Opinion Analysis |
| topic | ai-generated text; chatgpt; large language models; news media; policy preferences; public opinions |
| url | https://www.cogitatiopress.com/mediaandcommunication/article/view/9677 |