From digital traces to public vaccination behaviors: leveraging large language models for big data classification

Bibliographic Details
Main Authors: Yoo Jung Oh, Muhammad Ehab Rasul, Emily McKinley, Christopher Calabrese
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-07-01
Series: Frontiers in Artificial Intelligence
Subjects: artificial intelligence, large language models, LLMs, social media, COVID-19, vaccination
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2025.1602984/full
author Yoo Jung Oh
Muhammad Ehab Rasul
Emily McKinley
Christopher Calabrese
collection DOAJ
description Introduction: The current study leverages large language models (LLMs) to capture health behaviors expressed in social media posts, focusing on COVID-19 vaccine-related content from 2020 to 2021.
Methods: To examine the capabilities of prompt engineering and fine-tuning approaches with LLMs, this study examines the performance of three state-of-the-art LLMs: GPT-4o, GPT-4o-mini, and GPT-4o-mini with fine-tuning, focusing on their ability to classify individuals’ vaccination behavior, intention to vaccinate, and information sharing. We then cross-validate these classifications with nationwide vaccination statistics to assess alignment with observed trends.
Results: GPT-4o-mini with fine-tuning outperformed both GPT-4o and the standard GPT-4o-mini in terms of accuracy, precision, recall, and F1 score. Using GPT-4o-mini with fine-tuning for classification, about 9.84% of the posts (N = 36,912) included personal behavior related to getting the COVID-19 vaccine while a majority of posts (71.45%; N = 267,930) included information sharing about the virus. Lastly, we found a strong correlation (r = 0.76, p < 0.01) between vaccination behaviors expressed on social media and the actual vaccine uptake over time.
Discussion: This study suggests that LLMs can serve as powerful tools for estimating real-world behaviors. Methodological and practical implications of utilizing LLMs in human behavior research are further discussed.
format Article
id doaj-art-7170e028bd984b16b7534b1be516de2c
institution DOAJ
issn 2624-8212
language English
publishDate 2025-07-01
publisher Frontiers Media S.A.
series Frontiers in Artificial Intelligence
spelling Record ID: doaj-art-7170e028bd984b16b7534b1be516de2c
Timestamp: 2025-08-20T03:14:12Z
Language code: eng
Publisher: Frontiers Media S.A.
Journal: Frontiers in Artificial Intelligence
ISSN: 2624-8212
Publication date: 2025-07-01
Volume: 8
DOI: 10.3389/frai.2025.1602984
Article number: 1602984
Title: From digital traces to public vaccination behaviors: leveraging large language models for big data classification
Author affiliations:
Yoo Jung Oh: Department of Communication, Michigan State University, East Lansing, MI, United States
Muhammad Ehab Rasul: Department of Communication, University of California, Davis, Davis, CA, United States
Emily McKinley: Department of Communication, University of California, Davis, Davis, CA, United States
Christopher Calabrese: Department of Communication, Clemson University, Clemson, SC, United States
title From digital traces to public vaccination behaviors: leveraging large language models for big data classification
topic artificial intelligence
large language models
LLMs
social media
COVID-19
vaccination
url https://www.frontiersin.org/articles/10.3389/frai.2025.1602984/full