From digital traces to public vaccination behaviors: leveraging large language models for big data classification
Introduction: The current study leverages large language models (LLMs) to capture health behaviors expressed in social media posts, focusing on COVID-19 vaccine-related content from 2020 to 2021. Methods: To examine the capabilities of prompt engineering and fine-tuning approaches with LLMs, this study e...
| Main Authors: | Yoo Jung Oh, Muhammad Ehab Rasul, Emily McKinley, Christopher Calabrese |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-07-01 |
| Series: | Frontiers in Artificial Intelligence |
| Subjects: | artificial intelligence, large language models, LLMs, social media, COVID-19, vaccination |
| Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2025.1602984/full |
| _version_ | 1849712637284188160 |
|---|---|
| author | Yoo Jung Oh, Muhammad Ehab Rasul, Emily McKinley, Christopher Calabrese |
| author_sort | Yoo Jung Oh |
| collection | DOAJ |
| description | Introduction: The current study leverages large language models (LLMs) to capture health behaviors expressed in social media posts, focusing on COVID-19 vaccine-related content from 2020 to 2021. Methods: To examine the capabilities of prompt engineering and fine-tuning approaches with LLMs, this study examines the performance of three state-of-the-art LLMs: GPT-4o, GPT-4o-mini, and GPT-4o-mini with fine-tuning, focusing on their ability to classify individuals’ vaccination behavior, intention to vaccinate, and information sharing. We then cross-validate these classifications with nationwide vaccination statistics to assess alignment with observed trends. Results: GPT-4o-mini with fine-tuning outperformed both GPT-4o and the standard GPT-4o-mini in terms of accuracy, precision, recall, and F1 score. Using GPT-4o-mini with fine-tuning for classification, about 9.84% of the posts (N = 36,912) included personal behavior related to getting the COVID-19 vaccine, while a majority of posts (71.45%; N = 267,930) included information sharing about the virus. Lastly, we found a strong correlation (r = 0.76, p < 0.01) between vaccination behaviors expressed on social media and the actual vaccine uptake over time. Discussion: This study suggests that LLMs can serve as powerful tools for estimating real-world behaviors. Methodological and practical implications of utilizing LLMs in human behavior research are further discussed. |
| format | Article |
| id | doaj-art-7170e028bd984b16b7534b1be516de2c |
| institution | DOAJ |
| issn | 2624-8212 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Artificial Intelligence |
| spelling | doaj-art-7170e028bd984b16b7534b1be516de2c; 2025-08-20T03:14:12Z; eng; Frontiers Media S.A.; Frontiers in Artificial Intelligence; 2624-8212; 2025-07-01; 8; 10.3389/frai.2025.1602984; 1602984; From digital traces to public vaccination behaviors: leveraging large language models for big data classification; Yoo Jung Oh (Department of Communication, Michigan State University, East Lansing, MI, United States); Muhammad Ehab Rasul (Department of Communication, University of California, Davis, Davis, CA, United States); Emily McKinley (Department of Communication, University of California, Davis, Davis, CA, United States); Christopher Calabrese (Department of Communication, Clemson University, Clemson, SC, United States); https://www.frontiersin.org/articles/10.3389/frai.2025.1602984/full; artificial intelligence, large language models, LLMs, social media, COVID-19, vaccination |
| title | From digital traces to public vaccination behaviors: leveraging large language models for big data classification |
| topic | artificial intelligence, large language models, LLMs, social media, COVID-19, vaccination |
| url | https://www.frontiersin.org/articles/10.3389/frai.2025.1602984/full |
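
The abstract reports two kinds of evaluation: classification quality (accuracy, precision, recall, F1) against human labels, and a Pearson correlation between posts expressing vaccination behavior and actual vaccine uptake over time. The following is a minimal sketch of how such figures are commonly computed, not the authors' code. The function names `binary_metrics` and `pearson_r` and all sample values are hypothetical and chosen only for illustration; none of the numbers come from the study.

```python
from math import sqrt
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for one binary label (1 = label present)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical toy data: human-coded labels vs. model-assigned labels.
    human = [1, 0, 1, 1, 0, 0, 1, 0]
    model = [1, 0, 0, 1, 0, 1, 1, 0]
    print(binary_metrics(human, model))

    # Hypothetical weekly counts: behavior posts vs. administered doses.
    weekly_posts = [120, 340, 560, 610, 480]
    weekly_doses = [1.1e6, 2.9e6, 5.2e6, 5.0e6, 4.4e6]
    print(round(pearson_r(weekly_posts, weekly_doses), 3))
```

For a multi-class or multi-label scheme such as the three categories named in the abstract (vaccination behavior, intention to vaccinate, information sharing), one option is to call `binary_metrics` once per category and average the results; whether the study used macro, micro, or weighted averaging is not stated in this record.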