A large-scale dataset of AI-related tweets: Structure and descriptive statistics

Bibliographic Details
Main Authors:	Nathalie de Marcellis-Warin, Daniel Kouloukoui, Thierry Warin
Format:	Article
Language:	English
Published:	Elsevier 2025-10-01
Series:	Data in Brief
Subjects:	Artificial intelligence; Social media analysis; Twitter data; Natural language processing; Public perception; AI Ethics
Online Access:	http://www.sciencedirect.com/science/article/pii/S2352340925006845

Description
Summary:	This article presents a curated and anonymized dataset of tweets related to artificial intelligence (AI), comprising 893,076 entries collected using the Twitter API between January 1, 2017, and July 19, 2021. These tweets were extracted from a larger initial corpus using the keyword “Artificial Intelligence” and subsequently filtered to ensure data quality, multilingual coverage, and public accessibility. The final dataset includes structured metadata such as media elements (images, videos, and URLs), user engagement metrics (likes, retweets, replies), hashtags, language codes, and temporal indicators (hour and weekday of posting). While additional linguistic features—such as text length and tokenization—were used in internal analyses, they are not included in the public release. This dataset offers a robust foundation for research on the evolution of public discourse surrounding AI, including sentiment analysis, topic modeling, social engagement dynamics, and policy-relevant evaluations. It is openly available through established repositories and adheres to the FAIR principles, facilitating transparency, reproducibility, and interdisciplinary applications in computational social science, natural language processing, and AI governance research.
ISSN:	2352-3409
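
The summary lists the kinds of metadata in the dataset but does not give column names or a file layout. As a minimal sketch, assuming hypothetical field names (`created_at`, `lang`, `likes`, `retweets`, `replies`) and made-up sample records that are not taken from the published files, the temporal indicators and a few descriptive statistics could be derived along these lines:

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical records: field names and values are illustrative only,
# not taken from the published dataset.
SAMPLE_TWEETS = [
    {"created_at": "2017-01-01T09:15:00", "lang": "en",
     "likes": 3, "retweets": 1, "replies": 0},
    {"created_at": "2019-06-14T18:40:00", "lang": "fr",
     "likes": 0, "retweets": 2, "replies": 1},
    {"created_at": "2021-07-19T09:05:00", "lang": "en",
     "likes": 9, "retweets": 4, "replies": 2},
]


def add_temporal_indicators(tweet):
    """Return a copy of the record with hour and weekday of posting added."""
    posted = datetime.fromisoformat(tweet["created_at"])
    return {**tweet, "hour": posted.hour, "weekday": WEEKDAYS[posted.weekday()]}


def describe(tweets):
    """Compute simple descriptive statistics over a list of tweet records."""
    enriched = [add_temporal_indicators(t) for t in tweets]
    return {
        "n": len(enriched),
        "by_language": Counter(t["lang"] for t in enriched),
        "by_hour": Counter(t["hour"] for t in enriched),
        "by_weekday": Counter(t["weekday"] for t in enriched),
        "mean_likes": mean(t["likes"] for t in enriched),
    }


stats = describe(SAMPLE_TWEETS)
```

To work with the actual release, the sample list would be replaced by records read from the repository files, with the field names adjusted to match whatever the published schema uses.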

Similar Items