A large-scale dataset of AI-related tweets: Structure and descriptive statistics
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-10-01 |
| Series: | Data in Brief |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925006845 |
| Summary: | This article presents a curated and anonymized dataset of tweets related to artificial intelligence (AI), comprising 893,076 entries collected using the Twitter API between January 1, 2017, and July 19, 2021. These tweets were extracted from a larger initial corpus using the keyword “Artificial Intelligence” and subsequently filtered to ensure data quality, multilingual coverage, and public accessibility. The final dataset includes structured metadata such as media elements (images, videos, and URLs), user engagement metrics (likes, retweets, replies), hashtags, language codes, and temporal indicators (hour and weekday of posting). While additional linguistic features—such as text length and tokenization—were used in internal analyses, they are not included in the public release. This dataset offers a robust foundation for research on the evolution of public discourse surrounding AI, including sentiment analysis, topic modeling, social engagement dynamics, and policy-relevant evaluations. It is openly available through established repositories and adheres to the FAIR principles, facilitating transparency, reproducibility, and interdisciplinary applications in computational social science, natural language processing, and AI governance research. |
|---|---|
| ISSN: | 2352-3409 |