Dataset on fatal road traffic crash attributes extracted via natural language processing of online media articles in India (Mendeley Data)

Bibliographic Details
Main Authors: Ashutosh Ashutosh, Sai Chand
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Data in Brief
Subjects: Road safety; Traffic crash fatalities; News reporting; Crash data
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925003105
collection DOAJ
description Road traffic crashes are among the leading causes of death globally, with substantial social and economic impacts. Online media is a key source of public information on road safety, and understanding how crashes are reported is crucial for detecting potential reporting biases and enhancing safety awareness. To address the lack of high-quality, media-reported fatal crash data, fatal crash reports from 2022–2023 were extracted from The Times of India, a prominent Indian news outlet. The resulting dataset covers 2898 fatal crashes, 6584 fatalities and 7812 injuries, described by 16 detailed crash attributes. It was developed using web scraping and natural language processing (NLP) techniques: automated tools such as Selenium and BeautifulSoup were used to extract raw data from the news source, and NLP algorithms were then applied to identify key crash attributes, including crash date, location, vehicles involved and number of fatalities. This study provides a replicable framework for constructing robust datasets from media sources, enabling multidisciplinary research on transportation safety, media reporting and public perception of crashes. The dataset is expected to serve as a valuable resource for analysing how the media shapes road safety narratives and for identifying high-fatality crash locations.
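The description names Selenium and BeautifulSoup for scraping and unspecified NLP algorithms for attribute extraction. The sketch below is only an illustration of that two-stage idea, not the authors' pipeline: it skips the Selenium page-fetching step, swaps BeautifulSoup for Python's standard-library `html.parser` so it runs without dependencies, and uses a few hypothetical regular expressions to pull fatality and injury counts from article text. All names (`ParagraphText`, `article_text`, `extract`, `CrashRecord`, `summarise`) and the patterns themselves are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


class ParagraphText(HTMLParser):
    """Hypothetical stand-in for the BeautifulSoup step: collects text inside <p> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def article_text(html: str) -> str:
    """Return the article body as one string of non-empty paragraphs."""
    parser = ParagraphText()
    parser.feed(html)
    return " ".join(p.strip() for p in parser.paragraphs if p.strip())


# Hypothetical rule-based extraction (assumption; real reports need far more robust NLP).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
NUM = r"\b(\d+|" + "|".join(WORD_NUMBERS) + r")\b"
DEATH_RE = re.compile(NUM + r"\s+(?:people\s+|persons\s+)?(?:were\s+)?(?:killed|dead|died)\b", re.I)
INJURY_RE = re.compile(NUM + r"\s+(?:others\s+|people\s+|persons\s+)?(?:were\s+)?(?:injured|hurt)\b", re.I)


def to_int(token: str) -> int:
    """Convert a digit string or a spelled-out number (one to ten) to int."""
    return int(token) if token.isdigit() else WORD_NUMBERS[token.lower()]


@dataclass
class CrashRecord:
    fatalities: int
    injuries: int


def extract(text: str) -> Optional[CrashRecord]:
    """Return counts for a fatal crash report, or None if no death is mentioned."""
    deaths = DEATH_RE.search(text)
    if deaths is None:
        return None  # not treated as a fatal crash report
    injuries = INJURY_RE.search(text)
    return CrashRecord(
        fatalities=to_int(deaths.group(1)),
        injuries=to_int(injuries.group(1)) if injuries else 0,
    )


def summarise(records: list[CrashRecord]) -> dict[str, float]:
    """Aggregate extracted records into dataset-level totals."""
    crashes = len(records)
    deaths = sum(r.fatalities for r in records)
    hurt = sum(r.injuries for r in records)
    return {"crashes": crashes, "fatalities": deaths, "injuries": hurt,
            "fatalities_per_crash": deaths / crashes if crashes else 0.0}


if __name__ == "__main__":
    sample = ("<html><body><h1>Crash on highway</h1><p>Three people were killed "
              "and two others injured when a truck hit a car.</p></body></html>")
    print(extract(article_text(sample)))  # CrashRecord(fatalities=3, injuries=2)
```

Applied to the published totals, the same aggregation gives roughly 6584 / 2898 ≈ 2.27 fatalities and 7812 / 2898 ≈ 2.70 injuries per reported fatal crash.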
id doaj-art-8c929994ef3f46a6a9892d4700a0f3f2
institution OA Journals
issn 2352-3409
volume 60
article 111578
doi 10.1016/j.dib.2025.111578
affiliation Ashutosh Ashutosh (corresponding author) and Sai Chand: Transportation Research and Injury Prevention Centre (TRIP Centre), Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
topic Road safety
Traffic crash fatalities
News reporting
Crash data