A multimodal dataset for automating language vitality and endangerment assessment in south-south Nigeria
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05337-6 |
| Summary: | Abstract: This paper presents a multimodal dataset collected between July 2023 and April 2024 through purposive sampling in a field survey of proper households (households with at least one parent and one child) in the South-South Geopolitical Zone of Nigeria. The dataset includes 543 validated responses captured in real time using an online survey developed with Google Forms. The survey instrument synthesised attributes derived from the United Nations Educational, Scientific and Cultural Organization (UNESCO) 2003 Language Vitality and Endangerment (LVE) framework to capture household-specific data from five households per Local Government Area (LGA). The dataset also includes audio recordings of 108 words selected from the Swadesh wordlist, along with a transcription of the gloss and tone patterns of each word, for proper description of the language’s speech system. The multimodal dataset can support the analysis of LVE patterns, linguistic trends, and complex interactions affecting language sustainability. It is reusable in linguistic, cultural and social science research, providing a robust resource for examining language diversity and preservation. |
|---|---|
| ISSN: | 2052-4463 |