A multimodal dataset for automating language vitality and endangerment assessment in south-south Nigeria

Bibliographic Details
Main Authors:	Moses Ekpenyong, Imelda Udoh, Eno-Abasi Urua, Nse Udoh, Ebitare Obikudo, Ogbonna Anyanwu, Ahmadu Shehu, Esther Sylvanus, Richard Bassey, Unyime Saturday, Temitope Fakiyesi, Celestina-Predia Kekai, Ememobong Udoh, Stella Ansa, Emeka Ifesieh, Gladys Ikhimwin, Unyime Udoeyo, Emem Alexander, Emmanuel Okon, Mfon Ekpe, Benjamin Okon Nyong, Moses Darah, Akpobome Diffre-Odiete, Lucky Ejobee, William Aigbedo, Francis Imoudu, Chima Manda, Mee-eebari Kiine, Doris Ugwu, Aniefon Akpan
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	Scientific Data
Online Access:	https://doi.org/10.1038/s41597-025-05337-6

Description
Summary:	A multimodal dataset was collected between July 2023 and April 2024 through purposive sampling in a field survey of proper households (households with at least one parent and one child) in the South-South Geopolitical Zone of Nigeria. The dataset includes 543 validated responses captured in real time using an online survey developed with Google Forms. The survey instrument synthesised attributes derived from the United Nations Educational, Scientific and Cultural Organization (UNESCO) 2003 Language Vitality and Endangerment (LVE) framework to capture household-specific data from five households per Local Government Area (LGA). The dataset also includes audio recordings of 108 words selected from the Swadesh wordlist, together with a transcription of the gloss and tone patterns of each word, for proper description of the language’s speech system. The multimodal dataset can support the analysis of LVE patterns, linguistic trends, and complex interactions affecting language sustainability. It is reusable in linguistic, cultural and social science research, providing a resource for examining language diversity and preservation.
ISSN:	2052-4463

Similar Items