A Dataset on Linguistic Connectivity Across and Within Countries

Bibliographic Details
Main Authors: Tamara Gurevich, Peter R. Herman, Farid Toubal, Yoto V. Yotov
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04692-8
Collection: DOAJ
Abstract: We construct a new global dataset on common language. The data cover 242 countries and territories and are based on information about the speakers of 6,675 languages. Using data from Ethnologue, we provide 11 bilateral measures reflecting different dimensions of linguistic connections within and between countries, including common official languages, common native and acquired languages, and linguistic proximity across different languages. A key novelty of the dataset is that it includes consistently defined information on linguistic relationships not only between different countries but within the administrative borders of countries as well.
Record ID: doaj-art-4deabe9b7c3c4c4e8b8c80b88de96d4c
Institution: OA Journals
ISSN: 2052-4463
Author Affiliations:
Tamara Gurevich: U.S. International Trade Commission – Office of Economics
Peter R. Herman: U.S. International Trade Commission – Office of Economics
Farid Toubal: University of Paris Dauphine – PSL, CEPII, CESifo and CEPR
Yoto V. Yotov: Drexel University – School of Economics, ifo Institute and CESifo
Volume: 12
Issue: 1
Pages: 1-10