Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances

In this article, we tested some specific claims made in the literature on relative distances among European languages and among Chinese dialects, suggesting that some language varieties within the Sinitic family traditionally called dialects are, in fact, more linguistically distant from one another...

Bibliographic Details
Main Authors: Chaoju Tang, Vincent J. van Heuven, Wilbert Heeringa, Charlotte Gooskens
Format: Article
Language: English
Published: MDPI AG 2025-05-01
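Series: Languages
Subjects: affinity trees; Chinese dialects; European languages; Levenshtein distance; lexico-phonetic distance; multi-dimensional scaling (MDS)
Online Access: https://www.mdpi.com/2226-471X/10/6/127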
_version_ 1849431726671003648
author Chaoju Tang
Vincent J. van Heuven
Wilbert Heeringa
Charlotte Gooskens
collection DOAJ
description In this article, we tested specific claims made in the literature about the relative distances among European languages and among Chinese dialects. These claims suggest that some varieties within the Sinitic family that are traditionally called dialects are, in fact, more linguistically distant from one another than some European varieties that are traditionally called languages. More generally, we examined whether distances among varieties within and across European language families were larger than those within and across Sinitic language varieties. To this end, we computed lexico-phonetic and syntactic distance measures for comparable language materials in six Germanic, five Romance and six Slavic languages, as well as in six Mandarin and nine non-Mandarin (‘southern’) Chinese varieties. Lexico-phonetic distance was expressed as the length-normalized MPI-weighted Levenshtein distance computed on the 100 most frequently used nouns in the 32 language varieties. Syntactic distance was implemented as the complement of the Pearson correlation coefficient computed over the PoS trigram frequencies established for a parallel corpus of the same four texts translated into each of the 32 languages. The lexico-phonetic distances proved to be relatively large and of approximately equal magnitude in the Germanic, Slavic and non-Mandarin Chinese language varieties. By contrast, the lexico-phonetic distances among the Romance languages and among the Mandarin varieties were considerably smaller, though of similar magnitude to each other. Cantonese (Guangzhou dialect) was lexico-phonetically as distant from Standard Mandarin (Beijing dialect) as European language pairs such as Portuguese–Italian, Portuguese–Romanian and Dutch–German. Syntactically, however, the differences among the Sinitic varieties were about ten times smaller than the differences among the European languages, both within and across the families, which provides some justification for the Chinese tradition of calling the Sinitic varieties dialects of the same language.
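The two measures named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes unit-cost edit operations instead of the MPI weighting, normalizes by the length of the longer sequence (the record does not say which normalization was used), and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Unit-cost Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def trigram_counts(tags):
    """Frequencies of consecutive part-of-speech tag triples."""
    return Counter(zip(tags, tags[1:], tags[2:]))


def syntactic_distance(tags_a, tags_b):
    """Complement (1 - r) of the Pearson correlation between two
    PoS trigram frequency profiles over the union of observed trigrams."""
    ca, cb = trigram_counts(tags_a), trigram_counts(tags_b)
    keys = sorted(set(ca) | set(cb))
    if not keys:
        raise ValueError("need at least one trigram")
    xs = [ca[k] for k in keys]
    ys = [cb[k] for k in keys]
    mx, my = sum(xs) / len(keys), sum(ys) / len(keys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return 1.0 - cov / sqrt(vx * vy)
```

In use, each word would be passed as a sequence of phonetic segments and each text as a sequence of PoS tags. Averaging `normalized_levenshtein` over a word list gives one lexico-phonetic distance per pair of varieties, and `syntactic_distance` ranges from 0 (identical trigram profiles) to 2 (perfectly anti-correlated profiles).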
format Article
id doaj-art-ce272ee0e9834da8a82da6cd9693bc5e
institution Kabale University
issn 2226-471X
language English
publishDate 2025-05-01
publisher MDPI AG
record_format Article
series Languages
spelling Record ID: doaj-art-ce272ee0e9834da8a82da6cd9693bc5e
Record timestamp: 2025-08-20T03:27:33Z
Language code: eng
Publisher: MDPI AG
Journal: Languages (ISSN 2226-471X)
Publication date: 2025-05-01
Volume 10, issue 6, article 127
DOI: 10.3390/languages10060127
Title: Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances
Authors (in record order): Chaoju Tang; Vincent J. van Heuven; Wilbert Heeringa; Charlotte Gooskens
Affiliations (in record order): School of Foreign Languages, University of Electronic Science and Technology, Chengdu 611731, China; Leiden University Centre for Linguistics, 9500 RA Leiden, The Netherlands; Fryske Akademy, 8911 DX Leeuwarden, The Netherlands; Center for Languages and Cognition, University of Groningen, 9700 AS Groningen, The Netherlands
URL: https://www.mdpi.com/2226-471X/10/6/127
Keywords: affinity trees; Chinese dialects; European languages; Levenshtein distance; lexico-phonetic distance; multi-dimensional scaling (MDS)
title Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances
topic affinity trees
Chinese dialects
European languages
Levenshtein distance
lexico-phonetic distance
multi-dimensional scaling (MDS)
url https://www.mdpi.com/2226-471X/10/6/127