Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances
| Main Authors: | Chaoju Tang, Vincent J. van Heuven, Wilbert Heeringa, Charlotte Gooskens |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Languages |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2226-471X/10/6/127 |
| Field | Value |
|---|---|
| author | Chaoju Tang, Vincent J. van Heuven, Wilbert Heeringa, Charlotte Gooskens |
| collection | DOAJ |
| description | In this article, we tested specific claims made in the literature about relative distances among European languages and among Chinese dialects, namely that some varieties within the Sinitic family traditionally called dialects are in fact more linguistically distant from one another than some European varieties traditionally called languages. More generally, we examined whether distances among varieties within and across European language families were larger than those within and across Sinitic varieties. To this end, we computed lexico-phonetic and syntactic distance measures on comparable language materials for six Germanic, five Romance and six Slavic languages, and for six Mandarin and nine non-Mandarin (‘southern’) Chinese varieties. Lexico-phonetic distances were expressed as length-normalized, MPI-weighted Levenshtein distances computed on the 100 most frequently used nouns in the 32 language varieties. Syntactic distance was implemented as the complement of the Pearson correlation coefficient between PoS trigram frequencies established for a parallel corpus of the same four texts translated into each of the 32 languages. The lexico-phonetic distances proved to be relatively large and of approximately equal magnitude within the Germanic, Slavic and non-Mandarin Chinese varieties. The lexico-phonetic distances among the Romance languages and among the Mandarin varieties, however, were considerably smaller, and similar to each other in magnitude. Cantonese (Guangzhou dialect) was lexico-phonetically as distant from Standard Mandarin (Beijing dialect) as European language pairs such as Portuguese–Italian, Portuguese–Romanian and Dutch–German. Syntactically, however, the differences among the Sinitic varieties were about ten times smaller than those among the European languages, both within and across families, which provides some justification for the Chinese tradition of calling the Sinitic varieties dialects of the same language. |
| format | Article |
| id | doaj-art-ce272ee0e9834da8a82da6cd9693bc5e |
| institution | Kabale University |
| issn | 2226-471X |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Languages |
| spelling | doaj-art-ce272ee0e9834da8a82da6cd9693bc5e; 2025-08-20T03:27:33Z; eng; MDPI AG; Languages; ISSN 2226-471X; 2025-05-01; vol. 10, no. 6, art. 127; DOI 10.3390/languages10060127; affiliations: Chaoju Tang (School of Foreign Languages, University of Electronic Science and Technology, Chengdu 611731, China); Vincent J. van Heuven (Leiden University Centre for Linguistics, 9500 RA Leiden, The Netherlands); Wilbert Heeringa (Fryske Akademy, 8911 DX Leeuwarden, The Netherlands); Charlotte Gooskens (Center for Languages and Cognition, University of Groningen, 9700 AS Groningen, The Netherlands) |
| title | Chinese “Dialects” and European “Languages”: A Comparison of Lexico-Phonetic and Syntactic Distances |
| topic | affinity trees; Chinese dialects; European languages; Levenshtein distance; lexico-phonetic distance; multi-dimensional scaling (MDS) |
| url | https://www.mdpi.com/2226-471X/10/6/127 |
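The description combines two distance measures: a length-normalized Levenshtein distance averaged over a concept-aligned noun list, and a syntactic distance defined as the complement of Pearson's r between PoS trigram frequency profiles. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. All function names are hypothetical; substitution costs are uniform rather than MPI-weighted (the record does not define the MPI weights); each word distance is divided by the length of the longer word, which is one common normalization that may differ from the paper's; and the texts are assumed to be PoS-tagged already.

```python
from collections import Counter
from math import sqrt


def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (strings or lists of segments)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            sub = 0 if x == y else 1  # ASSUMPTION: uniform costs, no MPI weighting
            cur.append(min(prev[j] + 1,         # deletion
                           cur[j - 1] + 1,      # insertion
                           prev[j - 1] + sub))  # substitution or match
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer sequence's length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def lexical_distance(words_a, words_b):
    """Mean normalized distance over two word lists aligned by concept (same order)."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and aligned by concept")
    pairs = zip(words_a, words_b)
    return sum(normalized_levenshtein(a, b) for a, b in pairs) / len(words_a)


def trigram_counts(tags):
    """Count consecutive PoS-tag trigrams in one tagged text."""
    return Counter(zip(tags, tags[1:], tags[2:]))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant frequency vector")
    return sxy / sqrt(sxx * syy)


def syntactic_distance(tags_a, tags_b):
    """1 - Pearson r over the union of trigram types seen in either text."""
    ca, cb = trigram_counts(tags_a), trigram_counts(tags_b)
    types = sorted(ca.keys() | cb.keys())
    x = [ca[t] for t in types]  # Counter yields 0 for trigrams absent from a text
    y = [cb[t] for t in types]
    return 1.0 - pearson(x, y)


if __name__ == "__main__":
    # Toy data only; not material from the study.
    print(normalized_levenshtein("kitten", "sitting"))  # 3 edits / 7 segments
    text = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
    print(syntactic_distance(text, text))  # identical profiles: distance ~0
```

Because Pearson's r is unchanged when either vector is rescaled, raw trigram counts and relative frequencies give the same syntactic distance, which ranges from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated). Passing lists of segments (for example `["ts", "a"]`) instead of plain strings lets multi-character phonetic symbols count as single units.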