Cognate Production Using Character-Based Neural Machine Translation Without Segmentation

Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production that does not require any segmentation. Our model...

Bibliographic Details
Main Authors: Tsolmon Zundui, Khuyagbaatar Batsuren, Tsendsuren Munkhdalai, Amarsanaa Ganbold
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Cognate production; sequence-to-sequence model; recurrent neural network
Online Access: https://ieeexplore.ieee.org/document/10892102/
collection DOAJ
description Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production that does not require any segmentation. Our model operates at the character level to transform a source word into its corresponding cognate in the target language, thereby obviating out-of-vocabulary issues and alleviating the need for subword segmentation. We evaluated our approach on a novel dataset and found that it outperforms both statistical machine translation baselines and prior neural methods on the same training dataset, as measured by standard coverage and mean reciprocal rank metrics. These results underscore the effectiveness of character-level sequence-to-sequence architectures for cognate generation in diverse language settings, including cross-alphabetic transformations.
format Article
id doaj-art-db895f0bf9664a50b0ea851687c86aec
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-db895f0bf9664a50b0ea851687c86aec
Timestamp: 2025-08-20T02:11:09Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 34824-34830
DOI: 10.1109/ACCESS.2025.3543652
IEEE Xplore document number: 10892102
Title: Cognate Production Using Character-Based Neural Machine Translation Without Segmentation
Authors:
Tsolmon Zundui (https://orcid.org/0000-0002-2797-517X), National University of Mongolia, Ulaanbaatar, Mongolia
Khuyagbaatar Batsuren, National University of Mongolia, Ulaanbaatar, Mongolia
Tsendsuren Munkhdalai, Google Research, Mountain View, CA, USA
Amarsanaa Ganbold (https://orcid.org/0000-0003-4335-6608), National University of Mongolia, Ulaanbaatar, Mongolia
Online Access: https://ieeexplore.ieee.org/document/10892102/
Keywords: Cognate production; sequence-to-sequence model; recurrent neural network
topic Cognate production
sequence-to-sequence model
recurrent neural network
url https://ieeexplore.ieee.org/document/10892102/
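
The description in this record reports results in terms of coverage and mean reciprocal rank (MRR). As a rough illustration only, and not the authors' evaluation code, the Python sketch below computes both over ranked candidate lists. The function names, the toy English-to-Spanish word pairs, and the definition of coverage used here (the share of source words whose gold cognate appears among the top k candidates) are assumptions made for this example; the record itself does not define them.

```python
from typing import Dict, List


def reciprocal_rank(candidates: List[str], gold: str) -> float:
    """Return 1/rank of the gold cognate in a ranked list, or 0.0 if it is absent."""
    for rank, candidate in enumerate(candidates, start=1):
        if candidate == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions: Dict[str, List[str]],
                         gold: Dict[str, str]) -> float:
    """Average the reciprocal rank over every source word in the gold set."""
    if not gold:
        return 0.0
    total = sum(reciprocal_rank(predictions.get(src, []), tgt)
                for src, tgt in gold.items())
    return total / len(gold)


def coverage_at_k(predictions: Dict[str, List[str]],
                  gold: Dict[str, str], k: int) -> float:
    """Assumed definition: fraction of source words whose gold cognate
    appears among the top-k ranked candidates."""
    if not gold:
        return 0.0
    hits = sum(1 for src, tgt in gold.items()
               if tgt in predictions.get(src, [])[:k])
    return hits / len(gold)


# Hypothetical toy data: ranked candidate cognates per source word.
predictions = {
    "nation": ["nación", "nacion", "natión"],   # gold at rank 1
    "color": ["calor", "color", "colór"],       # gold at rank 2
    "family": ["famila", "famili", "famile"],   # gold missing
}
gold = {"nation": "nación", "color": "color", "family": "familia"}

if __name__ == "__main__":
    print("MRR:", mean_reciprocal_rank(predictions, gold))
    print("coverage@1:", coverage_at_k(predictions, gold, 1))
    print("coverage@3:", coverage_at_k(predictions, gold, 3))
```

Because a missing gold cognate contributes zero to MRR, the two measures move together when the model fails outright, while MRR additionally rewards placing the correct form nearer the top of the list.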