Cognate Production Using Character-Based Neural Machine Translation Without Segmentation
Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production that does not require any segmentation. Our model...
| Main Authors: | Tsolmon Zundui, Khuyagbaatar Batsuren, Tsendsuren Munkhdalai, Amarsanaa Ganbold |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Cognate production; sequence-to-sequence model; recurrent neural network |
| Online Access: | https://ieeexplore.ieee.org/document/10892102/ |
| _version_ | 1850205158673219584 |
|---|---|
| author | Tsolmon Zundui, Khuyagbaatar Batsuren, Tsendsuren Munkhdalai, Amarsanaa Ganbold |
| author_sort | Tsolmon Zundui |
| collection | DOAJ |
| description | Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production that does not require any segmentation. Our model operates at the character level to transform a source word into its corresponding cognate in the target language, thereby avoiding out-of-vocabulary issues and removing the need for subword segmentation. We evaluated our approach on a novel dataset and found that it outperforms both statistical machine translation baselines and prior neural methods trained on the same dataset, as measured by standard coverage and mean reciprocal rank metrics. These results underscore the effectiveness of character-level sequence-to-sequence architectures for cognate generation in diverse language settings, including cross-alphabetic transformations. |
| format | Article |
| id | doaj-art-db895f0bf9664a50b0ea851687c86aec |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record: doaj-art-db895f0bf9664a50b0ea851687c86aec (2025-08-20T02:11:09Z); language: eng; publisher: IEEE; series: IEEE Access; ISSN: 2169-3536; published: 2025-01-01; volume 13, pages 34824-34830; DOI: 10.1109/ACCESS.2025.3543652; document: 10892102; authors: Tsolmon Zundui (https://orcid.org/0000-0002-2797-517X; National University of Mongolia, Ulaanbaatar, Mongolia), Khuyagbaatar Batsuren (National University of Mongolia, Ulaanbaatar, Mongolia), Tsendsuren Munkhdalai (Google Research, Mountain View, CA, USA), Amarsanaa Ganbold (https://orcid.org/0000-0003-4335-6608; National University of Mongolia, Ulaanbaatar, Mongolia); URL: https://ieeexplore.ieee.org/document/10892102/; keywords: Cognate production, sequence-to-sequence model, recurrent neural network |
| title | Cognate Production Using Character-Based Neural Machine Translation Without Segmentation |
| topic | Cognate production; sequence-to-sequence model; recurrent neural network |
| url | https://ieeexplore.ieee.org/document/10892102/ |
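
The description in this record names coverage and mean reciprocal rank (MRR) as the evaluation metrics. As a reading aid, here is a minimal Python sketch of how such metrics are commonly computed over ranked candidate lists. It is not taken from the article: the function names, the top-k definition of coverage, and the toy word pairs are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def to_characters(word: str) -> List[str]:
    """Split a word into single characters (no subword segmentation)."""
    return list(word)


def reciprocal_rank(candidates: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold cognate in a ranked list, or 0.0 if it is absent."""
    for rank, candidate in enumerate(candidates, start=1):
        if candidate == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Average the reciprocal rank over every source word in the gold set."""
    total = sum(reciprocal_rank(predictions.get(src, []), tgt) for src, tgt in gold.items())
    return total / len(gold)


def coverage_at_k(predictions: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Assumed definition: share of source words whose gold cognate is in the top k candidates."""
    hits = sum(1 for src, tgt in gold.items() if tgt in predictions.get(src, [])[:k])
    return hits / len(gold)


# Hypothetical toy data: ranked model outputs for three source words.
toy_predictions = {
    "nation": ["nación", "nacion"],
    "flower": ["flora", "flor"],
    "night": ["nocha", "noca"],
}
toy_gold = {"nation": "nación", "flower": "flor", "night": "noche"}

if __name__ == "__main__":
    print(to_characters("nación"))
    print(mean_reciprocal_rank(toy_predictions, toy_gold))
    print(coverage_at_k(toy_predictions, toy_gold, k=1))
    print(coverage_at_k(toy_predictions, toy_gold, k=2))
```

On the toy data, the gold cognate is ranked first for one word, second for another, and missing for the third, so MRR averages 1, 1/2, and 0. The article may define coverage differently; the top-k reading used here is only one common convention.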