Tokenization efficiency of current foundational large language models for the Ukrainian language
Foundational large language models (LLMs) are deployed in multilingual environments across a range of general and narrow task domains. These models generate text token by token, which makes them slower and more computationally expensive for low-resource languages that are underrepresented in the tokenizer…
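The per-token cost the abstract describes can be illustrated at the byte level: many LLM tokenizers fall back to UTF-8 bytes for scripts that are underrepresented in their vocabulary, and Cyrillic characters encode as two bytes each, so a byte-level fallback roughly doubles sequence length relative to ASCII text. A minimal sketch (the sentences and the bytes-per-character proxy are illustrative assumptions, not the paper's methodology):

```python
# Illustrative only: compare UTF-8 bytes per character for an English
# sentence and a Ukrainian one. A tokenizer falling back to byte-level
# encoding would emit roughly this many tokens per character.
en = "Language models generate text token by token."
uk = "Мовні моделі генерують текст токен за токеном."  # illustrative translation

def bytes_per_char(s: str) -> float:
    """UTF-8 bytes per character: 1.0 for ASCII, about 2.0 for Cyrillic."""
    return len(s.encode("utf-8")) / len(s)

print(f"English:   {bytes_per_char(en):.2f} bytes/char")
print(f"Ukrainian: {bytes_per_char(uk):.2f} bytes/char")
```

The gap widens further for subword tokenizers trained mostly on English, since Ukrainian words are also split into more, shorter subword pieces.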
| Main Authors: | Daniil Maksymenko, Oleksii Turuta |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-08-01 |
| Series: | Frontiers in Artificial Intelligence |
| Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2025.1538165/full |
Similar Items
- An Analysis of the Training Data Impact for Domain-Adapted Tokenizer Performances—The Case of Serbian Legal Domain Adaptation
  by: Miloš Bogdanović, et al.
  Published: (2025-07-01)
- Mixtec–Spanish Parallel Text Dataset for Language Technology Development
  by: Hermilo Santiago-Benito, et al.
  Published: (2025-06-01)
- Tokenization and deep learning architectures in genomics: A comprehensive review
  by: Conrad Testagrose, et al.
  Published: (2025-01-01)
- A comprehensive dataset and neural network approach for named entity recognition in the Uzbek language (Mendeley Data)
  by: Davlatyor Mengliev, et al.
  Published: (2025-02-01)
- Toward Low-Resource Languages Machine Translation: A Language-Specific Fine-Tuning With LoRA for Specialized Large Language Models
  by: Xiao Liang, et al.
  Published: (2025-01-01)