Comparison of Language Models for English-Latvian Semantic Search
In this study, ten language models are explored and compared in an English-Latvian semantic information retrieval setting, where the indexed collection of documents is written in English while the query documents are written in Latvian. Currently, no similar research has been done regarding the Latv...
Main Authors: | Kucheravy Artem, Jēkabsons Gints |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo 2025-01-01 |
Series: | Applied Computer Systems |
Subjects: | embeddings, language models, semantic search, sentence-transformers |
Online Access: | https://doi.org/10.2478/acss-2025-0004 |
_version_ | 1823860507132559360 |
---|---|
author | Kucheravy Artem; Jēkabsons Gints |
author_facet | Kucheravy Artem; Jēkabsons Gints |
author_sort | Kucheravy Artem |
collection | DOAJ |
description | In this study, ten language models are explored and compared in an English-Latvian semantic information retrieval setting, where the indexed collection of documents is written in English while the query documents are written in Latvian. Currently, no similar research has been done regarding the Latvian language. A dataset of 77736 pairs of articles from Latvian and English Wikipedia was created, transformed into embedding vectors, and used for retrieval experiments with brute force search, Hierarchical Navigable Small World method, and Inverted File Indexing method. The LaBSE language model achieved the best performance for short texts and a version of Sentence-BERT and E5-large for long texts. |
format | Article |
id | doaj-art-880887befac5438fa611525e451122c7 |
institution | Kabale University |
issn | 2255-8691 |
language | English |
publishDate | 2025-01-01 |
publisher | Sciendo |
record_format | Article |
series | Applied Computer Systems |
spelling | doaj-art-880887befac5438fa611525e451122c7; 2025-02-10T13:25:18Z; eng; Sciendo; Applied Computer Systems; 2255-8691; 2025-01-01; 30; 1; 34; 39; 10.2478/acss-2025-0004; Comparison of Language Models for English-Latvian Semantic Search; Kucheravy Artem (0); Jēkabsons Gints (1); Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia; Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia; In this study, ten language models are explored and compared in an English-Latvian semantic information retrieval setting, where the indexed collection of documents is written in English while the query documents are written in Latvian. Currently, no similar research has been done regarding the Latvian language. A dataset of 77736 pairs of articles from Latvian and English Wikipedia was created, transformed into embedding vectors, and used for retrieval experiments with brute force search, Hierarchical Navigable Small World method, and Inverted File Indexing method. The LaBSE language model achieved the best performance for short texts and a version of Sentence-BERT and E5-large for long texts.; https://doi.org/10.2478/acss-2025-0004; embeddings; language models; semantic search; sentence-transformers |
spellingShingle | Kucheravy Artem Jēkabsons Gints Comparison of Language Models for English-Latvian Semantic Search Applied Computer Systems embeddings language models semantic search sentence-transformers |
title | Comparison of Language Models for English-Latvian Semantic Search |
title_full | Comparison of Language Models for English-Latvian Semantic Search |
title_fullStr | Comparison of Language Models for English-Latvian Semantic Search |
title_full_unstemmed | Comparison of Language Models for English-Latvian Semantic Search |
title_short | Comparison of Language Models for English-Latvian Semantic Search |
title_sort | comparison of language models for english latvian semantic search |
topic | embeddings; language models; semantic search; sentence-transformers |
url | https://doi.org/10.2478/acss-2025-0004 |
work_keys_str_mv | AT kucheravyartem comparisonoflanguagemodelsforenglishlatviansemanticsearch AT jekabsonsgints comparisonoflanguagemodelsforenglishlatviansemanticsearch |
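
The description above mentions retrieval by brute force search over embedding vectors. As a rough illustration only, the sketch below ranks a tiny English index against a Latvian query by cosine similarity. The three-dimensional vectors, the document identifiers, and the function names are all made up for this example; they are not outputs of any model named in the record, and this is not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def brute_force_search(query_vec, index, k=1):
    """Score the query against every indexed vector and return the top-k ids."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings of English documents (toy values, assumption).
english_index = {
    "en:Riga": [0.9, 0.1, 0.0],
    "en:Semantic search": [0.1, 0.8, 0.3],
    "en:Wikipedia": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a Latvian query document (toy values, assumption).
latvian_query = [0.85, 0.15, 0.05]

top = brute_force_search(latvian_query, english_index, k=2)
print(top)
```

Brute force search compares the query with every indexed vector, so its cost grows linearly with the collection size. The Hierarchical Navigable Small World and Inverted File Indexing methods named in the description are approximate alternatives that avoid scanning the whole index.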