Comparison of Language Models for English-Latvian Semantic Search
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2025-01-01 |
Series: | Applied Computer Systems |
Subjects: | |
Online Access: | https://doi.org/10.2478/acss-2025-0004 |
Summary: | This study compares ten language models in an English-Latvian semantic information retrieval setting in which the indexed documents are written in English and the queries are written in Latvian. To date, no similar research has been done for the Latvian language. A dataset of 77,736 pairs of articles from Latvian and English Wikipedia was created, transformed into embedding vectors, and used for retrieval experiments with brute-force search, the Hierarchical Navigable Small World (HNSW) method, and the Inverted File Indexing (IVF) method. The LaBSE language model achieved the best performance for short texts, while a version of Sentence-BERT and E5-large performed best for long texts. |
---|---|
ISSN: | 2255-8691 |
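
The brute-force search named in the summary can be illustrated with a minimal sketch. This is not the authors' code: the three-dimensional vectors, the document identifiers, and the function names `cosine` and `brute_force_search` are hypothetical, and cosine similarity is assumed as the ranking score. In the study the vectors would come from a language model such as LaBSE, with English articles in the index and Latvian texts as queries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def brute_force_search(query_vec, index_vecs, k=1):
    """Score the query against every indexed vector and return the top-k ids."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings standing in for English article vectors.
english_index = {
    "en_riga": [0.9, 0.1, 0.0],
    "en_physics": [0.0, 0.2, 0.9],
    "en_music": [0.1, 0.9, 0.2],
}

# Hypothetical embedding of a Latvian query text.
latvian_query = [0.8, 0.2, 0.1]

top = brute_force_search(latvian_query, english_index, k=2)
print(top)
```

Brute-force search compares the query with every indexed vector, so its cost grows linearly with the collection size. HNSW and IVF are approximate methods that avoid the full scan, generally trading some recall for speed.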