Entry Generation by Analogy – Encoding New Words for Morphological Lexicons

Bibliographic Details
Main Author: Krister Lindén
Format: Article
Language: English
Published: Linköping University Electronic Press 2009-05-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/1649
author Krister Lindén
collection DOAJ
description Language software applications encounter new words, e.g., acronyms, technical terminology, loan words, names or compounds of such words. To add new words to a lexicon, we need to indicate their base form and inflectional paradigm. In this article, we evaluate a combination of corpus-based and lexicon-based methods for assigning the base form and inflectional paradigm to new words in Finnish, Swedish and English finite-state transducer lexicons. The methods have been implemented with the open-source Helsinki Finite-State Technology (Lindén & al., 2009). As an entry generator often produces numerous suggestions, it is important that the best suggestions be among the first few; otherwise, it may become more efficient to create the entries by hand. By combining the probabilities calculated from corpus data and from lexical data, we get a more precise combined model. The combined method has 77-81 % precision and 89-97 % recall, i.e., the first correctly generated entry is on average found as the first or second candidate for the test languages. A further study demonstrated that a native speaker could revise suggestions from the entry generator at a speed of 300-400 entries per hour.
format Article
id doaj-art-2aadedf5af0a4df097137fd6904317f7
institution Kabale University
issn 2000-1533
language English
publishDate 2009-05-01
publisher Linköping University Electronic Press
record_format Article
series Northern European Journal of Language Technology
spelling doaj-art-2aadedf5af0a4df097137fd6904317f7; 2025-01-23T10:36:35Z; eng; Linköping University Electronic Press; Northern European Journal of Language Technology; 2000-1533; 2009-05-01; 1; 10.3384/nejlt.2000-1533.09111; Entry Generation by Analogy – Encoding New Words for Morphological Lexicons; Krister Lindén (Department of General Linguistics, University of Helsinki, Finland); https://nejlt.ep.liu.se/article/view/1649
title Entry Generation by Analogy – Encoding New Words for Morphological Lexicons
url https://nejlt.ep.liu.se/article/view/1649
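
The description above mentions combining probabilities calculated from corpus data and from lexical data so that the best entry suggestions appear first. The record does not say how the two estimates are combined, so the following Python sketch is only an illustration under an assumed rule: it multiplies the two probabilities (summing their logarithms), as if the models were independent. The function name rank_candidates, the floor value, the paradigm labels N1 to N3 and all probabilities are hypothetical and are not taken from the article or from Helsinki Finite-State Technology.

```python
from math import log


def rank_candidates(corpus_prob, lexicon_prob, floor=1e-9):
    """Rank candidate entries, best first, by combined log-probability.

    Assumption (not from the article): the two models are treated as
    independent, so their probabilities are multiplied. Candidates missing
    from one model get a small floor probability instead of zero.
    """
    candidates = set(corpus_prob) | set(lexicon_prob)
    scored = []
    for cand in candidates:
        p_corpus = max(corpus_prob.get(cand, 0.0), floor)
        p_lexicon = max(lexicon_prob.get(cand, 0.0), floor)
        scored.append((log(p_corpus) + log(p_lexicon), cand))
    # Highest combined score first; ties are broken by the candidate itself.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cand for _, cand in scored]


# Hypothetical candidates: (base form, paradigm label) pairs with made-up scores.
corpus = {("blog", "N1"): 0.6, ("blog", "N2"): 0.3, ("blogi", "N3"): 0.1}
lexicon = {("blog", "N1"): 0.2, ("blog", "N2"): 0.7, ("blogi", "N3"): 0.1}

ranking = rank_candidates(corpus, lexicon)
print(ranking[0])
```

In this made-up example neither model alone is decisive: the corpus scores favour N1 and the lexicon scores favour N2, and the combined score places ("blog", "N2") first. A person revising the suggestions would then only need to inspect the top few candidates.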