Stemming and N-gram matching for term conflation in Turkish texts
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | University of Borås, 1996-01-01 |
Series: | Information Research: An International Electronic Journal |
Subjects: | |
Online Access: | http://informationr.net/ir/2-2/paper13.html |
Summary: | One of the main problems involved in the use of free text for indexing and retrieval is the variation in word forms that is likely to be encountered. The most common types of variation are spelling errors, alternative spellings, multi-word concepts, transliteration, affixes and abbreviations. One way to alleviate this problem is to use a conflation algorithm, a computational procedure designed to bring together words that are semantically related and to reduce them to a single form for retrieval purposes. In this paper, we discuss the use of conflation techniques for Turkish text databases. |
---|---|
ISSN: | 1368-1613 |
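
As a rough illustration of the n-gram matching named in the title, the sketch below groups word forms whose character bigrams overlap strongly. This record does not describe the paper's actual procedure, so everything here is an assumption made for illustration: the Dice coefficient as the similarity measure, the 0.6 threshold, the greedy grouping strategy, and the function names `bigrams`, `dice_similarity` and `conflate`.

```python
def bigrams(word: str) -> set:
    """Return the set of adjacent character pairs in a lowercased word."""
    # Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı"),
    # so this sketch assumes input that is already lowercase.
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over bigram sets: 2 * |A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0  # words shorter than two characters have no bigrams
    return 2 * len(x & y) / (len(x) + len(y))


def conflate(words, threshold=0.6):
    """Greedy grouping: add each word to the first group whose first
    member is similar enough, otherwise start a new group."""
    groups = []
    for w in words:
        for g in groups:
            if dice_similarity(w, g[0]) >= threshold:
                g.append(w)
                break
        else:
            groups.append([w])
    return groups


if __name__ == "__main__":
    # "kitap" (book) and "kalem" (pen) with their plural forms
    print(conflate(["kitap", "kitaplar", "kalem", "kalemler"]))
```

Because Turkish is agglutinative, long suffix chains add many bigrams that the root does not share, so a fixed threshold can miss heavily inflected forms of short roots; for example, "ev" and "evler" score only 0.4 under this measure.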