Processing morphological variants in searches of Latin text
A characteristic of natural-language text databases is that a user must be able to specify all of the variant forms of each query word if high recall is to be achieved. The most common word variants are those arising from morphology, and thus most retrieval systems provide facilities for user...
Main Authors: | Mark Greengrass, Alexander M. Robertson, Robyn Schinke, Peter Willett |
---|---|
Format: | Article |
Language: | English |
Published: | University of Borås, 1996-01-01 |
Series: | Information Research: An International Electronic Journal |
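Subjects: | natural language; text databases; query words; recall; word variants; morphology; retrieval systems; truncation; information retrieval; IR; stemming algorithms; stemmers; suffixes; humanities; Latin |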
Online Access: | http://informationr.net/ir/2-1/paper10.html |
_version_ | 1832569870903934976 |
---|---|
author | Mark Greengrass; Alexander M. Robertson; Robyn Schinke; Peter Willett |
author_facet | Mark Greengrass; Alexander M. Robertson; Robyn Schinke; Peter Willett |
author_sort | Mark Greengrass |
collection | DOAJ |
description | A characteristic of natural-language text databases is that a user must be able to specify all of the variant forms of each query word if high recall is to be achieved. The most common word variants are those arising from morphology, and thus most retrieval systems provide facilities for user-controlled right-hand (and occasionally left-hand) truncation to allow the retrieval of all words with the same root. A stemming algorithm, or stemmer, is a computational procedure that reduces all words with the same root to a single form by stripping the root of its derivational and inflectional affixes. In most cases, only suffixes are stripped, so that a stemmer provides an automatic equivalent of manual, right-hand truncation. Thus far, most work on stemmers has focused on present-day languages, but the increasing use of computers in the humanities has resulted in a need for comparable tools to facilitate searching in historical text databases. This paper summarises some of the initial results of a project here in Sheffield to develop such tools for databases of Latin text. |
format | Article |
id | doaj-art-6c019c74aacb4a27865a221562b2fe01 |
institution | Kabale University |
issn | 1368-1613 |
language | English |
publishDate | 1996-01-01 |
publisher | University of Borås |
record_format | Article |
series | Information Research: An International Electronic Journal |
title | Processing morphological variants in searches of Latin text |
topic | natural language; text databases; query words; recall; word variants; morphology; retrieval systems; truncation; information retrieval; IR; stemming algorithms; stemmers; suffixes; humanities; Latin |
url | http://informationr.net/ir/2-1/paper10.html |
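
The description above characterises a stemmer as an automatic equivalent of manual right-hand truncation. A minimal sketch of that idea in Python follows. It is a toy, not the algorithm developed in the paper: the suffix list, the `MIN_STEM` threshold, and the function names `stem`, `truncation_search`, and `stemmed_search` are all assumptions chosen for illustration, and the list covers only a handful of Latin noun endings.

```python
# Toy suffix-stripping stemmer. The suffix list is illustrative only and is
# NOT the rule set from the paper; it covers a few Latin noun endings.
SUFFIXES = ("ibus", "orum", "arum", "ae", "am", "as", "is", "os",
            "um", "us", "a", "e", "i", "o")
MIN_STEM = 2  # assumed lower bound on stem length, to avoid over-stripping


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM letters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def truncation_search(prefix: str, words: list[str]) -> list[str]:
    """Manual right-hand truncation: the user supplies the prefix."""
    return [w for w in words if w.lower().startswith(prefix.lower())]


def stemmed_search(query: str, words: list[str]) -> list[str]:
    """Automatic variant matching: compare stems of query and indexed words."""
    target = stem(query)
    return [w for w in words if stem(w) == target]


words = ["puella", "puellae", "puellam", "puellarum", "puellis", "puer"]

# The user types a full word; the stemmer finds the other inflected forms.
print(stemmed_search("puellam", words))
# The same result with manual truncation requires the user to pick "puell".
print(truncation_search("puell", words))
```

With this word list both searches return the five forms of "puella" and exclude "puer". The difference is who does the work: truncation relies on the user choosing a suitable prefix, whereas the stemmer derives it from whichever form is typed.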