Processing morphological variants in searches of Latin text

Bibliographic Details
Main Authors: Mark Greengrass, Alexander M. Robertson, Robyn Schinke, Peter Willett
Format: Article
Language: English
Published: University of Borås 1996-01-01
Series: Information Research: An International Electronic Journal
Online Access: http://informationr.net/ir/2-1/paper10.html

Description: A characteristic of natural-language text databases is that a user must be able to specify all of the variant forms of each query word if high recall is to be achieved. The most common word variants are those arising from morphology, and thus most retrieval systems provide facilities for user-controlled right-hand (and occasionally left-hand) truncation to allow the retrieval of all words with the same root. A stemming algorithm, or stemmer, is a computational procedure that reduces all words with the same root to a single form by stripping the root of its derivational and inflectional affixes. In most cases only suffixes are stripped, so a stemmer provides an automatic equivalent of manual right-hand truncation. Thus far, most work on stemmers has focused on present-day languages, but the increasing use of computers in the humanities has created a need for comparable tools to facilitate searching in historical text databases. This paper summarises some of the initial results of a project here in Sheffield to develop such tools for databases of Latin text.
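
The contrast between suffix stripping and manual right-hand truncation can be illustrated with a minimal Python sketch. It assumes a small, hypothetical list of Latin noun endings (`SUFFIXES`) and a hypothetical minimum stem length (`MIN_STEM`). It is not the stemmer developed in the Sheffield project, whose rules are not reproduced in this record, and the function names `stem`, `truncation_match` and `group_by_stem` are illustrative only.

```python
"""Illustrative longest-match suffix stripping for Latin word forms.

HYPOTHETICAL: the suffix list is a small sample of Latin noun endings chosen
for illustration; it is not the rule set of the Sheffield Latin stemmer.
"""

# Hypothetical sample of inflectional endings, tried longest first.
SUFFIXES = sorted(
    ["ibus", "orum", "arum", "ae", "am", "as", "is", "os", "um", "us",
     "a", "e", "i", "o"],
    key=len,
    reverse=True,
)
MIN_STEM = 2  # hypothetical: never reduce a word below two characters


def stem(word: str) -> str:
    """Remove the longest listed suffix, keeping at least MIN_STEM letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def truncation_match(root: str, words: list[str]) -> list[str]:
    """Manual right-hand truncation: the query 'root*' matches any word starting with root."""
    r = root.lower()
    return [w for w in words if w.lower().startswith(r)]


def group_by_stem(words: list[str]) -> dict[str, list[str]]:
    """Conflate words that the stemmer reduces to the same form."""
    groups: dict[str, list[str]] = {}
    for w in words:
        groups.setdefault(stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    forms = ["dominus", "domini", "domino", "dominum",
             "dominorum", "dominis", "dominatio"]
    print(group_by_stem(forms))
    # {'domin': ['dominus', 'domini', 'domino', 'dominum', 'dominorum', 'dominis'], 'dominati': ['dominatio']}
    print(truncation_match("domin", forms))
    # all seven forms, including 'dominatio'
```

Under these assumed rules, six inflected forms of dominus conflate to the single stem domin, whereas the truncated query domin* also retrieves dominatio, a distinct word. A practical Latin stemmer would need a much larger rule set than this sample.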
ISSN: 1368-1613
Keywords: natural language, text databases, query words, recall, word variants, morphology, retrieval systems, truncation, information retrieval, IR, stemming algorithms, stemmers, suffixes, humanities, Latin