Encoding polylexical units with TEI Lex-0: A case study

The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital...

Bibliographic Details
Main Authors:	Toma Tasovac, Ana Salgado, Rute Costa
Format:	Article
Language:	English
Published:	University of Ljubljana Press (Založba Univerze v Ljubljani) 2020-08-01
Series:	Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Subjects:	TEI; lexicography; language resources; polylexical units; interoperability
Online Access:	https://journals.uni-lj.si/slovenscina2/article/view/9157

collection	DOAJ
description	The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as <form>-like constructs, whereas the latter become <entry>-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on <gram> to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
id	doaj-art-277f3d9cbfaa40169aa1f926ad4da081
institution	OA Journals
issn	2335-2736
doi	10.4312/slo2.0.2020.2.28-57
volume	8
issue	2
affiliation	Toma Tasovac: Belgrade Center for Digital Humanities, Serbia; Ana Salgado: New University of Lisbon, CLUNL, Portugal; Rute Costa: New University of Lisbon, CLUNL, Portugal

Similar Items