Ranking Assisted Unsupervised Morphological Disambiguation of Turkish

In comparison to English, Turkish is an agglutinative language with fewer resources. The agglutinative properties of words result in a significant number of morphological analyses, creating uncertainty in morphological disambiguation and syntactic parsing. Traditional approaches typically rely on su...

Bibliographic Details
Main Authors:	Hayri Volkan Agun, Ozkan Aslan
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Morphological disambiguation; deep neural networks; feature engineering; unsupervised learning
Online Access:	https://ieeexplore.ieee.org/document/10908819/

_version_	1850034150180913152
author	Hayri Volkan Agun; Ozkan Aslan
author_facet	Hayri Volkan Agun; Ozkan Aslan
author_sort	Hayri Volkan Agun
collection	DOAJ
description	In comparison to English, Turkish is an agglutinative language with fewer resources. The agglutinative properties of words result in a significant number of morphological analyses, creating uncertainty in morphological disambiguation and syntactic parsing. Traditional approaches typically rely on supervised learning models based on the correct morphological analysis of a given phrase. In this study, we propose a ranking method to limit and filter out irrelevant morphological tags from all possible combinations of morphological analyses of a given sentence without supervision. The suggested method selects less ambiguous analyses for statistical aggregation and applies inference through the PageRank algorithm on a densely connected graph. Subsequently, this graph is utilized to develop a voting schema for each test word based on the connections in the test sentence. Experimental evaluations of the proposed methods on three independently and manually annotated test datasets indicate a token accuracy of approximately 80% and an accuracy of around 61% for ambiguous tokens. In all ranking evaluations, the best scores from the PageRank variations significantly outperform those of Self-Attention LSTM and ELMO deep learning models. The training process of PageRank is notably straightforward and efficient, requiring $O(n^{2})$ parameter adjustments, which is considerably fewer than those required by the backpropagation method used in neural network training. Furthermore, to reduce ambiguity in sentences from different genres with scarce samples, the proposed method is easily adaptable.
format	Article
id	doaj-art-e42dcd7179934f2eb3568d35d17bed64
institution	DOAJ
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-e42dcd7179934f2eb3568d35d17bed64; 2025-08-20T02:57:55Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 41974-41983; DOI 10.1109/ACCESS.2025.3547303; document 10908819; Ranking Assisted Unsupervised Morphological Disambiguation of Turkish; Hayri Volkan Agun (https://orcid.org/0000-0002-4253-8920), Department of Computer Engineering, Bursa Technical University, Bursa, Türkiye; Ozkan Aslan, Department of Computer Engineering, Afyon Kocatepe University, Afyon, Türkiye; https://ieeexplore.ieee.org/document/10908819/; Morphological disambiguation; deep neural networks; feature engineering; unsupervised learning
title	Ranking Assisted Unsupervised Morphological Disambiguation of Turkish
title_sort	ranking assisted unsupervised morphological disambiguation of turkish
topic	Morphological disambiguation; deep neural networks; feature engineering; unsupervised learning
url	https://ieeexplore.ieee.org/document/10908819/
work_keys_str_mv	AT hayrivolkanagun rankingassistedunsupervisedmorphologicaldisambiguationofturkish AT ozkanaslan rankingassistedunsupervisedmorphologicaldisambiguationofturkish
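
The description above mentions PageRank inference on a densely connected graph followed by a per-word vote. The sketch below is only an illustration of that general idea, not the authors' implementation: the tag names, the edge weights, and the helper names `pagerank` and `pick_analyses` are all hypothetical, and the vote is simplified to "keep the highest-ranked candidate". A dense graph over n tags has n x n edge weights, which is one way to read the $O(n^{2})$ figure in the abstract.

```python
import numpy as np


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dense, non-negative weight matrix.

    weights[i, j] is the (hypothetical) edge weight from node i to node j.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    row_sums = w.sum(axis=1, keepdims=True)
    # Avoid division by zero; rows without outgoing weight jump uniformly.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, w / safe, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (rank @ transition)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def pick_analyses(candidates, tag_index, rank):
    """For each word, keep the candidate tag with the highest rank score."""
    return [max(options, key=lambda tag: rank[tag_index[tag]])
            for options in candidates]


# Hypothetical tags and co-occurrence weights, made up for illustration.
tags = ["Noun+Nom", "Verb+Imp", "Adj"]
tag_index = {tag: i for i, tag in enumerate(tags)}
weights = [[0, 4, 1],
           [4, 0, 1],
           [1, 1, 0]]

rank = pagerank(weights)
# Each inner list holds the candidate analyses of one word in a sentence.
candidates = [["Noun+Nom", "Adj"], ["Adj", "Verb+Imp"]]
chosen = pick_analyses(candidates, tag_index, rank)
print(chosen)
```

In this toy graph the "Adj" node receives less incoming weight than the other two, so it ends up with the lowest score and loses the vote for both words.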

Similar Items