Ranking Assisted Unsupervised Morphological Disambiguation of Turkish

Bibliographic Details
Main Authors: Hayri Volkan Agun, Ozkan Aslan
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Morphological disambiguation; deep neural networks; feature engineering; unsupervised learning
Online Access: https://ieeexplore.ieee.org/document/10908819/
Collection: DOAJ
Description: In comparison to English, Turkish is an agglutinative language with fewer resources. The agglutinative properties of words result in a large number of morphological analyses, creating uncertainty in morphological disambiguation and syntactic parsing. Traditional approaches typically rely on supervised learning models based on the correct morphological analysis of a given phrase. In this study, we propose a ranking method to limit and filter out irrelevant morphological tags from all possible combinations of morphological analyses of a given sentence without supervision. The suggested method selects less ambiguous analyses for statistical aggregation and applies inference through the PageRank algorithm on a densely connected graph. Subsequently, this graph is used to build a voting scheme for each test word based on the connections in the test sentence. Experimental evaluations of the proposed methods on three independently and manually annotated test datasets indicate a token accuracy of approximately 80% and an accuracy of around 61% for ambiguous tokens. In all ranking evaluations, the best scores from the PageRank variations significantly outperform those of the Self-Attention LSTM and ELMo deep learning models. The training process of PageRank is notably straightforward and efficient, requiring $O(n^{2})$ parameter adjustments, which is considerably fewer than those required by the backpropagation method used in neural network training. Furthermore, the proposed method is easily adaptable for reducing ambiguity in sentences from different genres with scarce samples.
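
The description outlines the pipeline only at a high level, so the Python sketch below illustrates the general idea under stated assumptions; it is not the authors' implementation. Everything concrete here is hypothetical: analyses are modelled as tuples of tag strings, "less ambiguous" is taken to mean words with at most `max_ambiguity` candidate analyses, the "densely connected graph" links every pair of tags observed together in a sentence, PageRank uses a damping factor of 0.85 with 50 power iterations, and each candidate analysis is scored by the PageRank-weighted links between its tags and the tags of the other words in the test sentence. The function names `build_tag_graph`, `pagerank`, `vote`, and `disambiguate` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data model (an assumption, not the paper's): a sentence is a
# list of words, each word is a list of candidate analyses, and each analysis
# is a tuple of morphological tag strings.
Analysis = tuple[str, ...]
Sentence = list[list[Analysis]]


def build_tag_graph(sentences: list[Sentence], max_ambiguity: int = 1):
    """Aggregate tag co-occurrence weights from low-ambiguity words only."""
    weights = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        tags = set()
        for analyses in sentence:
            # Assumed filter: skip words with more than max_ambiguity analyses.
            if len(analyses) <= max_ambiguity:
                for analysis in analyses:
                    tags.update(analysis)
        # Densely connect every pair of tags observed in the same sentence.
        for a, b in combinations(sorted(tags), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def pagerank(weights, damping: float = 0.85, iterations: int = 50):
    """Weighted PageRank by power iteration on the symmetric tag graph."""
    nodes = sorted(weights)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # Every node has at least one edge (edges are added in pairs), so no
    # out-weight sum is zero and there are no dangling nodes.
    out_weight = {v: sum(weights[v].values()) for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            for u, w in weights[v].items():
                new_rank[u] += damping * rank[v] * w / out_weight[v]
        rank = new_rank
    return rank


def vote(analysis: Analysis, context: set[str], weights, rank) -> float:
    """Score an analysis by PageRank-weighted links from context tags to its tags."""
    score = 0.0
    for tag in analysis:
        neighbours = weights.get(tag, {})
        score += sum(rank.get(c, 0.0) * neighbours.get(c, 0.0) for c in context)
    return score


def disambiguate(sentence: Sentence, weights, rank) -> list[Analysis]:
    """Choose, for every word, the candidate analysis with the highest vote."""
    chosen = []
    for i, analyses in enumerate(sentence):
        # Context: tags from all candidate analyses of the other words.
        context = {tag for j, other in enumerate(sentence) if j != i
                   for analysis in other for tag in analysis}
        chosen.append(max(analyses, key=lambda a: vote(a, context, weights, rank)))
    return chosen
```

For instance, after training on sentences whose unambiguous words pair a noun reading with verb tags, calling `disambiguate` on a sentence where one word is ambiguous between a noun and an adjective reading favours the noun reading, because its tags are more strongly linked to the verb tags in context. In this sketch the graph holds at most one weight per ordered tag pair, i.e. O(n^2) values for n distinct tags; swapping the hand-written power iteration for a library routine such as NetworkX's `pagerank` would be a reasonable alternative.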
DOAJ ID: doaj-art-e42dcd7179934f2eb3568d35d17bed64
ISSN: 2169-3536
Volume: 13
Pages: 41974-41983
DOI: 10.1109/ACCESS.2025.3547303
Author Affiliations: Hayri Volkan Agun (ORCID: https://orcid.org/0000-0002-4253-8920), Department of Computer Engineering, Bursa Technical University, Bursa, Türkiye; Ozkan Aslan, Department of Computer Engineering, Afyon Kocatepe University, Afyon, Türkiye