An Approach to Trustworthy Article Ranking by NLP and Multi-Layered Analysis and Optimization

Bibliographic Details
Main Authors: Chenhao Li, Jiyin Zhang, Weilin Chen, Xiaogang Ma
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Algorithms
Online Access: https://www.mdpi.com/1999-4893/18/7/408
Description
Summary: The rapid growth of scientific publications, coupled with rising retraction rates, has made it harder to identify trustworthy academic articles. To address this issue, we propose a three-layer ranking system that integrates natural language processing and machine learning techniques for relevance and trust assessment. First, we apply BERT-based embeddings to semantically match user queries with article content. Second, a Random Forest classifier eliminates potentially problematic articles, using features such as citation count, Altmetric score, and journal impact factor. Third, a custom ranking function combines relevance and trust indicators to score and sort the remaining articles. Evaluation on 16,052 articles from the Retraction Watch and Web of Science datasets shows that our classifier achieves 90% accuracy and 97% recall for retracted articles. Citation count emerged as the most influential trust signal (53.26%), followed by Altmetric score and journal impact factor. This multi-layered approach offers a transparent and efficient alternative to conventional ranking algorithms and can help researchers discover literature that is both relevant and reliable. Our system is adaptable to various domains and represents a promising tool for improving literature search and evaluation in the open science environment.
ISSN: 1999-4893
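
The three layers described in the summary (semantic matching, filtering of problematic articles, then a combined relevance and trust score) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the `Article` fields, the `rank` and `trust_score` names, the weights, the `alpha` mixing parameter, and the log scaling are all hypothetical. The embeddings are plain stand-in vectors rather than BERT output, and `is_problematic` is a placeholder callable standing in for a trained Random Forest classifier.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Article:
    title: str
    embedding: Sequence[float]  # stand-in for a BERT-based embedding
    citations: int
    altmetric: float
    impact_factor: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trust_score(article: Article,
                weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of log-scaled trust signals.

    The weights are hypothetical; they only mirror the ordering reported
    in the summary (citations first, then Altmetric, then impact factor).
    """
    signals = (
        math.log1p(article.citations),
        math.log1p(article.altmetric),
        math.log1p(article.impact_factor),
    )
    return sum(w * s for w, s in zip(weights, signals))


def rank(query_embedding: Sequence[float],
         articles: List[Article],
         is_problematic: Callable[[Article], bool],
         alpha: float = 0.5) -> List[Tuple[Article, float]]:
    """Apply the three layers in order and return (article, score) pairs."""
    scored = []
    for article in articles:
        # Layer 1: semantic relevance between query and article.
        relevance = cosine(query_embedding, article.embedding)
        if relevance <= 0.0:
            continue
        # Layer 2: drop articles flagged by the (placeholder) classifier.
        if is_problematic(article):
            continue
        # Layer 3: combine relevance and trust into one score.
        score = alpha * relevance + (1.0 - alpha) * trust_score(article)
        scored.append((article, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for this example only.
articles = [
    Article("A", [1.0, 0.0], citations=100, altmetric=10.0, impact_factor=5.0),
    Article("B", [1.0, 0.0], citations=2, altmetric=0.0, impact_factor=1.0),
    Article("C", [1.0, 0.0], citations=500, altmetric=50.0, impact_factor=9.0),
    Article("D", [0.0, 1.0], citations=900, altmetric=80.0, impact_factor=9.0),
]
flagged = {"C"}  # pretend the classifier marked C as problematic
ranked = rank([1.0, 0.0], articles, lambda a: a.title in flagged)
print([article.title for article, _ in ranked])  # prints ['A', 'B']
```

In this toy run, D is dropped for having no similarity to the query, C is dropped by the filter despite its strong metrics, and A outranks B because both are equally relevant but A has stronger trust signals. In a real system the placeholder callable would presumably wrap a fitted classifier's prediction, and the weights would be tuned rather than fixed by hand.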