WASPAS-Based Natural Language Processing Method for Handling Content Words Extraction and Ranking Issues: An Example of SDGs Corpus
This paper addresses the challenges in extracting content words within the domains of natural language processing (NLP) and artificial intelligence (AI), using sustainable development goals (SDGs) corpora as verification examples. Traditional corpus-based methods and the term frequency-inverse docum...
Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-03-01
|
| Series: | Information |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2078-2489/16/3/198 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | This paper addresses the challenges in extracting content words within the domains of natural language processing (NLP) and artificial intelligence (AI), using sustainable development goals (SDGs) corpora as verification examples. Traditional corpus-based methods and the term frequency-inverse document frequency (TF-IDF) method face limitations, including the inability to automatically eliminate function words, effectively extract the relevant parameters’ quantitative data, simultaneously consider frequency and range parameters to evaluate the terms’ overall importance, and sort content words at the corpus level. To overcome these limitations, this paper proposes a novel method based on a weighted aggregated sum product assessment (WASPAS) technique. This NLP method integrates the function word elimination method, an NLP machine, and the WASPAS technique to improve the extraction and ranking of content words. The proposed method efficiently extracts quantitative data, simultaneously considers frequency and range parameters to evaluate terms’ substantial importance, and ranks content words at the corpus level, providing a comprehensive overview of term significance. This study employed a target corpus from the Web of Science (WOS), comprising 35 highly cited SDG-related research articles. Compared to competing methods, the results demonstrate that the proposed method outperforms traditional methods in extracting and ranking content words. |
|---|---|
| ISSN: | 2078-2489 |