WASPAS-Based Natural Language Processing Method for Handling Content Words Extraction and Ranking Issues: An Example of SDGs Corpus
This paper addresses the challenges in extracting content words within the domains of natural language processing (NLP) and artificial intelligence (AI), using sustainable development goals (SDGs) corpora as verification examples. Traditional corpus-based methods and the term frequency-inverse docum...
| Main Authors: | Liang-Ching Chen, Kuei-Hu Chang, Jeng-Fung Hung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Information |
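| Subjects: | content word extraction; natural language processing (NLP); artificial intelligence (AI); corpus; the term frequency-inverse document frequency (TF-IDF) method; the weighted aggregated sum product assessment (WASPAS) method |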
| Online Access: | https://www.mdpi.com/2078-2489/16/3/198 |
| _version_ | 1850203979724619776 |
|---|---|
| author | Liang-Ching Chen; Kuei-Hu Chang; Jeng-Fung Hung |
| author_facet | Liang-Ching Chen; Kuei-Hu Chang; Jeng-Fung Hung |
| author_sort | Liang-Ching Chen |
| collection | DOAJ |
| description | This paper addresses the challenges of extracting content words in natural language processing (NLP) and artificial intelligence (AI), using sustainable development goals (SDGs) corpora as verification examples. Traditional corpus-based methods and the term frequency-inverse document frequency (TF-IDF) method have several limitations: they cannot automatically eliminate function words, effectively extract quantitative data for the relevant parameters, consider the frequency and range parameters simultaneously when evaluating a term’s overall importance, or sort content words at the corpus level. To overcome these limitations, this paper proposes a novel method based on the weighted aggregated sum product assessment (WASPAS) technique. The method integrates a function word elimination method, an NLP machine, and the WASPAS technique to improve the extraction and ranking of content words. It efficiently extracts quantitative data, considers the frequency and range parameters simultaneously to evaluate each term’s importance, and ranks content words at the corpus level, providing a comprehensive overview of term significance. The study used a target corpus of 35 highly cited SDG-related research articles from the Web of Science (WOS). The results show that the proposed method outperforms the traditional methods in extracting and ranking content words. |
| format | Article |
| id | doaj-art-707b5b68f0d94c0d88becc38e994d3e9 |
| institution | OA Journals |
| issn | 2078-2489 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Information |
| spelling | doaj-art-707b5b68f0d94c0d88becc38e994d3e9; 2025-08-20T02:11:23Z; eng; MDPI AG; Information; ISSN 2078-2489; 2025-03-01; vol. 16, no. 3, art. 198; DOI 10.3390/info16030198; Liang-Ching Chen (Department of Foreign Languages, R.O.C. Military Academy, Kaohsiung 830, Taiwan); Kuei-Hu Chang (Department of Management Sciences, R.O.C. Military Academy, Kaohsiung 830, Taiwan); Jeng-Fung Hung (Graduate Institute of Science Education and Environmental Education, National Kaohsiung Normal University, Kaohsiung 824, Taiwan) |
| title | WASPAS-Based Natural Language Processing Method for Handling Content Words Extraction and Ranking Issues: An Example of SDGs Corpus |
| topic | content word extraction; natural language processing (NLP); artificial intelligence (AI); corpus; the term frequency-inverse document frequency (TF-IDF) method; the weighted aggregated sum product assessment (WASPAS) method |
| url | https://www.mdpi.com/2078-2489/16/3/198 |
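
The description above says the method scores each term on its frequency and range together using WASPAS. As a rough illustration only, the sketch below applies the standard WASPAS score (a weighted sum blended with a weighted product of normalized criteria) to per-term frequency and range values. The function name `waspas_rank`, the equal criterion weights, the blending coefficient `lam = 0.5`, and the sample counts are all assumptions made for this example; they are not taken from the article, and this is not the authors' implementation.

```python
from math import prod


def waspas_rank(terms, weights=(0.5, 0.5), lam=0.5):
    """Rank terms by a standard WASPAS score.

    terms   -- dict mapping term -> (frequency, range); both are treated as
               benefit criteria (larger is better) and must be positive.
    weights -- criterion weights summing to 1 (assumed equal here).
    lam     -- blend between weighted sum (lam) and weighted product (1 - lam).
    """
    n_criteria = len(weights)
    # Linear normalization for benefit criteria: divide by the column maximum.
    col_max = [max(values[j] for values in terms.values())
               for j in range(n_criteria)]
    scores = {}
    for term, values in terms.items():
        norm = [values[j] / col_max[j] for j in range(n_criteria)]
        weighted_sum = sum(w * x for w, x in zip(weights, norm))
        weighted_product = prod(x ** w for w, x in zip(weights, norm))
        scores[term] = lam * weighted_sum + (1 - lam) * weighted_product
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (frequency, range) pairs: total occurrences in the corpus and
# the number of documents that contain the term.
sample = {
    "sustainable": (120, 35),
    "development": (150, 30),
    "water": (200, 5),
}

for term, score in waspas_rank(sample):
    print(f"{term}: {score:.4f}")
```

In this made-up sample, "water" has the highest frequency but appears in few documents, so it ranks below terms that are both frequent and widely spread. That is the kind of trade-off a combined frequency-and-range score is meant to capture.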