LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking
Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state-of-the-art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop a language-agnostic softwar...
| Main Authors: | Timothy Dillan, Dhomas Hatta Fudholi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2023-01-01 |
| Series: | IEEE Access |
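| Subjects: | Topic modeling; state-of-the-art discovery; development of knowledge; latent-dirichlet allocation; bidirectional encoder representations from transformers; entity linking |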
| Online Access: | https://ieeexplore.ieee.org/document/10147827/ |
| _version_ | 1850093382754369536 |
|---|---|
| author | Timothy Dillan; Dhomas Hatta Fudholi |
| author_facet | Timothy Dillan; Dhomas Hatta Fudholi |
| author_sort | Timothy Dillan |
| collection | DOAJ |
| description | Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state of the art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop language-agnostic software that streamlines the process of identifying state-of-the-art research across diverse academic topics. The software automatically retrieves articles from multiple databases and preprocesses the content through tokenization, case folding, token cleansing, stopword removal, and lemmatization. Subsequently, a numeric document-phrase matrix is created and analyzed using latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT) to discover and label topics automatically. The study introduces a novel topic-filtering method that applies entity linking and filters model outputs against a knowledge database to ensure topic relevance. The visual representation employs nested bubble and line charts to illustrate current topics, gaps, and research evolution trends. A user survey of 52 student researchers, assessing the interface, topic relevance, and research output of the developed software, revealed that the interface is user-friendly and easy to navigate and that the presented information is comprehensible. Survey results also indicated that the generated topics are consistent with the processed article content and relevant to the investigated topic. The visualization effectively aids in understanding the state of the art and the research map. This study demonstrates that integrating LDA, BERT, and the proposed topic filtering and labeling method yields a robust tool for preliminary research analysis with high precision and relevance. |
| format | Article |
| id | doaj-art-864fc5fc7eb4474f86b2e581ed687d2b |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2023-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-864fc5fc7eb4474f86b2e581ed687d2b; 2025-08-20T02:41:56Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; vol. 11; pp. 59142-59163; DOI 10.1109/ACCESS.2023.3285116; 10147827; LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking; Timothy Dillan (https://orcid.org/0000-0002-0506-5503); Dhomas Hatta Fudholi (https://orcid.org/0000-0001-9029-0053); Department of Informatics, Faculty of Industrial Technology, Universitas Islam Indonesia, Yogyakarta, Indonesia (both authors); https://ieeexplore.ieee.org/document/10147827/; Topic modeling; state-of-the-art discovery; development of knowledge; latent-dirichlet allocation; bidirectional encoder representations from transformers; entity linking |
| spellingShingle | Timothy Dillan; Dhomas Hatta Fudholi; LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking; IEEE Access; Topic modeling; state-of-the-art discovery; development of knowledge; latent-dirichlet allocation; bidirectional encoder representations from transformers; entity linking |
| title | LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking |
| title_sort | ldaviewer an automatic language agnostic system for discovering state of the art topics in research using topic modeling bidirectional encoder representations from transformers and entity linking |
| topic | Topic modeling; state-of-the-art discovery; development of knowledge; latent-dirichlet allocation; bidirectional encoder representations from transformers; entity linking |
| url | https://ieeexplore.ieee.org/document/10147827/ |
| work_keys_str_mv | AT timothydillan ldavieweranautomaticlanguageagnosticsystemfordiscoveringstateofthearttopicsinresearchusingtopicmodelingbidirectionalencoderrepresentationsfromtransformersandentitylinking AT dhomashattafudholi ldavieweranautomaticlanguageagnosticsystemfordiscoveringstateofthearttopicsinresearchusingtopicmodelingbidirectionalencoderrepresentationsfromtransformersandentitylinking |
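
The description above mentions preprocessing (tokenization, case folding, token cleansing, stopword removal) followed by a numeric document-phrase matrix. The following is a minimal illustrative sketch of those two steps in Python, not the authors' implementation. The function names, the tiny stopword list, and the choice of unigrams plus bigrams as "phrases" are assumptions made for illustration. Lemmatization is omitted because it needs a language-specific resource.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; a real system
# would load a list appropriate to each language it processes.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "are", "with"}


def preprocess(text):
    """Tokenize, case-fold, cleanse, and drop stopwords (no lemmatization)."""
    tokens = re.findall(r"\w+", text.lower())         # tokenization + case folding
    tokens = [t for t in tokens if t.isalpha()]       # cleansing: drop numeric/mixed tokens
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


def phrases(tokens, max_n=2):
    """Return every n-gram of 1..max_n tokens as a space-joined phrase."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def document_phrase_matrix(docs, max_n=2):
    """Return (vocab, matrix); matrix[d][j] counts phrase vocab[j] in document d."""
    counts = [Counter(phrases(preprocess(doc), max_n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[p] for p in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    docs = ["Topic modeling with LDA", "LDA and BERT for topic labeling"]
    vocab, matrix = document_phrase_matrix(docs)
    for doc, row in zip(docs, matrix):
        print(doc, row)
```

A count matrix of this shape is the usual input to an LDA implementation. The BERT-based labeling and the entity-linking filter described in the abstract are separate stages and are not sketched here.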