LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking

Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state-of-the-art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop a language-agnostic softwar...

Bibliographic Details
Main Authors: Timothy Dillan, Dhomas Hatta Fudholi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
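Subjects: Topic modeling; state-of-the-art discovery; development of knowledge; latent-dirichlet allocation; bidirectional encoder representations from transformers; entity linking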
Online Access: https://ieeexplore.ieee.org/document/10147827/
author Timothy Dillan
Dhomas Hatta Fudholi
collection DOAJ
description Advancements in knowledge are pivotal to academic progress, necessitating efficient methods for discovering the state of the art in various fields. Existing approaches, however, are language-specific and lack automation, limiting their efficacy. This study aims to develop language-agnostic software that streamlines the process of identifying state-of-the-art research across diverse academic topics. The software automatically retrieves articles from multiple databases and preprocesses the content through tokenization, case folding, token cleansing, stopword removal, and lemmatization. Subsequently, a numeric document-phrase matrix is created and analyzed using latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT) to discover and label topics automatically. The study introduces a novel topic-filtering method based on entity linking and filtering model outputs using a knowledge database to ensure topic relevance. The visual representation employs nested bubble and line charts, effectively illustrating current topics, gaps, and research evolution trends. A user survey of 52 student researchers, assessing the interface, topic relevance, and research output of the developed software, revealed that the interface is user-friendly and easy to navigate and that the presented information is comprehensible. Survey results also indicated that the generated topics are consistent with the processed article content and relevant to the investigated topic. The visualization effectively aids in understanding the state of the art and the research map. This study demonstrates that integrating LDA, BERT, and the proposed topic filtering and labeling method yields a robust tool for preliminary research analysis with high precision and relevance.
format Article
id doaj-art-864fc5fc7eb4474f86b2e581ed687d2b
institution DOAJ
issn 2169-3536
language English
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-864fc5fc7eb4474f86b2e581ed687d2b
2025-08-20T02:41:56Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
59142
59163
10.1109/ACCESS.2023.3285116
10147827
LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking
Timothy Dillan (0) https://orcid.org/0000-0002-0506-5503
Dhomas Hatta Fudholi (1) https://orcid.org/0000-0001-9029-0053
(0) Department of Informatics, Faculty of Industrial Technology, Universitas Islam Indonesia, Yogyakarta, Indonesia
(1) Department of Informatics, Faculty of Industrial Technology, Universitas Islam Indonesia, Yogyakarta, Indonesia
https://ieeexplore.ieee.org/document/10147827/
Topic modeling
state-of-the-art discovery
development of knowledge
latent-dirichlet allocation
bidirectional encoder representations from transformers
entity linking
title LDAViewer: An Automatic Language-Agnostic System for Discovering State-of-the-Art Topics in Research Using Topic Modeling, Bidirectional Encoder Representations From Transformers, and Entity Linking
topic Topic modeling
state-of-the-art discovery
development of knowledge
latent-dirichlet allocation
bidirectional encoder representations from transformers
entity linking
url https://ieeexplore.ieee.org/document/10147827/