Hybrid Text Embedding and Evolutionary Algorithm Approach for Topic Clustering in Online Discussion Forums

Bibliographic Details
Main Authors: Ibrahim Bouabdallaoui, Fatima Guerouate, Mohammed Sbihi
Format: Article
Language: English
Published: Ediciones Universidad de Salamanca 2024-08-01
Series: Advances in Distributed Computing and Artificial Intelligence Journal
Subjects:
Online Access: https://revistas.usal.es/cinco/index.php/2255-2863/article/view/31448
collection DOAJ
description Leveraging discussion forums as a medium for information exchange has led to a surge in data, making topic clustering in these platforms essential for understanding user interests, preferences, and concerns. This study introduces an innovative methodology for topic clustering by combining text embedding techniques—Latent Dirichlet Allocation (LDA) and BERT—trained on a singular autoencoder. Additionally, it proposes an amalgamation of K-Means and Genetic Algorithms for clustering topics within triadic discussion forum threads. The proposed technique begins with a preprocessing stage to clean and tokenize textual data, which is then transformed into a vector representation using the hybrid text embedding method. Subsequently, the K-Means algorithm clusters these vectorized data points, and Genetic Algorithms optimize the parameters of the K-Means clustering. We assess the efficacy of our approach by computing cosine similarities between topics and comparing performance against coherence and graph visualization. The results confirm that the hybrid text embedding methodology, coupled with evolutionary algorithms, enhances the quality of topic clustering across various discussion forum themes. This investigation contributes significantly to the development of effective methods for clustering discussion forums, with potential applications in diverse domains, including social media analysis, online education, and customer response analysis.
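The clustering stage described above (K-Means whose parameters are tuned by a Genetic Algorithm) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the toy 2-D points stand in for the hybrid LDA+BERT autoencoder embeddings, the chromosome is taken to be the pair (number of clusters, initialization seed), and the fitness is the mean silhouette coefficient, since this record does not state which K-Means parameters or which fitness function the article uses. All function names (`toy_embeddings`, `kmeans`, `silhouette`, `evolve`) are hypothetical.

```python
import math
import random


def toy_embeddings(seed=1):
    """Three small 2-D blobs; a stand-in for real document embeddings."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in centers for _ in range(10)]


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (assumed fitness; higher is better)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            return sum(d) / len(d) if d else None
        a = mean_dist(labels[i])
        others = [mean_dist(c) for c in clusters if c != labels[i]]
        if a is None or not others:  # singleton cluster or single cluster
            scores.append(0.0)
            continue
        b = min(others)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)


def evolve(points, k_max=6, pop_size=8, generations=10, seed=0):
    """Tiny GA over chromosomes (k, init_seed); returns (best, fitness)."""
    rng = random.Random(seed)

    def fitness(chromosome):
        k, init_seed = chromosome
        return silhouette(points, kmeans(points, k, init_seed))

    pop = [(rng.randint(2, k_max), rng.randrange(1000)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            k, init_seed = a[0], b[1]          # one-point crossover
            if rng.random() < 0.3:             # mutate k by +/- 1 within bounds
                k = min(k_max, max(2, k + rng.choice((-1, 1))))
            if rng.random() < 0.3:             # mutate the initialization seed
                init_seed = rng.randrange(1000)
            children.append((k, init_seed))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    (k, init_seed), score = evolve(toy_embeddings())
    print(f"best k={k}, init seed={init_seed}, silhouette={score:.3f}")
```

In a real pipeline, the toy points would be replaced by the embedding vectors of the forum posts, and the fitness could be swapped for a topic-coherence measure, which the abstract names as one of its evaluation criteria.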
id doaj-art-53fca1a468ee44fab625d87d4862dd92
institution Kabale University
issn 2255-2863
spelling doaj-art-53fca1a468ee44fab625d87d4862dd92; 2025-01-23T11:25:18Z; eng; Ediciones Universidad de Salamanca; Advances in Distributed Computing and Artificial Intelligence Journal; 2255-2863; 2024-08-01; 13; e31448; e31448; 10.14201/adcaij.31448; 36927
Ibrahim Bouabdallaoui, Fatima Guerouate, Mohammed Sbihi: LASTIMI Laboratory EST Salé, Mohammed V University in Rabat, Avenue Prince Héritier, Salé, Morocco
topic lda
bert
k-means
genetic algorithms
forum analysis