Does the Choice of Topic Modeling Technique Impact the Interpretation of Aviation Incident Reports? A Methodological Assessment

Bibliographic Details
Main Authors: Aziida Nanyonga, Keith Joiner, Ugur Turhan, Graham Wild
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Technologies
Online Access: https://www.mdpi.com/2227-7080/13/5/209
Description
Summary: This study presents a comparative analysis of four topic modeling techniques, namely Latent Dirichlet Allocation (LDA), Bidirectional Encoder Representations from Transformers (BERT), Probabilistic Latent Semantic Analysis (pLSA), and Non-negative Matrix Factorization (NMF), applied to aviation safety reports from the ATSB dataset spanning 2013–2023. The evaluation focuses on coherence, interpretability, generalization, computational efficiency, and scalability. The results indicate that NMF achieves the highest coherence score (0.7987), demonstrating its effectiveness in extracting well-defined topics from structured narratives. pLSA performs competitively (coherence: 0.7634) but lacks the scalability of NMF. LDA and BERTopic, while effective in generalization (perplexity: −6.471 and −4.638, respectively), struggle with coherence due to their probabilistic nature and reliance on contextual embeddings. A preliminary expert review by two aviation safety specialists found that topics generated by the NMF model were interpretable and aligned well with domain knowledge, reinforcing its potential suitability for aviation safety analysis. Future research should explore hybrid modeling approaches and real-time applications to further enhance aviation safety analysis. The study contributes to automated safety monitoring in the aviation industry by identifying the most appropriate topic modeling techniques.
ISSN: 2227-7080
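
Illustrative note: the summary ranks the models by coherence score but does not say which coherence measure was used. As a rough sketch only, the snippet below scores a topic's top words with normalized pointwise mutual information (NPMI), one common coherence measure, which is an assumption here and not necessarily the study's metric. The function name `npmi_coherence` and the toy narratives are hypothetical and are not drawn from the ATSB dataset.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence.
    NPMI is an assumed stand-in for the (unspecified) study metric.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words to form a pair")

    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lowest NPMI
        elif p12 == 1:
            scores.append(1.0)   # both words in every document: limiting value
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy narratives, invented for illustration only.
docs = [
    "engine failure during climb",
    "engine failure after takeoff",
    "bird strike on approach",
    "bird strike during landing",
]

coherent_topic = npmi_coherence(["engine", "failure"], docs)
mixed_topic = npmi_coherence(["engine", "bird"], docs)
print(coherent_topic, mixed_topic)
```

Under this measure, a topic whose top words tend to appear in the same reports scores higher than one mixing unrelated words, which is the intuition behind comparing models by coherence.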