A Comprehensive Comparative Analysis of Deep Learning Models for Student Performance Prediction in Virtual Learning Environments: Leveraging the OULA Dataset and Advanced Resampling Techniques
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10979810/ |
| Summary: | Predicting student performance in Virtual Learning Environments (VLEs) has become increasingly important with the growth of online education. Early identification of at-risk students allows timely interventions to improve academic outcomes. This study evaluates the performance of several Deep Learning (DL) models for tabular data, including ResNet, NODE, AutoInt, TabNet, TabTransformer (TT), SAINT, and GatedTabTransformer (GTT). It also examines the role of resampling techniques, including SMOTE, ROS, ADASYN, RUS, and Tomek Links, in addressing class imbalance. Using the OULA dataset, eight experiments were conducted for binary and multi-class classification tasks, testing four feature combinations: 1) behavioral, 2) demographic and behavioral, 3) academic and behavioral, and 4) demographic, academic, and behavioral. The results indicate that incorporating a comprehensive set of characteristics can significantly enhance model performance, with academic characteristics proving more predictive than demographic characteristics. The SAINT model achieved the highest performance in binary classification (94.33% accuracy), leveraging its ability to capture meaningful yet straightforward feature interactions. For multi-class classification, SAINT again outperformed the other models, achieving an accuracy of 73.22% when using the Tomek Links method and excelling at managing complex feature interactions and underrepresented classes such as “Distinction.” The models were compared statistically on their F1-scores across the experiments using the Friedman aligned ranks test and the Nemenyi post-hoc test. The non-parametric Friedman test revealed significant differences among the models ($p = 0.00013$). SAINT and AutoInt consistently outperformed the other approaches, while ResNet and TT demonstrated the weakest performance. Post-hoc analysis using the Nemenyi test did not show statistically significant differences among the mid-tier models (TabNet, GTT, NODE). A critical difference (CD) analysis further confirmed that SAINT and AutoInt are the most effective architectures for addressing complex, imbalanced educational data. These findings highlight the importance of aligning model selection and resampling techniques with the complexity of the task and the characteristics of the data. |
| ISSN: | 2169-3536 |