Accurate multi-category student performance forecasting at early stages of online education using neural networks
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-00256-3 |
| Summary: | The ability to accurately predict and analyze student performance in online education, both at the outset and throughout the semester, is vital. Most published studies focus on binary classification (Fail or Pass), and predicting student performance across multiple categories remains a significant research gap. This study introduces a novel neural network-based approach capable of accurately predicting student performance and identifying vulnerable students at early stages of online courses. The Open University Learning Analytics (OULA) dataset is used to develop and test the proposed model, which predicts outcomes in four categories: Distinction, Fail, Pass, and Withdrawn. The OULA dataset is preprocessed to extract features from demographic data, assessment data, and clickstream interactions within a virtual learning environment (VLE). Novel feature engineering is used to predict students’ performance across multiple categories at early stages of courses. Specifically, students’ VLE interactions are aggregated by total clicks to represent daily engagement and assess online activity. Comparative simulations indicate that the proposed model significantly outperforms existing baseline models, including artificial neural network long short-term memory (ANN-LSTM), random forest (RF) ‘gini’, RF ‘entropy’, and deep feed-forward neural network (DFFNN), in terms of accuracy, precision, recall, and F1-score. The results indicate that the prediction accuracy of the proposed method is about 25% higher than that of existing state-of-the-art methods. Furthermore, compared to existing methodologies, the model demonstrates superior predictive capability across temporal course progression, achieving higher accuracy even at the initial 20% phase of course completion. |
| ISSN: | 2045-2322 |
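
The summary describes aggregating VLE interactions by total clicks per day and making predictions from only the first 20% of a course. The following is a minimal sketch of that kind of aggregation, not the authors' implementation. The row layout (student id, day offset from course start, click count), the function name `daily_clicks`, the parameter `early_percent`, the sample data, and the choice to ignore activity before day 0 are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical clickstream rows: (student_id, day_offset_from_course_start, clicks).
# The layout and values are illustrative assumptions, not taken from the article.
rows = [
    (1, 0, 3), (1, 0, 2), (1, 1, 4), (1, 30, 9),
    (2, 0, 1), (2, 5, 6),
]

def daily_clicks(rows, course_length_days, early_percent=20):
    """Sum clicks per student per day over the first `early_percent` of the course."""
    # Integer arithmetic avoids floating-point surprises when computing the cutoff day.
    cutoff = course_length_days * early_percent // 100
    totals = defaultdict(lambda: [0] * cutoff)
    for student, day, clicks in rows:
        # Assumption: activity before day 0 or at/after the cutoff is ignored.
        if 0 <= day < cutoff:
            totals[student][day] += clicks
    return dict(totals)

# One fixed-length vector of daily click totals per student.
features = daily_clicks(rows, course_length_days=100, early_percent=20)
```

Each student ends up with a fixed-length vector of daily click totals, which could then be combined with demographic and assessment features as input to a four-class classifier (Distinction, Fail, Pass, Withdrawn).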