Student dropout prediction through machine learning optimization: insights from moodle log data
Abstract Student attrition and academic failure remain pervasive challenges in education, often occurring at substantial rates and posing considerable difficulties for timely identification and intervention. Learning management systems such as Moodle generate extensive datasets reflecting student in...
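| Main Authors: | Markson Rebelo Marcolino, Thiago Reis Porto, Tiago Thompsen Primo, Rafael Targino, Vinicius Ramos, Emanuel Marques Queiroga, Roberto Munoz, Cristian Cechinel |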
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
| Subjects: | Student dropout prediction; Machine learning in education; NSGA-II; CatBoost; Moodle logs |
| Online Access: | https://doi.org/10.1038/s41598-025-93918-1 |
| _version_ | 1850054341130452992 |
|---|---|
| author | Markson Rebelo Marcolino, Thiago Reis Porto, Tiago Thompsen Primo, Rafael Targino, Vinicius Ramos, Emanuel Marques Queiroga, Roberto Munoz, Cristian Cechinel |
| collection | DOAJ |
| description | Abstract Student attrition and academic failure remain pervasive challenges in education, often occurring at substantial rates and posing considerable difficulties for timely identification and intervention. Learning management systems such as Moodle generate extensive datasets reflecting student interactions and enrollment patterns, presenting opportunities for predictive analytics. This study seeks to advance the field of dropout and failure prediction through the application of artificial intelligence with machine learning methodologies. In particular, we employed the CatBoost algorithm, trained on student activity logs from the Moodle platform. To mitigate the challenges posed by a limited and imbalanced dataset, we employed sophisticated data balancing techniques, such as Adaptive Synthetic Sampling, and conducted multi-objective hyperparameter optimization using the Non-dominated Sorting Genetic Algorithm II. We compared models trained on weekly log data against a single model trained on all weeks’ data. The proposed model trained with all weeks’ data demonstrated superior performance, showing significant improvements in F1-scores and recall, particularly for the minority class of at-risk students. For example, the model got an average F1-score across multiple weeks of approximately 0.8 in the holdout test. These findings underscore the potential of targeted machine learning approaches to facilitate early identification of at-risk students, thereby enabling timely interventions and improving educational outcomes. |
| format | Article |
| id | doaj-art-d73b043389d748b3bf7e6f4e53e44435 |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-d73b043389d748b3bf7e6f4e53e44435; 2025-08-20T02:52:17Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-03-01; vol. 15, no. 1, pp. 1-16; DOI 10.1038/s41598-025-93918-1; Student dropout prediction through machine learning optimization: insights from moodle log data; Markson Rebelo Marcolino (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC)); Thiago Reis Porto (Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL)); Tiago Thompsen Primo (Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL)); Rafael Targino (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC)); Vinicius Ramos (Centro Tecnológico, Universidade Federal de Santa Catarina (UFSC)); Emanuel Marques Queiroga (Instituto Federal Sul-rio-grandense (IFSUL)); Roberto Munoz (Escuela de Ingeniería Informática, Universidad de Valparaíso); Cristian Cechinel (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC)); https://doi.org/10.1038/s41598-025-93918-1; Student dropout prediction; Machine learning in education; NSGA-II; CatBoost; Moodle logs |
| title | Student dropout prediction through machine learning optimization: insights from moodle log data |
| topic | Student dropout prediction; Machine learning in education; NSGA-II; CatBoost; Moodle logs |
| url | https://doi.org/10.1038/s41598-025-93918-1 |