Student dropout prediction through machine learning optimization: insights from moodle log data

Abstract Student attrition and academic failure remain pervasive challenges in education, often occurring at substantial rates and posing considerable difficulties for timely identification and intervention. Learning management systems such as Moodle generate extensive datasets reflecting student in...

Bibliographic Details
Main Authors: Markson Rebelo Marcolino, Thiago Reis Porto, Tiago Thompsen Primo, Rafael Targino, Vinicius Ramos, Emanuel Marques Queiroga, Roberto Munoz, Cristian Cechinel
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Subjects: Student dropout prediction; Machine learning in education; NSGA-II; CatBoost; Moodle logs
Online Access: https://doi.org/10.1038/s41598-025-93918-1
author Markson Rebelo Marcolino
Thiago Reis Porto
Tiago Thompsen Primo
Rafael Targino
Vinicius Ramos
Emanuel Marques Queiroga
Roberto Munoz
Cristian Cechinel
author_sort Markson Rebelo Marcolino
collection DOAJ
description Abstract Student attrition and academic failure remain pervasive challenges in education, often occurring at substantial rates and posing considerable difficulties for timely identification and intervention. Learning management systems such as Moodle generate extensive datasets reflecting student interactions and enrollment patterns, presenting opportunities for predictive analytics. This study seeks to advance the field of dropout and failure prediction through the application of artificial intelligence with machine learning methodologies. In particular, we employed the CatBoost algorithm, trained on student activity logs from the Moodle platform. To mitigate the challenges posed by a limited and imbalanced dataset, we employed sophisticated data balancing techniques, such as Adaptive Synthetic Sampling, and conducted multi-objective hyperparameter optimization using the Non-dominated Sorting Genetic Algorithm II. We compared models trained on weekly log data against a single model trained on all weeks’ data. The proposed model trained with all weeks’ data demonstrated superior performance, showing significant improvements in F1-scores and recall, particularly for the minority class of at-risk students. For example, the model got an average F1-score across multiple weeks of approximately 0.8 in the holdout test. These findings underscore the potential of targeted machine learning approaches to facilitate early identification of at-risk students, thereby enabling timely interventions and improving educational outcomes.
format Article
id doaj-art-d73b043389d748b3bf7e6f4e53e44435
institution DOAJ
issn 2045-2322
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-d73b043389d748b3bf7e6f4e53e44435
2025-08-20T02:52:17Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-03-01
Volume 15, Issue 1, pp. 1-16
10.1038/s41598-025-93918-1
Student dropout prediction through machine learning optimization: insights from moodle log data
Markson Rebelo Marcolino (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC))
Thiago Reis Porto (Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL))
Tiago Thompsen Primo (Centro de Desenvolvimento Tecnológico (CDTec), Universidade Federal de Pelotas (UFPEL))
Rafael Targino (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC))
Vinicius Ramos (Centro Tecnológico, Universidade Federal de Santa Catarina (UFSC))
Emanuel Marques Queiroga (Instituto Federal Sul-rio-grandense (IFSUL))
Roberto Munoz (Escuela de Ingeniería Informática, Universidad de Valparaíso)
Cristian Cechinel (Centro de Ciências, Tecnologias e Saúde, Universidade Federal de Santa Catarina (UFSC))
Abstract Student attrition and academic failure remain pervasive challenges in education, often occurring at substantial rates and posing considerable difficulties for timely identification and intervention. Learning management systems such as Moodle generate extensive datasets reflecting student interactions and enrollment patterns, presenting opportunities for predictive analytics. This study seeks to advance the field of dropout and failure prediction through the application of artificial intelligence with machine learning methodologies. In particular, we employed the CatBoost algorithm, trained on student activity logs from the Moodle platform. To mitigate the challenges posed by a limited and imbalanced dataset, we employed sophisticated data balancing techniques, such as Adaptive Synthetic Sampling, and conducted multi-objective hyperparameter optimization using the Non-dominated Sorting Genetic Algorithm II. We compared models trained on weekly log data against a single model trained on all weeks’ data. The proposed model trained with all weeks’ data demonstrated superior performance, showing significant improvements in F1-scores and recall, particularly for the minority class of at-risk students. For example, the model got an average F1-score across multiple weeks of approximately 0.8 in the holdout test. These findings underscore the potential of targeted machine learning approaches to facilitate early identification of at-risk students, thereby enabling timely interventions and improving educational outcomes.
https://doi.org/10.1038/s41598-025-93918-1
Student dropout prediction
Machine learning in education
NSGA-II
CatBoost
Moodle logs
title Student dropout prediction through machine learning optimization: insights from moodle log data
topic Student dropout prediction
Machine learning in education
NSGA-II
CatBoost
Moodle logs
url https://doi.org/10.1038/s41598-025-93918-1
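
The abstract in this record describes multi-objective hyperparameter optimization with the Non-dominated Sorting Genetic Algorithm II (NSGA-II) and reports F1-score and recall for the minority class of at-risk students. The Python sketch below is only an illustration of the core idea behind that selection step, not the authors' implementation: it scores a few candidate configurations on recall and F1 for the at-risk class and keeps those that no other candidate dominates. Treating recall and F1 as the two objectives is an assumption drawn from the abstract's emphasis, and the labels, predictions, configuration names (`cfg_a`, `cfg_b`, `cfg_c`), and helper functions are hypothetical toy values. A full NSGA-II also uses crowding distance and genetic operators, which are omitted here.

```python
from typing import Dict, List, Sequence, Tuple


def recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1) -> Tuple[float, float]:
    """Return (recall, F1) for the `positive` class (here: at-risk students)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, s in scores.items()
        if not any(dominates(other, s)
                   for other_name, other in scores.items()
                   if other_name != name)
    ]


# Toy holdout labels: 1 = at-risk (minority class), 0 = not at-risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical predictions from three candidate hyperparameter settings.
preds = {
    "cfg_a": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # precise, misses one at-risk student
    "cfg_b": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # finds all at-risk, more false alarms
    "cfg_c": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],  # worse than cfg_a on both objectives
}

scores = {name: recall_f1(y_true, p) for name, p in preds.items()}
front = pareto_front(scores)
print(front)
```

In this toy setup `cfg_a` and `cfg_b` trade recall against F1, so neither dominates the other and both remain on the front, while `cfg_c` is dropped. Choosing one configuration from the front is then a separate decision, for example favoring recall when missing an at-risk student is costlier than a false alarm.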