A three-stage machine learning and inference approach for educational data

Bibliographic Details
Main Author: Ting Da
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Reports
Subjects: Machine learning; Causal inference; OLS regression; Instrumental variable (IV); LASSO
Online Access: https://doi.org/10.1038/s41598-025-89394-2
collection DOAJ
description Abstract A central task in educational studies is to uncover factors that drive a student’s academic performance. While existing studies have utilized meticulous regression designs, it is challenging to select appropriate controls. Machine learning, however, offers a solution whereby the entire variable set can be inspected and filtered by different optimization schemes. In that light, this paper adopts a three-stage framework to analyze and discover potentially latent causal relationships from an open dataset from UCI. In the first stage, machine learning methods are employed to select candidate variables that are closely associated with student grades, and then a “post-double-selection” process is implemented to select the set of control variables. In the final stage, three case studies are conducted to illustrate the effectiveness of the three-stage design. The model pipeline is suitable for situations where there is only minimal prior knowledge available to address a potentially causal research question.
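The "post-double-selection" step named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it uses synthetic data rather than the UCI dataset, a small hand-written coordinate-descent LASSO, and an arbitrary penalty `alpha=0.1`, all of which are assumptions made for illustration. The idea is to run one LASSO of the outcome on the candidate controls, a second LASSO of the variable of interest on the same controls, and then fit OLS with the union of the two selected sets.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()  # residual when beta == 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def post_double_selection(y, d, X, alpha=0.1):
    """Return (OLS coefficient on d, indices of selected controls)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize controls
    # Selection 1: controls that predict the outcome
    s_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), alpha))
    # Selection 2: controls that predict the variable of interest
    s_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), alpha))
    selected = np.union1d(s_y, s_d)
    # Final step: OLS of y on d plus the union of selected controls
    Z = np.column_stack([np.ones(len(y)), d, X[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], selected

# Synthetic example (hypothetical data): column 0 confounds d and y,
# and the true coefficient on d is 2.0.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 2.0 * d + 3.0 * X[:, 0] + X[:, 1] + rng.normal(size=n)

effect, selected = post_double_selection(y, d, X)
print("estimated coefficient on d:", round(float(effect), 3))
print("selected control columns:", selected.tolist())
```

Taking the union of the two selected sets is what distinguishes this from a single LASSO: a control that only weakly predicts the outcome but strongly predicts the variable of interest is still kept, which reduces omitted-variable bias in the final regression. In practice the penalty would be chosen by cross-validation or a theory-driven rule rather than fixed by hand.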
id doaj-art-e1f4066afacb4a068d95aebd842ffb3c
institution OA Journals
issn 2045-2322
affiliation National Engineering Research Center of Cyberlearning and Intelligent Technology, Beijing Normal University
citation Scientific Reports, volume 15, issue 1, pages 1-22
topic Machine learning
Causal inference
OLS regression
Instrumental variable (IV)
LASSO