A three-stage machine learning and inference approach for educational data
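Abstract: A central task in educational studies is to uncover factors that drive a student’s academic performance. While existing studies have utilized meticulous regression designs, it is challenging to select appropriate controls. Machine learning, however, offers a solution whereby the entire variable set can be inspected and filtered by different optimization schemes. In that light, this paper adopts a three-stage framework to analyze and discover potentially latent causal relationships from an open dataset from UCI. In the first stage, machine learning methods are employed to select candidate variables that are closely associated with student grades, and then a “post-double-selection” process is implemented to select the set of control variables. In the final stage, three case studies are conducted to illustrate the effectiveness of the three-stage design. The model pipeline is suitable for situations where there is only minimal prior knowledge available to address a potentially causal research question.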
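The "post-double-selection" step named in this record's abstract can be illustrated with a minimal sketch. This is not the paper's code, which the record does not include. It assumes the standard recipe: one LASSO of the outcome on the candidate controls, one LASSO of the variable of interest on the same controls, then OLS on the union of the selected controls. The data are synthetic, and every name here (`lasso_cd`, `ols`, `post_double_selection`, the penalty `alpha=0.15`) is hypothetical and chosen only for illustration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def center(col):
    """Subtract the mean so no intercept term is needed."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def lasso_cd(X, y, alpha, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual, adding back its own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def ols(X, y):
    """Least squares via Gauss-Jordan elimination on the normal equations."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[c][k] / A[c][c] for c in range(k)]

def post_double_selection(y, d, X, alpha):
    """Select controls that predict y or d, then run OLS of y on d plus their union."""
    sel_y = {j for j, b in enumerate(lasso_cd(X, y, alpha)) if b != 0.0}
    sel_d = {j for j, b in enumerate(lasso_cd(X, d, alpha)) if b != 0.0}
    controls = sorted(sel_y | sel_d)
    design = [[d[i]] + [X[i][j] for j in controls] for i in range(len(y))]
    return ols(design, y)[0], controls

# Synthetic data (an assumption, not the UCI dataset): column 0 confounds d and y,
# and the true effect of d on y is 0.5.
rng = random.Random(0)
n, p = 400, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [X[i][0] + rng.gauss(0, 1) for i in range(n)]
y = [0.5 * d[i] + X[i][0] + 0.8 * X[i][1] + rng.gauss(0, 1) for i in range(n)]

cols = [center([row[j] for row in X]) for j in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
d, y = center(d), center(y)

naive = ols([[v] for v in d], y)[0]  # regression of y on d with no controls
effect, controls = post_double_selection(y, d, X, alpha=0.15)
print("naive:", round(naive, 3), "post-double-selection:", round(effect, 3), "controls:", controls)
```

In this toy setup the uncontrolled regression absorbs the confounder's influence, while the estimate after double selection should land near the simulated effect of 0.5. In practice the penalty would be chosen by a data-driven rule rather than fixed by hand.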
| Main Author: | Ting Da |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Machine learning; Causal inference; OLS regression; Instrumental variable (IV); LASSO |
| Online Access: | https://doi.org/10.1038/s41598-025-89394-2 |

| Field | Value |
|---|---|
| author | Ting Da |
| collection | DOAJ |
| description | Abstract A central task in educational studies is to uncover factors that drive a student’s academic performance. While existing studies have utilized meticulous regression designs, it is challenging to select appropriate controls. Machine learning, however, offers a solution whereby the entire variable set can be inspected and filtered by different optimization schemes. In that light, this paper adopts a three-stage framework to analyze and discover potentially latent causal relationships from an open dataset from UCI. In the first stage, machine learning methods are employed to select candidate variables that are closely associated with student grades, and then a “post-double-selection” process is implemented to select the set of control variables. In the final stage, three case studies are conducted to illustrate the effectiveness of the three-stage design. The model pipeline is suitable for situations where there is only minimal prior knowledge available to address a potentially causal research question. |
| format | Article |
| id | doaj-art-e1f4066afacb4a068d95aebd842ffb3c |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Nature Portfolio |
| series | Scientific Reports |
| spelling | doaj-art-e1f4066afacb4a068d95aebd842ffb3c; 2025-08-20T01:56:01Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-04-01; vol. 15, no. 1, pp. 1-22; doi:10.1038/s41598-025-89394-2; A three-stage machine learning and inference approach for educational data; Ting Da (National Engineering Research Center of Cyberlearning and Intelligent Technology, Beijing Normal University); https://doi.org/10.1038/s41598-025-89394-2; Machine learning; Causal inference; OLS regression; Instrumental variable (IV); LASSO |
| title | A three-stage machine learning and inference approach for educational data |
| topic | Machine learning; Causal inference; OLS regression; Instrumental variable (IV); LASSO |
| url | https://doi.org/10.1038/s41598-025-89394-2 |