Unveiling the Impact of Socioeconomic and Demographic Factors on Graduate Salaries: A Machine Learning Explanatory Analytical Approach Using Higher Education Statistical Agency Data

Graduate salaries are a significant concern for graduates, employers, and policymakers, as various factors influence them. This study investigates determinants of graduate salaries in the UK, utilising survey data from HESA (Higher Education Statistical Agency) and integrating advanced machine learn...

Bibliographic Details
Main Authors: Bassey Henshaw, Bhupesh Kumar Mishra, William Sayers, Zeeshan Pervez
Format: Article
Language: English
Published: MDPI AG 2025-03-01
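Series: Analytics
Subjects: graduate salaries; higher education; machine learning; socioeconomic and demographic factors; statistical analysis; SHAP
Online Access: https://www.mdpi.com/2813-2203/4/1/10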
author Bassey Henshaw
Bhupesh Kumar Mishra
William Sayers
Zeeshan Pervez
collection DOAJ
description Graduate salaries are a significant concern for graduates, employers, and policymakers, and are influenced by many factors. This study investigates the determinants of graduate salaries in the UK, using survey data from HESA (Higher Education Statistical Agency) and integrating advanced machine learning (ML) explanatory techniques with statistical analytical methodologies. By employing multi-stage analyses alongside machine learning models such as decision trees and random forests, together with explainability via SHAP (SHapley Additive exPlanations), this study investigates the influence of 21 socioeconomic and demographic variables on graduate salary outcomes. Key variables, including institutional reputation, age at graduation, socioeconomic classification, job qualification requirements, and domicile, emerged as critical determinants, with institutional reputation proving the most significant. Among the ML methods, the decision tree stood out, achieving the highest accuracy after optimisation techniques including oversampling and undersampling were applied. SHAP highlighted the top 12 influential variables, providing actionable insights into the interplay between individual and systemic factors. Furthermore, statistical analysis using ANOVA (Analysis of Variance) validated the significance of these variables, revealing intricate interactions that shape graduate salary dynamics. Domain experts’ opinions are also analysed to corroborate the findings. This research makes a unique contribution by combining qualitative contextual analysis with quantitative methodologies, machine learning explainability and domain experts’ views to address gaps in the existing identification of factors that predict graduate salaries. The findings also inform policy and educational interventions to reduce wage inequalities and promote equitable career opportunities. Despite limitations, such as the UK-specific dataset and the focus on socioeconomic and demographic variables, this study lays a robust foundation for future research in predictive modelling and graduate outcomes.
format Article
id doaj-art-d439042ec12949d88b95c97295ea2e9f
institution DOAJ
issn 2813-2203
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Analytics
spelling doaj-art-d439042ec12949d88b95c97295ea2e9f
2025-08-20T02:41:48Z
eng
MDPI AG
Analytics
2813-2203
2025-03-01
vol. 4, no. 1, art. 10
10.3390/analytics4010010
Unveiling the Impact of Socioeconomic and Demographic Factors on Graduate Salaries: A Machine Learning Explanatory Analytical Approach Using Higher Education Statistical Agency Data
Bassey Henshaw (School of Computing and Technology, University of Gloucestershire, Cheltenham GL50 2RH, UK)
Bhupesh Kumar Mishra (Centre of Excellence for Data Science, Artificial Intelligence and Modelling (DAIM), University of Hull, Cottingham Road, Hull HU6 7RX, UK)
William Sayers (School of Computing and Technology, University of Gloucestershire, Cheltenham GL50 2RH, UK)
Zeeshan Pervez (School of Engineering, Computing, and Mathematical Sciences, University of Wolverhampton, Wolverhampton WV1 1LY, UK)
https://www.mdpi.com/2813-2203/4/1/10
graduate salaries
higher education
machine learning
socioeconomic and demographic factors
statistical analysis
SHAP
title Unveiling the Impact of Socioeconomic and Demographic Factors on Graduate Salaries: A Machine Learning Explanatory Analytical Approach Using Higher Education Statistical Agency Data
topic graduate salaries
higher education
machine learning
socioeconomic and demographic factors
statistical analysis
SHAP
url https://www.mdpi.com/2813-2203/4/1/10