Predicting Absenteeism at Workplace Using Machine Learning and Network Analysis

Absenteeism at work, possibly leading to productivity loss in business, is related to various psychological, social, and economic factors. Since predicting absenteeism is involved with complex associations of such factors, appropriately utilizing machine learning algorithms is required in the analys...

Bibliographic Details
Main Authors: Donggeun Kim, Jai Woo Lee
Format: Article
Language: English
Published: SAGE Publishing 2025-04-01
Series: SAGE Open
Online Access: https://doi.org/10.1177/21582440251336019
_version_ 1849314681734299648
author Donggeun Kim
Jai Woo Lee
collection DOAJ
description Absenteeism at work, which can lead to productivity loss in business, is related to various psychological, social, and economic factors. Because predicting absenteeism involves complex associations among such factors, the analysis requires appropriate use of machine learning algorithms. Statistical pre-processing and machine learning methods have advanced the comprehensive analysis of massive social data on absenteeism. The aim of this study is to develop a quantitative approach that identifies associations among factors and classifies absenteeism by including the effects of factors in high-dimensional data. The approach combines association analysis, including the odds ratio test and network analysis, with supervised learning: imbalanced classification with random forest, principal component analysis, and penalized regression methods. The dataset includes records of various types of workplace absenteeism from July 2007 to July 2010 in Brazil. The study shows that strongly interacting factors exist and that specific factors are strongly associated with absenteeism. The proposed method is validated on publicly available data sets using random forest and penalized regression with k-fold cross-validation to improve generalizability. One of the major findings of this study is the elucidation of the associations among factors affecting absenteeism. Application to similarly structured social data improves the understanding of the complex interplay between social factors and absenteeism, which is important for people analytics and can help organizations resolve management difficulties.
format Article
id doaj-art-093cc10db2854cba815528abd64cbd4b
institution Kabale University
issn 2158-2440
language English
publishDate 2025-04-01
publisher SAGE Publishing
record_format Article
series SAGE Open
spelling doaj-art-093cc10db2854cba815528abd64cbd4b | 2025-08-20T03:52:24Z | eng | SAGE Publishing | SAGE Open | 2158-2440 | 2025-04-01 | 15 | 10.1177/21582440251336019 | Predicting Absenteeism at Workplace Using Machine Learning and Network Analysis | Donggeun Kim [0] | Jai Woo Lee [1] | Department of Big Data Science, College of Public Policy, Korea University, Sejong, Republic of Korea | Department of Big Data Science, College of Public Policy, Korea University, Sejong, Republic of Korea | https://doi.org/10.1177/21582440251336019
title Predicting Absenteeism at Workplace Using Machine Learning and Network Analysis
url https://doi.org/10.1177/21582440251336019