A Comprehensive Methodological Survey of Human Activity Recognition Across Diverse Data Modalities

Human Activity Recognition (HAR) systems aim to understand human behavior and assign a label to each action, attracting significant attention in computer vision due to their wide range of applications. HAR can leverage various data modalities, such as RGB images and video, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, and radar signals. Each modality provides unique and complementary information suited to different application scenarios. Consequently, numerous studies have investigated diverse approaches for HAR using these modalities. This survey includes only peer-reviewed research papers published in English to ensure linguistic consistency and academic integrity. This paper presents a comprehensive survey of the latest advancements in HAR from 2014 to 2025, focusing on Machine Learning (ML) and Deep Learning (DL) approaches categorized by input data modalities. We review both single-modality and multi-modality techniques, highlighting fusion-based and co-learning frameworks. Additionally, we cover advancements in hand-crafted action features, methods for recognizing human–object interactions, and activity detection. Our survey includes a detailed dataset description for each modality, as well as a summary of the latest HAR systems, accompanied by a mathematical derivation for evaluating the deep learning model for each modality, and it also provides comparative results on benchmark datasets. Finally, we provide insightful observations and propose effective future research directions in HAR.

Bibliographic Details
Main Authors: Jungpil Shin, Najmul Hassan, Abu Saleh Musa Miah, Satoshi Nishimura
Author Affiliation: School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan (all four authors)
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Sensors, Vol. 25, Issue 13, Article 4028
DOI: 10.3390/s25134028
ISSN: 1424-8220
Subjects: human activity recognition (HAR); diverse modality; deep learning (DL); machine learning (ML); vision and sensor based HAR; classification
Online Access: https://www.mdpi.com/1424-8220/25/13/4028
Collection: DOAJ
Record ID: doaj-art-8c4f0f218e5d4d80978db03b27d98a70
Institution: Kabale University