Multimodal Fall Detection Using Spatial–Temporal Attention and Bi-LSTM-Based Feature Fusion
Human fall detection is a significant healthcare concern, particularly among the elderly, due to its links to muscle weakness, cardiovascular issues, and locomotive syndrome. Accurate fall detection is crucial for timely intervention and injury prevention, which has led many researchers to work on developing effective detection systems…
Saved in:
| Main Authors: | Jungpil Shin, Abu Saleh Musa Miah, Rei Egawa, Najmul Hassan, Koki Hirooka, Yoichi Tomioka |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Future Internet |
| Subjects: | ageing people; AlphaPose; body pose detection; channel attention; graph convolutional network (GCN); human fall detection |
| Online Access: | https://www.mdpi.com/1999-5903/17/4/173 |
| _version_ | 1850180575945555968 |
|---|---|
| author | Jungpil Shin Abu Saleh Musa Miah Rei Egawa Najmul Hassan Koki Hirooka Yoichi Tomioka |
| author_sort | Jungpil Shin |
| collection | DOAJ |
| description | Human fall detection is a significant healthcare concern, particularly among the elderly, due to its links to muscle weakness, cardiovascular issues, and locomotive syndrome. Accurate fall detection is crucial for timely intervention and injury prevention, which has led many researchers to work on developing effective detection systems. However, existing unimodal systems that rely solely on skeleton or sensor data face challenges such as poor robustness, computational inefficiency, and sensitivity to environmental conditions. While some multimodal approaches have been proposed, they often struggle to capture long-range dependencies effectively. In order to address these challenges, we propose a multimodal fall detection framework that integrates skeleton and sensor data. The system uses a Graph-based Spatial-Temporal Convolutional and Attention Neural Network (GSTCAN) to capture spatial and temporal relationships from skeleton and motion data information in stream-1, while a Bi-LSTM with Channel Attention (CA) processes sensor data in stream-2, extracting both spatial and temporal features. The GSTCAN model uses AlphaPose for skeleton extraction, calculates motion between consecutive frames, and applies a graph convolutional network (GCN) with a CA mechanism to focus on relevant features while suppressing noise. In parallel, the Bi-LSTM with CA processes inertial signals, with Bi-LSTM capturing long-range temporal dependencies and CA refining feature representations. The features from both branches are fused and passed through a fully connected layer for classification, providing a comprehensive understanding of human motion. The proposed system was evaluated on the Fall Up and UR Fall datasets, achieving a classification accuracy of 99.09% and 99.32%, respectively, surpassing existing methods. This robust and efficient system demonstrates strong potential for accurate fall detection and continuous healthcare monitoring. |
| format | Article |
| id | doaj-art-51e1c5ddc6804f4eb86a9c7713ae4a3b |
| institution | OA Journals |
| issn | 1999-5903 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Future Internet |
| doi | 10.3390/fi17040173 |
| citation | Future Internet, vol. 17, no. 4, article 173, 2025-04-01 |
| affiliations | Jungpil Shin, Abu Saleh Musa Miah, Rei Egawa, Najmul Hassan, Koki Hirooka, Yoichi Tomioka: School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan |
| title | Multimodal Fall Detection Using Spatial–Temporal Attention and Bi-LSTM-Based Feature Fusion |
| topic | ageing people AlphaPose body pose detection channel attention graph convolutional network (GCN) human fall detection |
| url | https://www.mdpi.com/1999-5903/17/4/173 |
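The abstract describes a two-stream pipeline: each branch's features are refined by channel attention (CA), then the branches are fused and passed through a fully connected layer for classification. The numpy sketch below illustrates that fusion pattern only; the squeeze-and-excitation-style attention, the random weights, and all dimensions here are illustrative assumptions, not the paper's actual GSTCAN/Bi-LSTM implementation.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Reweight feature channels (a common squeeze-and-excitation-style CA;
    the paper's exact CA formulation may differ).
    x: (T, C) array of T frames with C feature channels."""
    squeeze = x.mean(axis=0)                       # (C,) global average over time
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck, (R,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel, (C,)
    return x * scale, scale                        # reweighted features, gate

rng = np.random.default_rng(0)
T, C, R = 30, 8, 4                       # frames, channels, bottleneck size (assumed)
x_skel = rng.standard_normal((T, C))     # stand-in for stream-1 (GSTCAN) features
x_sens = rng.standard_normal((T, C))     # stand-in for stream-2 (Bi-LSTM) features

w1 = rng.standard_normal((R, C))
w2 = rng.standard_normal((C, R))
y_skel, s1 = channel_attention(x_skel, w1, w2)
y_sens, s2 = channel_attention(x_sens, w1, w2)

# Fuse both branches: pool over time, concatenate, then a fully connected layer
fused = np.concatenate([y_skel.mean(axis=0), y_sens.mean(axis=0)])  # (2C,)
w_fc = rng.standard_normal((2, 2 * C))   # 2 classes: fall / no fall
logits = w_fc @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()                     # softmax over the two classes
```

The sigmoid gate keeps every channel weight strictly between 0 and 1, so attention suppresses rather than removes channels; the concatenation-then-linear step mirrors the "features from both branches are fused and passed through a fully connected layer" description.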