Effective human–object interaction recognition for edge devices in intelligent space
To enable machines to understand human-centric images and videos, they need the capability to detect human–object interactions. This capability has been studied using various approaches, but previous research has mainly focused only on recognition accuracy using widely used open datasets. Given the...
| Main Authors: | Haruhiro Ozaki, Dinh Tuan Tran, Joo-Ho Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2024-12-01 |
| Series: | SICE Journal of Control, Measurement, and System Integration |
| Subjects: | human–object interaction; computer vision; machine learning; edge computing; system integration; intelligent space |
| Online Access: | http://dx.doi.org/10.1080/18824889.2023.2292353 |
| id | doaj-art-ede424517c3243ac8869b5d653634c79 |
|---|---|
| author | Haruhiro Ozaki; Dinh Tuan Tran; Joo-Ho Lee (all Ritsumeikan University) |
| collection | DOAJ |
| institution | OA Journals |
| format | Article |
| issn | 1884-9970 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Taylor & Francis Group |
| series | SICE Journal of Control, Measurement, and System Integration |
| topic | human–object interaction; computer vision; machine learning; edge computing; system integration; intelligent space |
| url | http://dx.doi.org/10.1080/18824889.2023.2292353 |
| description | To enable machines to understand human-centric images and videos, they need the capability to detect human–object interactions. This capability has been studied using various approaches, but previous research has mainly focused only on recognition accuracy on widely used open datasets. Given the need for advanced machine-learning systems that provide spatial analysis and services, the recognition model should be robust to various changes, have high extensibility, and provide sufficient recognition speed even with minimal computational overhead. Therefore, we propose a novel method that combines a skeleton-based method with object detection to accurately predict a set of $\langle$ human, verb, object $\rangle$ triplets in a video frame, taking into account the robustness, extensibility, and light weight of the model. Training a model with perceptual elements similar to those of humans produces sufficient accuracy for advanced social systems, even with only a small training dataset. The proposed model is trained using only the coordinates of the object and of human landmarks, making it robust to various situations and lightweight compared with deep-learning methods. In the experiment, a scenario in which a human is working at a desk is simulated and the algorithm is trained on object-specific interactions. The accuracy of the proposed model was evaluated using various types of datasets. |
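The abstract above describes a model trained only on object and human-landmark coordinates rather than raw pixels. The following is a minimal sketch of that idea, not the authors' implementation: all function names, the landmark naming scheme, the single "hold" verb, and the distance threshold are illustrative assumptions standing in for the paper's trained lightweight classifier.

```python
import math

def build_features(landmarks, obj_box):
    """Hypothetical feature builder: encode each pose landmark as its
    offset from the detected object's centre, mirroring the paper's idea
    of learning from coordinates alone.

    landmarks: dict of name -> (x, y), normalized to [0, 1]
    obj_box:   (x1, y1, x2, y2), normalized to [0, 1]
    """
    cx = (obj_box[0] + obj_box[2]) / 2
    cy = (obj_box[1] + obj_box[3]) / 2
    feats = []
    for name in sorted(landmarks):  # fixed order so the vector is stable
        lx, ly = landmarks[name]
        feats.extend([lx - cx, ly - cy])
    return feats

def classify_interaction(landmarks, obj_box, obj_label, reach=0.15):
    """Toy rule standing in for the trained classifier: emit a
    <human, verb, object> triplet when either wrist is within `reach`
    (an assumed threshold) of the object's centre."""
    cx = (obj_box[0] + obj_box[2]) / 2
    cy = (obj_box[1] + obj_box[3]) / 2
    for wrist in ("left_wrist", "right_wrist"):
        wx, wy = landmarks[wrist]
        if math.hypot(wx - cx, wy - cy) < reach:
            return ("human", "hold", obj_label)
    return ("human", "no_interaction", obj_label)

# Example: a right wrist resting on a cup in a desk-work scene.
landmarks = {"left_wrist": (0.40, 0.55), "right_wrist": (0.62, 0.50)}
cup_box = (0.58, 0.45, 0.68, 0.55)
print(classify_interaction(landmarks, cup_box, "cup"))
```

Because only a handful of coordinates are processed per frame, a classifier of this shape stays cheap enough for edge devices, which is the trade-off the abstract emphasizes over pixel-based deep-learning pipelines.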