Effective human–object interaction recognition for edge devices in intelligent space


Bibliographic Details
Main Authors: Haruhiro Ozaki, Dinh Tuan Tran, Joo-Ho Lee
Format: Article
Language: English
Published: Taylor & Francis Group 2024-12-01
Series: SICE Journal of Control, Measurement, and System Integration
Subjects:
Online Access: http://dx.doi.org/10.1080/18824889.2023.2292353
_version_ 1850106443263377408
author Haruhiro Ozaki
Dinh Tuan Tran
Joo-Ho Lee
author_facet Haruhiro Ozaki
Dinh Tuan Tran
Joo-Ho Lee
author_sort Haruhiro Ozaki
collection DOAJ
description To enable machines to understand human-centric images and videos, they need the capability to detect human–object interactions. This capability has been studied using various approaches, but previous research has focused mainly on recognition accuracy on widely used open datasets. Given the need for advanced machine-learning systems that provide spatial analysis and services, the recognition model should be robust to various changes, highly extensible, and fast enough to run with minimal computational overhead. Therefore, we propose a novel method that combines a skeleton-based method with object detection to accurately predict a set of $\langle$human, verb, object$\rangle$ triplets in a video frame while accounting for the robustness, extensibility, and light weight of the model. Training a model with perceptual elements similar to those of humans produces sufficient accuracy for advanced social systems, even with only a small training dataset. The proposed model is trained using only the coordinates of the object and the human landmarks, making it robust to various situations and lightweight compared with deep-learning methods. In the experiment, a scenario in which a human is working at a desk is simulated, and an algorithm is trained on object-specific interactions. The accuracy of the proposed model was evaluated using various types of datasets.
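The description above says the model is trained only on the coordinates of the detected object and the human skeletal landmarks. As a rough illustration of that coordinate-only idea — not the authors' implementation; the landmark set, feature layout, distance threshold, and verb labels are all invented here — such a pipeline might build object-relative features and apply a lightweight rule or classifier:

```python
import math

# Sketch of a coordinate-only <human, verb, object> predictor.
# Inputs are normalized image coordinates: pose landmarks from any
# skeletal estimator, and an object bounding box from any detector.

def triplet_features(landmarks, obj_box):
    """Express each landmark relative to the object's centre."""
    ox = (obj_box[0] + obj_box[2]) / 2.0
    oy = (obj_box[1] + obj_box[3]) / 2.0
    feats = []
    for (x, y) in landmarks:
        feats.extend([x - ox, y - oy])  # object-relative coordinates
    return feats

def classify_verb(feats, dist_threshold=0.1):
    """Toy stand-in for the trained classifier: if the closest landmark
    lies near the object, report an interaction."""
    dists = [math.hypot(feats[i], feats[i + 1])
             for i in range(0, len(feats), 2)]
    return "use" if min(dists) < dist_threshold else "no_interaction"

# Example: a wrist landmark right beside a cup's bounding box.
landmarks = [(0.52, 0.48), (0.30, 0.20)]  # e.g. wrist, elbow (normalized)
cup_box = (0.50, 0.45, 0.60, 0.55)        # x1, y1, x2, y2
verb = classify_verb(triplet_features(landmarks, cup_box))
print(("human", verb, "cup"))             # -> ('human', 'use', 'cup')
```

Because the features are a handful of floats rather than raw pixels, the downstream classifier can stay small enough for edge devices, which is the trade-off the abstract emphasizes.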
format Article
id doaj-art-ede424517c3243ac8869b5d653634c79
institution OA Journals
issn 1884-9970
language English
publishDate 2024-12-01
publisher Taylor & Francis Group
record_format Article
series SICE Journal of Control, Measurement, and System Integration
spelling doaj-art-ede424517c3243ac8869b5d653634c79 2025-08-20T02:38:49Z eng Taylor & Francis Group SICE Journal of Control, Measurement, and System Integration 1884-9970 2024-12-01 17 1 1 9 10.1080/18824889.2023.2292353 2292353 Effective human–object interaction recognition for edge devices in intelligent space Haruhiro Ozaki (Ritsumeikan University) Dinh Tuan Tran (Ritsumeikan University) Joo-Ho Lee (Ritsumeikan University) http://dx.doi.org/10.1080/18824889.2023.2292353 human–object interaction; computer vision; machine learning; edge computing; system integration; intelligent space
spellingShingle Haruhiro Ozaki
Dinh Tuan Tran
Joo-Ho Lee
Effective human–object interaction recognition for edge devices in intelligent space
SICE Journal of Control, Measurement, and System Integration
human–object interaction
computer vision
machine learning
edge computing
system integration
intelligent space
title Effective human–object interaction recognition for edge devices in intelligent space
title_full Effective human–object interaction recognition for edge devices in intelligent space
title_fullStr Effective human–object interaction recognition for edge devices in intelligent space
title_full_unstemmed Effective human–object interaction recognition for edge devices in intelligent space
title_short Effective human–object interaction recognition for edge devices in intelligent space
title_sort effective human object interaction recognition for edge devices in intelligent space
topic human–object interaction
computer vision
machine learning
edge computing
system integration
intelligent space
url http://dx.doi.org/10.1080/18824889.2023.2292353
work_keys_str_mv AT haruhiroozaki effectivehumanobjectinteractionrecognitionforedgedevicesinintelligentspace
AT dinhtuantran effectivehumanobjectinteractionrecognitionforedgedevicesinintelligentspace
AT jooholee effectivehumanobjectinteractionrecognitionforedgedevicesinintelligentspace