ViT-Based Classification and Self-Supervised 3D Human Mesh Generation from NIR Single-Pixel Imaging

Bibliographic Details
Main Authors: Carlos Osorio Quero, Daniel Durini, Jose Martinez-Carranza
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/11/6138
Collection: DOAJ

Abstract: Accurately estimating 3D human pose and body shape from a single monocular image remains challenging, especially under poor lighting or occlusions. Traditional RGB-based methods struggle in such conditions, whereas single-pixel imaging (SPI) in the Near-Infrared (NIR) spectrum offers a robust alternative. NIR penetrates clothing and adapts to illumination changes, enhancing body shape and pose estimation. This work explores an SPI camera (850–1550 nm) with Time-of-Flight (TOF) technology for human detection in low-light conditions. SPI-derived point clouds are processed using a Vision Transformer (ViT) to align poses with a predefined SMPL-X model. A self-supervised PointNet++ network estimates global rotation, translation, body shape, and pose, enabling precise 3D human mesh reconstruction. Laboratory experiments simulating night-time conditions validate NIR-SPI’s potential for real-world applications, including human detection in rescue missions.

Author affiliations: Carlos Osorio Quero and Jose Martinez-Carranza, INAOE Computer Science, Tonantzintla, Puebla 72840, Mexico; Daniel Durini, INAOE Electronics Department, Tonantzintla, Puebla 72840, Mexico
Volume/issue: 15(11), article 6138
ISSN: 2076-3417
DOI: 10.3390/app15116138
Keywords: single-pixel imaging (SPI); self-supervised; SMPL-X model; depth perception; vision transformers (ViT); 3D human model
Institution: Kabale University
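
The abstract names single-pixel imaging as the sensing principle but does not say which illumination patterns the camera uses. A common textbook scheme projects a complete set of Hadamard patterns onto the scene and records one detector reading per pattern; because a Hadamard matrix H of order N satisfies H^T H = N I, the image is recovered exactly as x = H^T y / N. The sketch below assumes Sylvester-ordered ±1 patterns, an 8×8 scene, and a noise-free detector. None of these choices come from this record, and the function names (`sylvester_hadamard`, `measure`, `reconstruct`) are hypothetical.

```python
import numpy as np


def sylvester_hadamard(n: int) -> np.ndarray:
    """Build an n x n Hadamard matrix by Sylvester's construction (n a power of two).

    Assumption: Sylvester ordering; the camera in the record may use another basis.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    h = np.array([[1]], dtype=np.int64)
    while h.shape[0] < n:
        # Doubling step: H_2k = [[H_k, H_k], [H_k, -H_k]]
        h = np.block([[h, h], [h, -h]])
    return h


def measure(scene: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Simulate a single-pixel detector: one inner product per flattened pattern row."""
    return patterns @ scene.ravel()


def reconstruct(measurements: np.ndarray, patterns: np.ndarray, shape: tuple) -> np.ndarray:
    """Invert a complete Hadamard basis using H^T H = N I, i.e. x = H^T y / N."""
    n = patterns.shape[0]
    return (patterns.T @ measurements / n).reshape(shape)


rng = np.random.default_rng(0)
scene = rng.random((8, 8))          # hypothetical 8x8 NIR reflectance map
H = sylvester_hadamard(scene.size)  # 64 patterns, each a flattened 8x8 mask
y = measure(scene, H)               # 64 scalar single-pixel readings
recon = reconstruct(y, H, scene.shape)
print(np.allclose(recon, scene))    # True: noise-free, fully sampled simulation
```

Physical modulators such as DMDs display binary 0/1 masks, so ±1 patterns are usually obtained by subtracting the readings for a mask and its complement; that differential step is omitted here. With fewer patterns than pixels, the inverse is no longer exact and a compressive-sensing or learned reconstruction would be needed instead.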