Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction

In a multi-view 3D reconstruction problem, the task is to infer the 3D shape of an object from various images taken from different viewpoints. Transformer-based networks have demonstrated their ability to achieve high performance in such problems, but they face challenges in identifying the optimal...

Bibliographic Details
Main Authors: Georgios Kalitsios, Dimitrios Konstantinidis, Petros Daras, Kosmas Dimitropoulos
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
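Subjects: Dynamic grouping; multi-manifold attention; multi-view 3D reconstruction; transformer; voxel representation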
Online Access: https://ieeexplore.ieee.org/document/10721458/
author Georgios Kalitsios
Dimitrios Konstantinidis
Petros Daras
Kosmas Dimitropoulos
collection DOAJ
description In a multi-view 3D reconstruction problem, the task is to infer the 3D shape of an object from various images taken from different viewpoints. Transformer-based networks have demonstrated their ability to achieve high performance in such problems, but they face challenges in identifying the optimal way to merge the different views in order to estimate with great fidelity the 3D shape of the object. This work aims to address this issue by proposing a novel approach to compute information-rich inter-view features by combining image tokens with similar distinctive characteristics among the different views dynamically. This is achieved by leveraging the self-attention mechanism of a Transformer, enhanced with a multi-manifold attention module, to estimate the importance of image tokens on-the-fly and re-arrange them among the different views in a way that improves the viewpoint merging procedure and the 3D reconstruction results. Experiments on ShapeNet and Pix3D validate the ability of the proposed method to achieve state-of-the-art performance in both multi-view and single-view 3D object reconstruction.
format Article
id doaj-art-7eee5ee6b159497b852dff2994fc7d05
institution OA Journals
issn 2169-3536
spelling doaj-art-7eee5ee6b159497b852dff2994fc7d05; 2025-08-20T02:12:49Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 160690; 160699; 10.1109/ACCESS.2024.3483434; 10721458
Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction
Georgios Kalitsios (0), https://orcid.org/0009-0000-4647-6459, Information Technologies Institute, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
Dimitrios Konstantinidis (1), https://orcid.org/0000-0002-7391-6875, Information Technologies Institute, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
Petros Daras (2), https://orcid.org/0000-0003-3814-6710, Information Technologies Institute, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
Kosmas Dimitropoulos (3), https://orcid.org/0000-0003-1584-7047, Information Technologies Institute, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece
title Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction
topic Dynamic grouping
multi-manifold attention
multi-view 3D reconstruction
transformer
voxel representation
url https://ieeexplore.ieee.org/document/10721458/