Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction
| Main Authors: | Georgios Kalitsios, Dimitrios Konstantinidis, Petros Daras, Kosmas Dimitropoulos |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10721458/ |
| _version_ | 1850198662131482624 |
|---|---|
| author | Georgios Kalitsios, Dimitrios Konstantinidis, Petros Daras, Kosmas Dimitropoulos |
| author_sort | Georgios Kalitsios |
| collection | DOAJ |
| description | In a multi-view 3D reconstruction problem, the task is to infer the 3D shape of an object from multiple images taken from different viewpoints. Transformer-based networks have achieved high performance on such problems, but identifying the optimal way to merge the different views so as to estimate the object's 3D shape with high fidelity remains challenging. This work addresses this issue by proposing a novel approach that computes information-rich inter-view features by dynamically combining image tokens that share similar distinctive characteristics across the different views. This is achieved by leveraging the self-attention mechanism of a Transformer, enhanced with a multi-manifold attention module, to estimate the importance of image tokens on the fly and re-arrange them among the views in a way that improves the viewpoint merging procedure and the 3D reconstruction results. Experiments on ShapeNet and Pix3D show that the proposed method achieves state-of-the-art performance in both multi-view and single-view 3D object reconstruction. |
| format | Article |
| id | doaj-art-7eee5ee6b159497b852dff2994fc7d05 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-7eee5ee6b159497b852dff2994fc7d05; 2025-08-20T02:12:49Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2024-01-01; vol. 12; pp. 160690-160699; DOI 10.1109/ACCESS.2024.3483434; IEEE Xplore document 10721458; Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction; Georgios Kalitsios (https://orcid.org/0009-0000-4647-6459), Dimitrios Konstantinidis (https://orcid.org/0000-0002-7391-6875), Petros Daras (https://orcid.org/0000-0003-3814-6710), Kosmas Dimitropoulos (https://orcid.org/0000-0003-1584-7047); all authors: Information Technologies Institute, Centre for Research and Technology Hellas (CERTH), Thessaloniki, Greece; https://ieeexplore.ieee.org/document/10721458/; keywords: Dynamic grouping, multi-manifold attention, multi-view 3D reconstruction, transformer, voxel representation |
| title | Dynamic Grouping With Multi-Manifold Attention for Multi-View 3D Object Reconstruction |
| topic | Dynamic grouping, multi-manifold attention, multi-view 3D reconstruction, transformer, voxel representation |
| url | https://ieeexplore.ieee.org/document/10721458/ |
| work_keys_str_mv | AT georgioskalitsios dynamicgroupingwithmultimanifoldattentionformultiview3dobjectreconstruction AT dimitrioskonstantinidis dynamicgroupingwithmultimanifoldattentionformultiview3dobjectreconstruction AT petrosdaras dynamicgroupingwithmultimanifoldattentionformultiview3dobjectreconstruction AT kosmasdimitropoulos dynamicgroupingwithmultimanifoldattentionformultiview3dobjectreconstruction |
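
The abstract describes the core mechanism only at a high level: self-attention estimates the importance of image tokens on the fly, and tokens are then re-arranged among the views so that tokens with similar characteristics are merged together. The Python sketch below illustrates one simple reading of that idea; it is not the authors' implementation, and every name in it (`softmax`, `token_importance`, `dynamic_grouping`) is hypothetical. Assumptions: importance is the mean attention a token receives in a single-head scaled dot-product self-attention with no learned projections, computed jointly over the tokens of all views; regrouping splits each view's importance ranking into V equal bands and gathers band g of every view into group g; the number of tokens per view is divisible by the number of views. The multi-manifold attention module is not modeled.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over one row of attention logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_importance(tokens: List[Vector]) -> List[float]:
    """Hypothetical helper: score each token by the mean attention it receives.

    Assumption: single-head scaled dot-product self-attention with identity
    query/key projections (no learned weights), unlike a trained Transformer.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    n = len(tokens)
    received = [0.0] * n
    for q in tokens:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        for j, w in enumerate(softmax(logits)):
            received[j] += w / n  # column mean of the row-stochastic attention matrix
    return received


def dynamic_grouping(views: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical helper: group g collects the g-th importance band of every view.

    views: V lists of N tokens each (assumption: N divisible by V).
    Scores are computed over all V*N tokens so they are comparable across views.
    """
    num_views = len(views)
    n = len(views[0])
    if n % num_views != 0:
        raise ValueError("tokens per view must be divisible by the number of views")
    flat = [tok for view in views for tok in view]
    scores = token_importance(flat)
    band = n // num_views
    groups: List[List[Vector]] = [[] for _ in range(num_views)]
    for v in range(num_views):
        # Rank this view's tokens from most to least important.
        ranked = sorted(range(v * n, (v + 1) * n), key=lambda i: scores[i], reverse=True)
        for g in range(num_views):
            groups[g].extend(flat[i] for i in ranked[g * band:(g + 1) * band])
    return groups


if __name__ == "__main__":
    views = [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
        [[0.9, 0.1], [0.2, 0.8], [1.0, 0.9], [0.0, 0.0]],
    ]
    groups = dynamic_grouping(views)
    print(len(groups), [len(g) for g in groups])  # 2 [4, 4]
```

With two views of four tokens each, each resulting group mixes two tokens from each view, and the first group holds the higher-scoring half of every view. Computing the scores jointly over all views, rather than per view, is the design choice that lets tokens from different viewpoints be compared before they are regrouped.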