MAPE-ViT: multimodal scene understanding with novel wavelet-augmented Vision Transformer

This article introduces Multimodal Adaptive Patch Embedding with Vision Transformer (MAPE-ViT), a novel approach for RGB-D scene classification that effectively addresses fundamental challenges of sensor misalignment, depth noise, and object boundary preservation. Our framework integrates maximally...

Bibliographic Details
Main Authors:	Muhammad Waqas Ahmed, Touseef Sadiq, Hameedur Rahman, Sulaiman Abdullah Alateyah, Mohammed Alnusayri, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi
Format:	Article
Language:	English
Published:	PeerJ Inc. 2025-05-01
Series:	PeerJ Computer Science
Subjects:	Scene classification; Patterns recognition; Multimodal; Vision Transformer; Deep learning
Online Access:	https://peerj.com/articles/cs-2796.pdf

