MAPE-ViT: multimodal scene understanding with novel wavelet-augmented Vision Transformer

This article introduces Multimodal Adaptive Patch Embedding with Vision Transformer (MAPE-ViT), a novel approach for RGB-D scene classification that effectively addresses fundamental challenges of sensor misalignment, depth noise, and object boundary preservation. Our framework integrates maximally...


Bibliographic Details
Main Authors: Muhammad Waqas Ahmed, Touseef Sadiq, Hameedur Rahman, Sulaiman Abdullah Alateyah, Mohammed Alnusayri, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi
Format: Article
Language: English
Published: PeerJ Inc. 2025-05-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-2796.pdf