VLA-Grasp: a vision-language-action modeling with cross-modality fusion for task-oriented grasping

Abstract: Task-oriented grasping (TOG) aims to predict the appropriate pose for grasping based on a specific task. While recent approaches have incorporated semantic knowledge into TOG models to enable robots to understand linguistic commands, they lack the ability to leverage relevant information fr...

Bibliographic Details
Main Authors:	Jianwei Zhu, Xueying Sun, Qiang Zhang, Mingmin Liu
Format:	Article
Language:	English
Published:	Springer 2025-05-01
Series:	Complex & Intelligent Systems
Subjects:	Task-oriented grasping; Multimodal fusion; Vision-language-action; Cross-attention
Online Access:	https://doi.org/10.1007/s40747-025-01893-x
