ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation

19 June 2025
Liang Heng
Haoran Geng
Kaifeng Zhang
Pieter Abbeel
Jitendra Malik
arXiv (abs) · PDF · HTML
Main: 9 pages · Appendix: 8 pages · Bibliography: 4 pages · 19 figures · 15 tables
Abstract

Dexterous manipulation is a cornerstone capability for robotic systems aiming to interact with the physical world in a human-like manner. Although vision-based methods have advanced rapidly, tactile sensing remains crucial for fine-grained control, particularly in unstructured or visually occluded settings. We present ViTacFormer, a representation-learning approach that couples a cross-attention encoder to fuse high-resolution vision and touch with an autoregressive tactile prediction head that anticipates future contact signals. Building on this architecture, we devise an easy-to-challenging curriculum that steadily refines the visual-tactile latent space, boosting both accuracy and robustness. The learned cross-modal representation drives imitation learning for multi-fingered hands, enabling precise and adaptive manipulation. Across a suite of challenging real-world benchmarks, our method achieves approximately 50% higher success rates than prior state-of-the-art systems. To our knowledge, it is also the first to autonomously complete long-horizon dexterous manipulation tasks that demand highly precise control with an anthropomorphic hand, successfully executing up to 11 sequential stages and sustaining continuous operation for 2.5 minutes.
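
The abstract outlines two architectural ingredients: a cross-attention encoder that fuses visual and tactile tokens, and an autoregressive head that predicts future tactile signals from the fused latent. Below is a minimal sketch of that idea in PyTorch; the module names, dimensions, backbone choices, and the GRU-based decoder are illustrative assumptions, not the authors' implementation.

# Sketch only: fuses visual and tactile tokens via cross-attention, then
# autoregressively predicts future tactile frames. All shapes/names are assumed.
import torch
import torch.nn as nn

class CrossModalEncoder(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Visual tokens attend to tactile tokens; result is added residually.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, tac_tokens):
        # vis_tokens: (B, Nv, dim), tac_tokens: (B, Nt, dim)
        fused, _ = self.cross_attn(query=vis_tokens, key=tac_tokens, value=tac_tokens)
        return self.norm(vis_tokens + fused)

class TactilePredictionHead(nn.Module):
    """Predicts next-step tactile signals, conditioned on the fused latent."""
    def __init__(self, dim=256, tactile_dim=64):
        super().__init__()
        self.decoder = nn.GRU(tactile_dim, dim, batch_first=True)
        self.proj = nn.Linear(dim, tactile_dim)

    def forward(self, fused_latent, tactile_history):
        # fused_latent: (B, dim) initialises the decoder state.
        h0 = fused_latent.unsqueeze(0)              # (1, B, dim)
        out, _ = self.decoder(tactile_history, h0)  # (B, T, dim)
        return self.proj(out)                       # (B, T, tactile_dim)

# Toy usage with random tensors.
B, Nv, Nt, T, dim, tac_dim = 2, 49, 16, 5, 256, 64
encoder = CrossModalEncoder(dim)
head = TactilePredictionHead(dim, tac_dim)
fused = encoder(torch.randn(B, Nv, dim), torch.randn(B, Nt, dim))
pred = head(fused.mean(dim=1), torch.randn(B, T, tac_dim))
print(pred.shape)  # torch.Size([2, 5, 64])

In the paper's framing, such predicted future contact signals would serve as an auxiliary target that shapes the visual-tactile latent space before it drives imitation learning; the curriculum and policy head are omitted here.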

@article{heng2025_2506.15953,
  title={ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation},
  author={Liang Heng and Haoran Geng and Kaifeng Zhang and Pieter Abbeel and Jitendra Malik},
  journal={arXiv preprint arXiv:2506.15953},
  year={2025}
}