
HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction

Main: 5 pages
5 figures
Bibliography: 1 page
1 table
Abstract

This paper introduces HapticVLM, a novel multimodal system that integrates vision-language reasoning with deep convolutional networks to enable real-time haptic feedback. HapticVLM leverages a ConvNeXt-based material recognition module to generate robust visual embeddings for accurate identification of object materials, while a state-of-the-art Vision-Language Model (Qwen2-VL-2B-Instruct) infers ambient temperature from environmental cues. The system synthesizes tactile sensations by delivering vibrotactile feedback through speakers and thermal cues via a Peltier module, thereby bridging the gap between visual perception and tactile experience. Experimental evaluations demonstrate an average recognition accuracy of 84.67% across five distinct auditory-tactile patterns and a temperature estimation accuracy of 86.7% across 15 scenarios, where an estimate counts as correct if it falls within an 8°C margin of error. Although promising, the current study is limited by its small set of prominent patterns and its modest participant pool. Future work will focus on expanding the range of tactile patterns and conducting larger user studies to further refine and validate the system's performance. Overall, HapticVLM represents a significant step toward context-aware, multimodal haptic interaction with potential applications in virtual reality and assistive technologies.
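The tolerance-based temperature evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `tolerance_accuracy`, the treatment of the 8°C margin as inclusive, and the sample temperatures are all assumptions made for illustration. If each scenario is weighted equally, the reported 86.7% is consistent with 13 of 15 estimates falling within the margin.

```python
def tolerance_accuracy(predicted, actual, tolerance=8.0):
    """Fraction of estimates within `tolerance` degrees Celsius of the reference.

    Assumption: the margin is inclusive (|predicted - actual| <= tolerance).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one scenario is required")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Hypothetical data for 15 scenarios (not taken from the paper):
# reference temperatures and model estimates, in degrees Celsius.
REFERENCE = [22, 5, 30, -3, 18, 25, 10, 35, 0, 15, 28, 8, 20, 12, 40]
ESTIMATE = [20, 9, 27, 2, 18, 30, 4, 33, 12, 15, 25, 8, 23, 10, 28]

if __name__ == "__main__":
    acc = tolerance_accuracy(ESTIMATE, REFERENCE)
    print(f"Accuracy within 8 °C: {acc:.1%}")
```

In this made-up data set, two estimates miss the reference by 12°C, so 13 of 15 scenarios count as correct. A tolerance-based score of this kind rewards estimates that are close enough for a thermal display and does not measure the size of the remaining error.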

@article{khan2025_2505.02569,
  title={HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction},
  author={Muhammad Haris Khan and Miguel Altamirano Cabrera and Dmitrii Iarchuk and Yara Mahmoud and Daria Trinitatova and Issatay Tokmurziyev and Dzmitry Tsetserukou},
  journal={arXiv preprint arXiv:2505.02569},
  year={2025}
}