CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

16 June 2025
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Yuxin Huang
Han Zhao
Donglin Wang
Haoang Li
ArXiv (abs) · PDF · HTML
Main: 10 pages · 8 figures · Bibliography: 3 pages · 9 tables · Appendix: 3 pages
Abstract

In recent years, Vision-Language-Action (VLA) models have become a vital research direction in robotics due to their impressive multimodal understanding and generalization capabilities. Despite this progress, their practical deployment is severely constrained by inference-speed bottlenecks, particularly in high-frequency and dexterous manipulation tasks. While recent studies have explored Jacobi decoding as a more efficient alternative to traditional autoregressive decoding, its practical benefits are marginal due to the lengthy iterations. To address this, we introduce consistency distillation training to predict multiple correct action tokens in each iteration, thereby achieving acceleration. In addition, we design mixed-label supervision to mitigate error accumulation during distillation. Although distillation brings an acceptable speedup, we identify that certain inefficient iterations remain a critical bottleneck. To tackle this, we propose an early-exit decoding strategy that moderately relaxes convergence conditions, further improving average inference efficiency. Experimental results show that the proposed method achieves more than 4× inference acceleration across different baselines while maintaining high task success rates in both simulated and real-world robot tasks. These experiments validate that our approach provides an efficient and general paradigm for accelerating multimodal decision-making in robotics. Our project page is available at this https URL.

@article{song2025_2506.13725,
  title={CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding},
  author={Wenxuan Song and Jiayi Chen and Pengxiang Ding and Yuxin Huang and Han Zhao and Donglin Wang and Haoang Li},
  journal={arXiv preprint arXiv:2506.13725},
  year={2025}
}