
Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control

Abstract

Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings such as robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation); it consistently recovers full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underscore SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
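The core mechanism described above is a saliency-weighted imitation loss applied during quantization-aware training. The PyTorch sketch below shows one plausible instantiation under stated assumptions: the gradient-norm saliency proxy, the MSE behavior-cloning loss, and the top_frac/boost parameters are illustrative choices, not the paper's exact formulation, and quant_policy is assumed to be a fake-quantized (QAT) policy network.

import torch
import torch.nn.functional as F

def saliency_scores(policy, states, actions):
    # Per-sample saliency: norm of the imitation-loss gradient w.r.t. the
    # input state. (A common proxy; the paper's exact definition may differ.)
    states = states.detach().clone().requires_grad_(True)
    loss = F.mse_loss(policy(states), actions, reduction="sum")
    (grad,) = torch.autograd.grad(loss, states)
    return grad.flatten(1).norm(dim=1)

def sqil_loss(quant_policy, states, actions, top_frac=0.25, boost=2.0):
    # Behavior-cloning loss that up-weights the most salient
    # ("mission-critical") states in the batch. Gradients flow through the
    # fake-quantized policy via the straight-through estimator, as in
    # standard quantization-aware training.
    scores = saliency_scores(quant_policy, states, actions)  # no grad attached
    k = max(1, int(top_frac * scores.numel()))
    threshold = scores.topk(k).values.min()
    weights = torch.where(scores >= threshold,
                          torch.full_like(scores, boost),
                          torch.ones_like(scores))
    per_sample = F.mse_loss(quant_policy(states), actions,
                            reduction="none").mean(dim=1)
    return (weights * per_sample).mean()

Weighting only the top fraction of states concentrates the limited capacity of the low-bit model on the decisions where quantization error is most damaging, while the remaining states still receive the standard imitation loss.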

@article{park2025_2505.15304,
  title={Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control},
  author={Seongmin Park and Hyungmin Kim and Sangwoo Kim and Wonseok Jeon and Juyoung Yang and Byeongwook Jeon and Yoonseon Oh and Jungwook Choi},
  journal={arXiv preprint arXiv:2505.15304},
  year={2025}
}