Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation
- OffRL

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, their applications to decision-making often struggle with suboptimal demonstrations or rely on the complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while achieving higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long-horizon planning tasks demonstrate that our approach achieves an 8.7% improvement over the previous state of the art while offering up to a 142x inference-time speedup over diffusion-based counterparts.
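To make the idea of reward-aware distillation concrete, the following is a minimal sketch of what such a training objective could look like. It assumes a frozen diffusion `teacher` that solves the denoising ODE between two noise levels, a `student` consistency model that maps a noisy trajectory directly to a clean one in a single step, and a learned `critic` that scores denoised trajectories by estimated return; all three names and the exact loss form are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def reward_aware_distillation_loss(student, teacher, critic, x_t, t, s, lam=1.0):
    """Hypothetical reward-aware consistency distillation loss.

    Assumed interfaces (not taken from the paper):
      teacher(x_t, t, s) -> denoises x_t from noise level t toward level s
      student(x, tau)    -> single-step consistency prediction of the clean sample
      critic(x0)         -> scalar return/Q estimate for a denoised trajectory
    """
    with torch.no_grad():
        x_s = teacher(x_t, t, s)             # teacher traverses part of the ODE
        target = student(x_s, s)             # consistency target (in practice an
                                             # EMA copy of the student is common)

    pred = student(x_t, t)                   # student's one-step prediction

    consistency = F.mse_loss(pred, target)   # standard consistency-matching term
    reward_term = -critic(pred).mean()       # steer predictions toward high reward

    return consistency + lam * reward_term
```

Here `lam` trades off faithfulness to the teacher against reward maximization; folding the reward term into distillation is what removes the need for separately trained guidance networks at inference time.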
@article{duan2025_2506.07822,
  title={Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation},
  author={Xintong Duan and Yutong He and Fahim Tajwar and Ruslan Salakhutdinov and J. Zico Kolter and Jeff Schneider},
  journal={arXiv preprint arXiv:2506.07822},
  year={2025}
}