VA-learning as a more efficient alternative to Q-learning

29 May 2023

Papers citing "VA-learning as a more efficient alternative to Q-learning"


Soft Policy Optimization: Online Off-Policy RL for Sequence Models. Taco Cohen, David W. Zhang, Kunhao Zheng, Yunhao Tang, Rémi Munos, Gabriel Synnaeve. 07 Mar 2025. [OffRL]
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning. Harley Wiltzer, Marc G. Bellemare, D. Meger, Patrick Shafto, Yash Jhaveri. 14 Oct 2024.
Value Augmented Sampling for Language Model Alignment and Personalization. Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal. 10 May 2024. [OffRL]
Generative Flow Networks as Entropy-Regularized RL. D. Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov. 19 Oct 2023.