arXiv:2305.18161
VA-learning as a more efficient alternative to Q-learning
29 May 2023
Yunhao Tang
Rémi Munos
Mark Rowland
Michal Valko
OffRL
Papers citing "VA-learning as a more efficient alternative to Q-learning" (4 papers)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
Taco Cohen
David W. Zhang
Kunhao Zheng
Yunhao Tang
Rémi Munos
Gabriel Synnaeve
OffRL
07 Mar 2025
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
Harley Wiltzer
Marc G. Bellemare
D. Meger
Patrick Shafto
Yash Jhaveri
14 Oct 2024
Value Augmented Sampling for Language Model Alignment and Personalization
Seungwook Han
Idan Shenfeld
Akash Srivastava
Yoon Kim
Pulkit Agrawal
OffRL
10 May 2024
Generative Flow Networks as Entropy-Regularized RL
D. Tiapkin
Nikita Morozov
Alexey Naumov
Dmitry Vetrov
19 Oct 2023