Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.02865
Cited By
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
5 June 2023
Tianying Ji
Yuping Luo
Gang Hua
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
OffRL
OnRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic"
10 / 10 papers shown
Title
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Michal Nauman
M. Ostaszewski
Krzysztof Jankowski
Piotr Milo's
Marek Cygan
OffRL
47
16
0
25 May 2024
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
Michal Nauman
Michal Bortkiewicz
Piotr Milo's
Tomasz Trzciñski
M. Ostaszewski
Marek Cygan
OffRL
35
17
0
01 Mar 2024
ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
Tianying Ji
Yongyuan Liang
Yan Zeng
Yu-Juan Luo
Guowei Xu
Jiawei Guo
Ruijie Zheng
Furong Huang
Gang Hua
Huazhe Xu
CML
50
11
0
22 Feb 2024
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Haoyi Niu
Tianying Ji
Bingqi Liu
Haocheng Zhao
Xiangyu Zhu
Jianying Zheng
Pengfei Huang
Guyue Zhou
Jianming Hu
Xianyuan Zhan
OffRL
OnRL
AI4CE
29
7
0
22 Sep 2023
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
Raj Ghugare
Homanga Bharadhwaj
Benjamin Eysenbach
Sergey Levine
Ruslan Salakhutdinov
OffRL
48
25
0
18 Sep 2022
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
Hao Sun
Lei Han
Rui Yang
Xiaoteng Ma
Jian Guo
Bolei Zhou
OffRL
OnRL
38
10
0
15 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
369
12,081
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
848
0
12 Oct 2021
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Edoardo Cetin
Oya Celiktutan
OffRL
42
17
0
07 Oct 2021
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
Arsenii Kuznetsov
Pavel Shvechikov
Alexander Grishin
Dmitry Vetrov
136
187
0
08 May 2020
1