Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.02423
Cited By
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
4 February 2024
Yifu Yuan
Jianye Hao
Yi-An Ma
Zibin Dong
Hebin Liang
Jinyi Liu
Zhixin Feng
Kai-Wen Zhao
Yan Zheng
OffRL
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback"
17 / 17 papers shown
Title
A Survey on Large Language Model based Human-Agent Systems
Henry Peng Zou
Wei-Chieh Huang
Yaozu Wu
Yankai Chen
Chunyu Miao
...
Y. Li
Yuwei Cao
Dongyuan Li
Renhe Jiang
Philip S. Yu
LLMAG
LM&Ro
LM&MA
79
0
0
01 May 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
43
2
0
12 Apr 2025
Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
Younghwan Lee
T. Luu
Donghoon Lee
Chang-Dong Yoo
3DV
VLM
OffRL
41
0
0
03 Apr 2025
From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models
Jinyi Liu
Yan Zheng
Rong Cheng
Qiyu Wu
Wei Guo
...
Hebin Liang
Yifu Yuan
Hangyu Mao
Fuzheng Zhang
Jianye Hao
LRM
AI4CE
59
1
0
20 Mar 2025
DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning
Teng Pang
Bingzheng Wang
Guoqiang Wu
Yilong Yin
OffRL
68
0
0
03 Mar 2025
CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang
Zhengran Ji
Boyuan Chen
42
3
0
03 Jan 2025
LEASE: Offline Preference-based Reinforcement Learning with High Sample Efficiency
Xiao-Yin Liu
Guotao Li
Xiao-Hu Zhou
Z. Hou
OffRL
39
0
0
31 Dec 2024
Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards
Zhaohui Jiang
Xuening Feng
Paul Weng
Yifei Zhu
Yan Song
Tianze Zhou
Yujing Hu
Tangjie Lv
Changjie Fan
41
0
0
08 Oct 2024
On the Effect of Robot Errors on Human Teaching Dynamics
Jindan Huang
Isaac S. Sheidlower
Reuben M. Aronson
E. Short
28
0
0
15 Sep 2024
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Zhao Shan
Chenyou Fan
Shuang Qiu
Jiyuan Shi
Chenjia Bai
33
4
0
09 Sep 2024
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey
Ruiqi Zhang
Jing Hou
Florian Walter
Shangding Gu
Jiayi Guan
Florian Röhrbein
Yali Du
Panpan Cai
G. Chen
Alois Knoll
44
12
0
19 Aug 2024
Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL
Baiting Zhu
Meihua Dang
Aditya Grover
OffRL
66
23
0
30 Apr 2023
Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
L. Guan
Karthik Valmeekam
Subbarao Kambhampati
49
8
0
28 Oct 2022
CORL: Research-oriented Deep Offline Reinforcement Learning Library
Denis Tarasov
Alexander Nikulin
Dmitry Akimov
Vladislav Kurenkov
Sergey Kolesnikov
OffRL
46
78
0
13 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
311
11,915
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
838
0
12 Oct 2021
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
Ming Zhou
Jun-Jie Luo
Julian Villela
Yaodong Yang
David Rusu
...
H. Ammar
Hongbo Zhang
Wulong Liu
Jianye Hao
Jun Wang
134
193
0
19 Oct 2020
1