Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.14111
Cited By
v1
v2 (latest)
Is RLHF More Difficult than Standard RL?
25 June 2023
Yuanhao Wang
Qinghua Liu
Chi Jin
OffRL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Is RLHF More Difficult than Standard RL?"
26 / 26 papers shown
Title
Online Iterative Self-Alignment for Radiology Report Generation
Ting Xiao
Lei Shi
Yang Zhang
HaoFeng Yang
Zhe Wang
Chenjia Bai
82
0
0
17 May 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
260
3
0
26 Feb 2025
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu
Luca Viano
Yihang Chen
Zhenyu Zhu
Kimon Antonakopoulos
Quanquan Gu
Volkan Cevher
160
1
0
18 Feb 2025
Design Considerations in Offline Preference-based RL
Alekh Agarwal
Christoph Dann
T. V. Marinov
OffRL
102
1
0
08 Feb 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
225
6
0
07 Nov 2024
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur
Amrit Singh Bedi
Raghu Pasupathy
Vaneet Aggarwal
68
1
0
21 Oct 2024
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
H. Fernando
Han Shen
Parikshit Ram
Yi Zhou
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
CLL
149
4
0
20 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
134
7
0
06 Oct 2024
Beyond Numeric Rewards: In-Context Dueling Bandits with LLM Agents
Fanzeng Xia
Hao Liu
Yisong Yue
Tongxin Li
134
1
0
02 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
129
2
0
11 Jun 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
125
72
0
29 Apr 2024
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
112
285
0
23 Feb 2023
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
Josh Abramson
Arun Ahuja
Federico Carnevale
Petko Georgiev
Alex Goldin
...
Tamara von Glehn
Greg Wayne
Nathaniel Wong
Chen Yan
Rui Zhu
74
28
0
21 Nov 2022
Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
Qinghua Liu
Csaba Szepesvári
Chi Jin
100
21
0
02 Jun 2022
The Statistical Complexity of Interactive Decision Making
Dylan J. Foster
Sham Kakade
Jian Qian
Alexander Rakhlin
374
183
0
27 Dec 2021
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
88
90
0
08 Nov 2021
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI OpenAI
:
Christopher Berner
Greg Brockman
Brooke Chan
...
Szymon Sidor
Ilya Sutskever
Jie Tang
Filip Wolski
Susan Zhang
GNN
VLM
CLL
AI4CE
LRM
169
1,838
0
13 Dec 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
488
1,768
0
18 Sep 2019
Dueling Posterior Sampling for Preference-Based Reinforcement Learning
Ellen R. Novoseller
Yibing Wei
Yanan Sui
Yisong Yue
J. W. Burdick
84
64
0
04 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
109
560
0
11 Jul 2019
Preference-based Online Learning with Dueling Bandits: A Survey
Viktor Bengs
R. Busa-Fekete
Adil El Mesaoudi-Paul
Eyke Hüllermeier
106
114
0
30 Jul 2018
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
Garrett A. Warnell
Nicholas R. Waytowich
Vernon J. Lawhern
Peter Stone
72
272
0
28 Sep 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
218
3,377
0
12 Jun 2017
Consistent Probabilistic Social Choice
F. Brandl
F. Brandt
Hans Georg Seedig
59
95
0
21 Feb 2015
Reducing Dueling Bandits to Cardinal Bandits
Nir Ailon
Thorsten Joachims
Zohar Karnin
171
140
0
14 May 2014
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
132
12,272
0
19 Dec 2013
1