Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.00539
Cited By
Distributionally Robust Reinforcement Learning with Human Feedback
1 March 2025
Debmalya Mandal
Paulius Sasnauskas
Goran Radanović
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Distributionally Robust Reinforcement Learning with Human Feedback"
15 / 15 papers shown
Title
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Chenyu Wang
Masatoshi Uehara
Yichun He
Amy Wang
Tommaso Biancalani
Avantika Lal
Tommi Jaakkola
Sergey Levine
Hanchen Wang
Aviv Regev
84
15
0
17 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
174
27
0
01 Oct 2024
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son
William Bankes
Sayak Ray Chowdhury
Brooks Paige
Ilija Bogunovic
78
4
0
26 Jul 2024
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
124
32
0
29 May 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
140
252
0
20 Mar 2024
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika
Debmalya Mandal
Parameswaran Kamalaruban
Georgios Tzannetos
Goran Radanović
Adish Singla
44
14
0
04 Mar 2024
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury
Anush Kini
Nagarajan Natarajan
69
68
0
01 Mar 2024
Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal
Andi Nika
Parameswaran Kamalaruban
Adish Singla
Goran Radanović
OffRL
72
11
0
09 Feb 2024
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste
Usman Anwar
Robert Kirk
David M. Krueger
NoLa
ALM
65
133
0
04 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
335
4,298
0
09 Jun 2023
Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Daniel Shin
Anca Dragan
Daniel S. Brown
OffRL
68
56
0
03 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
871
12,916
0
04 Mar 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
118
779
0
01 Dec 2021
Large-Scale Methods for Distributionally Robust Optimization
Daniel Levy
Y. Carmon
John C. Duchi
Aaron Sidford
73
217
0
12 Oct 2020
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
487
19,019
0
20 Jul 2017
1