ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.15568
  4. Cited By
Robust Reinforcement Learning from Corrupted Human Feedback

Robust Reinforcement Learning from Corrupted Human Feedback

21 June 2024
Alexander Bukharin
Ilgee Hong
Haoming Jiang
Zichong Li
Qingru Zhang
Zixuan Zhang
Tuo Zhao
ArXivPDFHTML

Papers citing "Robust Reinforcement Learning from Corrupted Human Feedback"

12 / 12 papers shown
Title
RIME: Robust Preference-based Reinforcement Learning with Noisy
  Preferences
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Jie Cheng
Gang Xiong
Xingyuan Dai
Qinghai Miao
Yisheng Lv
Fei-Yue Wang
49
17
0
27 Feb 2024
Corruption Robust Offline Reinforcement Learning with Human Feedback
Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal
Andi Nika
Parameswaran Kamalaruban
Adish Singla
Goran Radanović
OffRL
57
11
0
09 Feb 2024
A General Theoretical Paradigm to Understand Learning from Human
  Preferences
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
83
580
0
18 Oct 2023
Robust Multi-Agent Reinforcement Learning via Adversarial
  Regularization: Theoretical Foundation and Stable Algorithms
Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
Alexander Bukharin
Yan Li
Yue Yu
Qingru Zhang
Zhehui Chen
Simiao Zuo
Chao Zhang
Songan Zhang
Tuo Zhao
OOD
AAML
37
18
0
16 Oct 2023
Reward Model Ensembles Help Mitigate Overoptimization
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste
Usman Anwar
Robert Kirk
David M. Krueger
NoLa
ALM
37
126
0
04 Oct 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human
  Feedback
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
69
569
0
22 May 2023
B-Pref: Benchmarking Preference-Based Reinforcement Learning
B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee
Laura M. Smith
Anca Dragan
Pieter Abbeel
OffRL
60
95
0
04 Nov 2021
A Universal Law of Robustness via Isoperimetry
A Universal Law of Robustness via Isoperimetry
Sébastien Bubeck
Mark Sellke
25
215
0
26 May 2021
Smooth Exploration for Robotic Reinforcement Learning
Smooth Exploration for Robotic Reinforcement Learning
Antonin Raffin
Jens Kober
F. Stulp
50
57
0
12 May 2020
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
183
18,685
0
20 Jul 2017
A General and Adaptive Robust Loss Function
A General and Adaptive Robust Loss Function
Jonathan T. Barron
OOD
DRL
64
536
0
11 Jan 2017
Adversarial Machine Learning at Scale
Adversarial Machine Learning at Scale
Alexey Kurakin
Ian Goodfellow
Samy Bengio
AAML
425
3,124
0
04 Nov 2016
1