Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu, Jiantao Jiao, Michael I. Jordan
OffRL
26 January 2023

Papers citing "Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons"

50 of 147 citing papers shown.

Off-Policy Evaluation from Logged Human Feedback
Aniruddha Bhargava, Lalit P. Jain, B. Kveton, Ge Liu, Subhojyoti Mukherjee
OffRL
14 Jun 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
12 Jun 2024

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
OffRL
11 Jun 2024

Aligning Large Language Models with Representation Editing: A Control Perspective
Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang
10 Jun 2024

Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz
06 Jun 2024

Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie
OffRL
06 Jun 2024

Scalable Ensembling For Mitigating Reward Overoptimisation
Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Oluwasanmi Koyejo
03 Jun 2024

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy, Corby Rosset, Ahmed Hassan Awadallah, Alexander Rakhlin
31 May 2024

Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant
29 May 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024

Axioms for AI Alignment from Human Feedback
Luise Ge, Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira, Yevgeniy Vorobeychik, Junlin Wu
23 May 2024

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo
OffRL
20 May 2024

Comparisons Are All You Need for Optimizing Smooth Functions
Chenyi Zhang, Tongyang Li
AAML
19 May 2024

The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen, Liwei Wang
18 May 2024

Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström, Emil Carlsson, Devdatt Dubhashi, Fredrik D. Johansson
05 May 2024

Learning Linear Utility Functions From Pairwise Comparison Queries
Luise Ge, Brendan Juba, Yevgeniy Vorobeychik
04 May 2024

Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu
01 May 2024

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar
30 Apr 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Guhao Feng, Li Zhao, Di He, Jiang Bian, Liwei Wang
29 Apr 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
OffRL
25 Apr 2024

Optimal Design for Human Feedback
Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, B. Kveton
22 Apr 2024

Mapping Social Choice Theory to RLHF
Jessica Dai, Eve Fleisig
19 Apr 2024

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di, Jiafan He, Quanquan Gu
16 Apr 2024

Hindsight PRIORs for Reward Learning from Human Preferences
Mudit Verma, Katherine Metcalf
12 Apr 2024

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva
12 Apr 2024

Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Kumar Misra, Jason D. Lee, Wen Sun
OffRL
12 Apr 2024

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang
VLM, MedIm, DiffM
11 Apr 2024

Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
09 Apr 2024

Prior Constraints-based Reward Model Training for Aligning Large Language Models
Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu
ALM
01 Apr 2024

Diffusion Model for Data-Driven Black-Box Optimization
Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang
DiffM
20 Mar 2024

Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Feifan Song, Bowen Yu, Hao Lang, Haiyang Yu, Fei Huang, Houfeng Wang, Yongbin Li
ALM
17 Mar 2024

Deep Submodular Peripteral Networks
Gantavya Bhatt, Arnav M. Das, Jeff Bilmes
13 Mar 2024

FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers' Preference Elicitation
Hanfang Lyu, Yuanchen Bai, Xin Liang, Ujaan Das, Chuhan Shi, Leiliang Gong, Yingchi Li, Mingfei Sun, Ming Ge, Xiaojuan Ma
10 Mar 2024

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
Xiaoying Zhang, Jean-François Ton, Wei Shen, Hongning Wang, Yang Liu
08 Mar 2024

Provable Multi-Party Reinforcement Learning with Diverse Human Feedback
Huiying Zhong, Zhun Deng, Weijie J. Su, Zhiwei Steven Wu, Linjun Zhang
08 Mar 2024

Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
01 Mar 2024

Mode Estimation with Partial Feedback
Charles Arnal, Vivien A. Cabannes, Vianney Perchet
20 Feb 2024

Generative AI Security: Challenges and Countermeasures
Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner
AAML, SILM
20 Feb 2024

Active Preference Optimization for Sample Efficient RLHF
Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
16 Feb 2024

ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang
14 Feb 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang
ALM
14 Feb 2024

Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji, Jiafan He, Quanquan Gu
14 Feb 2024

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
OffRL
11 Feb 2024

Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović
OffRL
09 Feb 2024

Principled Preferential Bayesian Optimization
Wenjie Xu, Wenbin Wang, Yuning Jiang, B. Svetozarevic, Colin N. Jones
08 Feb 2024

Personalized Language Modeling from Personalized Human Feedback
Xinyu Li, Zachary C. Lipton, Liu Leqi
ALM
06 Feb 2024

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
29 Jan 2024

Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
26 Dec 2023

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong, Hanze Dong, Chen Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
OffRL
18 Dec 2023

Let AI Entertain You: Increasing User Engagement with Generative AI and Rejection Sampling
Jingying Zeng, Jaewon Yang, Waleed Malik, Xiao Yan, Richard Huang, Qi He
16 Dec 2023