Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

26 January 2023
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL

Papers citing "Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons"

47 / 147 papers shown
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
31
55
0
13 Dec 2023
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhuo Zhang
Guangyu Shen
Guanhong Tao
Shuyang Cheng
Xiangyu Zhang
38
13
0
08 Dec 2023
RLHF and IIA: Perverse Incentives
Wanqiao Xu
Shi Dong
Xiuyuan Lu
Grace Lam
Zheng Wen
Benjamin Van Roy
29
2
0
02 Dec 2023
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
Yikun Wang
Rui Zheng
Haoming Li
Qi Zhang
Tao Gui
Fei Liu
OffRL
25
3
0
15 Nov 2023
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr
Prakash Panangaden
Doina Precup
26
2
0
03 Nov 2023
Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Prakamya Mishra
Zonghai Yao
Shuwei Chen
Beining Wang
Rohan Mittal
Hong Yu
KELM
ALM
HILM
31
7
0
30 Oct 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Luca Longo
Mario Brcic
Federico Cabitza
Jaesik Choi
Roberto Confalonieri
...
Andrés Páez
Wojciech Samek
Johannes Schneider
Timo Speith
Simone Stumpf
32
189
0
30 Oct 2023
Differentially Private Reward Estimation with Preference Feedback
Sayak Ray Chowdhury
Xingyu Zhou
Nagarajan Natarajan
46
4
0
30 Oct 2023
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
Shengpu Tang
Jenna Wiens
OffRL
CML
16
4
0
26 Oct 2023
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
Zihao Li
Xiang Ji
Minshuo Chen
Mengdi Wang
OffRL
29
0
0
16 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhi-Quan Luo
27
51
0
16 Oct 2023
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao
Banghua Zhu
36
4
0
11 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
40
9
0
10 Oct 2023
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong
Yinlam Chow
Guy Tennenholtz
Chih-Wei Hsu
Azamat Tulepbergenov
Mohammad Ghavamzadeh
Craig Boutilier
32
4
0
09 Oct 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
44
54
0
30 Sep 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
34
21
0
28 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
19
177
0
26 Sep 2023
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
Tianbao Xie
Siheng Zhao
Chen Henry Wu
Yitao Liu
Qian Luo
Victor Zhong
Yanchao Yang
Tao Yu
LM&Ro
50
48
0
20 Sep 2023
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe
CLL
29
67
0
12 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
Souradip Chakraborty
Amrit Singh Bedi
Alec Koppel
Dinesh Manocha
Huazheng Wang
Mengdi Wang
Furong Huang
31
26
0
03 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
52
473
0
27 Jul 2023
Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji
Huazheng Wang
Minshuo Chen
Tuo Zhao
Mengdi Wang
OffRL
37
6
0
24 Jul 2023
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Kevin Kaichuang Yang
Dan Klein
Asli Celikyilmaz
Nanyun Peng
Yuandong Tian
ALM
36
30
0
24 Jul 2023
Kernelized Offline Contextual Dueling Bandits
Viraj Mehta
Ojash Neopane
Vikramjeet Das
Sen Lin
J. Schneider
W. Neiswanger
OffRL
26
3
0
21 Jul 2023
Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback
Yu Chen
Yihan Du
Pihe Hu
Si-Yi Wang
De-hui Wu
Longbo Huang
24
6
0
06 Jul 2023
Preference Ranking Optimization for Human Alignment
Feifan Song
Bowen Yu
Minghao Li
Haiyang Yu
Fei Huang
Yongbin Li
Houfeng Wang
ALM
26
238
0
30 Jun 2023
Is RLHF More Difficult than Standard RL?
Yuanhao Wang
Qinghua Liu
Chi Jin
OffRL
19
57
0
25 Jun 2023
A Markovian Formalism for Active Querying
Sid Ijju
18
1
0
13 Jun 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
28
39
0
04 Jun 2023
Human-Aligned Calibration for AI-Assisted Decision Making
N. C. Benz
Manuel Gomez Rodriguez
17
17
0
31 May 2023
Provable Reward-Agnostic Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Wen Sun
Jason D. Lee
27
8
0
29 May 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
31
54
0
29 May 2023
Reward Collapse in Aligning Large Language Models
Ziang Song
Tianle Cai
Jason D. Lee
Weijie J. Su
ALM
33
22
0
28 May 2023
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
43
12
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
31
11
0
24 May 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
29
507
0
07 Mar 2023
When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
Siliang Zeng
Chenliang Li
Alfredo García
Min-Fong Hong
OffRL
34
13
0
15 Feb 2023
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
506
0
28 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
446
0
23 Aug 2022
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
128
61
0
23 May 2022
Perspectives on Incorporating Expert Feedback into Model Updates
Valerie Chen
Umang Bhatt
Hoda Heidari
Adrian Weller
Ameet Talwalkar
30
11
0
13 May 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
246
259
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
843
0
12 Oct 2021
Neurosymbolic AI: The 3rd Wave
Artur Garcez
Luís C. Lamb
NAI
65
292
0
10 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
298
1,610
0
18 Sep 2019