Learning to summarize from human feedback
arXiv:2009.01325 · 2 September 2020
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
Tags: ALM

Papers citing "Learning to summarize from human feedback"
Showing 50 of 1,443 citing papers.

PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak
12 Jun 2024

Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Xiaoxuan Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, M. van der Schaar, R. T. Lange
12 Jun 2024 · Tags: OffRL

Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng-Wei Zhang, Wenqiang Lei
12 Jun 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
12 Jun 2024

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu
12 Jun 2024

Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
12 Jun 2024 · Tags: ELM

Prompt-Based Length Controlled Generation with Multiple Control Types
Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
12 Jun 2024

OPTune: Efficient Online Preference Tuning
Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Dinesh Manocha, Heng Huang
11 Jun 2024

TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, James Zou
11 Jun 2024 · Tags: LLMAG, OOD, AI4CE

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
11 Jun 2024 · Tags: OffRL

Multi-objective Reinforcement Learning from AI Feedback
Marcus Williams
11 Jun 2024

Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models
Daniela Occhipinti, Michele Marchi, Irene Mondella, Huiyuan Lai, F. Dell’Orletta, Malvina Nissim, Marco Guerini
11 Jun 2024

Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu
11 Jun 2024 · Tags: LRM, VLM

Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models
Som Sagar, Aditya Taparia, Ransalu Senanayake
11 Jun 2024

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, J. Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
11 Jun 2024

Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness
Dingrong Wang, Hitesh Sapkota, Zhiqiang Tao, Qi Yu
10 Jun 2024 · Tags: AAML

Towards Lifelong Learning of Large Language Models: A Survey
Junhao Zheng, Shengjie Qiu, Chengming Shi, Qianli Ma
10 Jun 2024 · Tags: KELM, CLL

Aligning Large Language Models with Representation Editing: A Control Perspective
Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang
10 Jun 2024

Information Theoretic Guarantees For Policy Alignment In Large Language Models
Youssef Mroueh
09 Jun 2024

Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk, Youssef Mroueh, Brian M. Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jirí Navrátil, Jerret Ross
09 Jun 2024

Creativity Has Left the Chat: The Price of Debiasing Language Models
Behnam Mohammadi
08 Jun 2024

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing
Biqing Qi, Pengfei Li, Fangyuan Li, Junqi Gao, Kaiyan Zhang, Bowen Zhou
08 Jun 2024

Planning Like Human: A Dual-process Framework for Dialogue Planning
Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin
08 Jun 2024

Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz
06 Jun 2024

Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie
06 Jun 2024 · Tags: OffRL

Aligning Agents like Large Language Models
Adam Jelley, Yuhan Cao, Dave Bignell, Sam Devlin, Tabish Rashid
06 Jun 2024 · Tags: LM&Ro

Prototypical Reward Network for Data-Efficient RLHF
Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, Kunpeng Liu
06 Jun 2024 · Tags: ALM

Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye
05 Jun 2024 · Tags: VLM

LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback
Timon Ziegenbein, Gabriella Skitalinskaya, Alireza Bayat Makou, Henning Wachsmuth
05 Jun 2024 · Tags: LLMAG, KELM

HYDRA: Model Factorization Framework for Black-Box LLM Personalization
Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai
05 Jun 2024 · Tags: AAML

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang
05 Jun 2024

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao
04 Jun 2024

Test-Time Regret Minimization in Meta Reinforcement Learning
Mirco Mutti, Aviv Tamar
04 Jun 2024

Dishonesty in Helpful and Harmless Alignment
Youcheng Huang, Jingkun Tang, Duanyu Feng, Zheng-Wei Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn
04 Jun 2024 · Tags: LLMSV

The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee, Yann Hicke, Renzhe Yu, Christopher A. Brooks, René F. Kizilcec
03 Jun 2024 · Tags: AI4Ed

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken, Jan Deriu, Don Tuggener, Mark Cieliebak
03 Jun 2024

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Mengge Xue, Zhenyu Hu, Liqun Liu, Kuo Liao, Shuang Li, Honglin Han, Meng Zhao, Chengguo Yin
03 Jun 2024

Self-Improving Robust Preference Optimization
Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, M. G. Azar
03 Jun 2024

Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen, Maryam Kamgarpour
03 Jun 2024 · Tags: OffRL

BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Lin Gui, Cristina Garbacea, Victor Veitch
02 Jun 2024 · Tags: BDL, LM&MA

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Chng Eng Siong, Chao Zhang
02 Jun 2024

LLMs Could Autonomously Learn Without External Supervision
Ke Ji, Junying Chen, Anningzhe Gao, Wenya Xie, Xiang Wan, Benyou Wang
02 Jun 2024

Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier, Samuel Albanie, Robert Mullins
02 Jun 2024 · Tags: SyDa

Aligning Language Models with Demonstrated Feedback
Omar Shaikh, Michelle S. Lam, Joey Hejna, Yijia Shao, Michael S. Bernstein, Diyi Yang
02 Jun 2024 · Tags: ALM

Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning
Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok
01 Jun 2024

Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
Yueqin Yin, Zhendong Wang, Yujia Xie, Weizhu Chen, Mingyuan Zhou
31 May 2024

OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh
31 May 2024 · Tags: ALM

Standards for Belief Representations in LLMs
Daniel A. Herrmann, B. Levinstein
31 May 2024

Transfer Q Star: Principled Decoding for LLM Alignment
Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang
30 May 2024

Xwin-LM: Strong and Scalable Alignment Practice for LLMs
Bolin Ni, Jingcheng Hu, Yixuan Wei, Houwen Peng, Zheng-Wei Zhang, Gaofeng Meng, Han Hu
30 May 2024 · Tags: LM&MA, ALM