Learning to summarize from human feedback
arXiv 2009.01325 · v3 (latest) · 2 September 2020
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano [ALM]
Papers citing "Learning to summarize from human feedback" (50 of 1,548 shown)
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang (25 Feb 2025)

NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
Yashan Wang, Shangda Wu, Jianhuai Hu, Xingjian Du, Yueqi Peng, Yongxin Huang, Shuai Fan, Xiaobing Li, Feng Yu, Maosong Sun (25 Feb 2025)

MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang (25 Feb 2025)

Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang (25 Feb 2025)

Aligning Compound AI Systems via System-level DPO
Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding, Katherine Tsai, Haolun Wu, Sanmi Koyejo (24 Feb 2025)

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Hao Sun, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang (24 Feb 2025) [OffRL]

Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization
Lionel Richy Panlap Houamegni, Fatih Gedikli (24 Feb 2025)

RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra (24 Feb 2025)

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao (24 Feb 2025) [OffRL]

Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang (24 Feb 2025) [ALM]

Sequence-level Large Language Model Training with Contrastive Preference Optimization
Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, Sheng Zha (23 Feb 2025)

Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation
Deokhyung Kang, Jeonghun Cho, Yejin Jeon, Sunbin Jang, Minsub Lee, Jawoon Cho, Gary Lee (23 Feb 2025)

Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Max Lamparth, Declan Grabb, Amy Franks, Scott Gershan, Kaitlyn N. Kunstman, ..., Monika Drummond Roots, Manu Sharma, Aryan Shrivastava, N. Vasan, Colleen Waickman (22 Feb 2025)

IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg, Ayush Singh, Shweta Singh, Paras Chopra (22 Feb 2025)

C2-DPO: Constrained Controlled Direct Preference Optimization
Kavosh Asadi, Julien Han, Idan Pipano, Xingzi Xu, Dominique Perrault-Joncas, Shoham Sabach, Karim Bouyarmane, Mohammad Ghavamzadeh (22 Feb 2025)

Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training
Jaydeep Borkar, Matthew Jagielski, Katherine Lee, Niloofar Mireshghallah, David A. Smith, Christopher A. Choquette-Choo (21 Feb 2025) [PILM]

Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu, JongWoo Kim, MunYong Yi (21 Feb 2025)

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Ziyang Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, V. Honavar (21 Feb 2025)

Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
Tong Yang, Jincheng Mei, H. Dai, Zixin Wen, Shicong Cen, Dale Schuurmans, Yuejie Chi, Bo Dai (20 Feb 2025)

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai (20 Feb 2025) [OffRL]

Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du, Hui Yuan, Pengyu Cheng, Zhihong Chen, Yuejiao Xie, Xiang Wan, Anningzhe Gao (20 Feb 2025)

S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li (18 Feb 2025) [LRM]

Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu, Luca Viano, Yihang Chen, Zhenyu Zhu, Kimon Antonakopoulos, Quanquan Gu, Volkan Cevher (18 Feb 2025)

Oreo: A Plug-in Context Reconstructor to Enhance Retrieval-Augmented Generation
Sha Li, Naren Ramakrishnan (18 Feb 2025) [RALM, KELM]

Rethinking Diverse Human Preference Learning through Principal Component Analysis
Feng Luo, Rui Yang, Hao Sun, Chunyuan Deng, Jiarui Yao, Jingyan Shen, Huan Zhang, Hanjie Chen (18 Feb 2025)

Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan, Mengdi Wang (17 Feb 2025)

A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid, Ruotian Wu, Julia Grosse, Agustinus Kristiadi, Pascal Poupart (17 Feb 2025) [OffRL]

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan, Yilei Jiang, Yongbin Li, Qingbin Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng (17 Feb 2025) [ALM]

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen, Jie Lou, Yaojie Lu, Hongyu Lin, Xing Yu, Xinyu Lu, Xianpei Han, Jia Zheng, Debing Zhang, Le Sun (17 Feb 2025) [ALM]

ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis
Xu Wang, Jiaju Kang, Puyu Han, Yubao Zhao, Qian Liu, Liwenfei He, Lingqiong Zhang, Lingyun Dai, Yongcheng Wang, Jie Tao (16 Feb 2025) [LM&MA]

Bone Soups: A Seek-and-Soup Model Merging Approach for Controllable Multi-Objective Generation
Guofu Xie, Xiao Zhang, Ting Yao, Yunsheng Shi (15 Feb 2025) [MoMe]

Preference learning made easy: Everything should be understood through win rate
Lily H. Zhang, Rajesh Ranganath (14 Feb 2025)

Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli, Ayhan Suleymanzade, Na Min An, Huzama Ahmad, Eunsu Kim, Junyeong Park, James Thorne, Alice Oh (13 Feb 2025)

AuPair: Golden Example Pairs for Code Repair
Aditi Mavalankar, Hassan Mansoor, Zita Marinho, Masha Samsikova, Tom Schaul (12 Feb 2025) [KELM, LRM]

DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization
Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick L. Stevens (11 Feb 2025) [LM&MA, OffRL]

Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Jean Vassoyan, Nathanaël Beau, Roman Plaud (10 Feb 2025) [OffRL]

AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon (10 Feb 2025)

Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails
Yijun Yang, L. Wang, Xiao Yang, Lanqing Hong, Jun Zhu (09 Feb 2025) [AAML]

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions
Jingxin Xu, Guoshun Nan, Sheng Guan, Sicong Leng, Yang Liu, Zixiao Wang, Yuyang Ma, Zhili Zhou, Yanzhao Hou, Xiaofeng Tao (08 Feb 2025) [LM&MA]

Design Considerations in Offline Preference-based RL
Alekh Agarwal, Christoph Dann, T. V. Marinov (08 Feb 2025) [OffRL]

Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics
Hussam Ghanem, C. Cruz (07 Feb 2025)

Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki (06 Feb 2025)

Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales
Zhen Qian, Xiuzhen Zhang, Xiaofei Xu, Xiwei Xu (05 Feb 2025) [LRM]

CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yuxiao Chen, ..., Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang (05 Feb 2025) [OffRL]

Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making
Balakrishnan Dharmalingam, Rajdeep Mukherjee, Brett Piggott, Guohuan Feng, Anyi Liu (05 Feb 2025)

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas (04 Feb 2025) [KELM]

Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model
Hadas Ben-Atya, N. Gavrielov, Zvi Badash, G. Focht, R. Cytter-Kuint, Talar Hagopian, Dan Turner, M. Freiman (02 Feb 2025)

RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite, Md Zahid Hasan, Qisai Liu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar (31 Jan 2025) [OffRL, SyDa]

The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao, Sen Zhang, Liang Ding, Yuqi Zhang, Lefei Zhang, Dacheng Tao (31 Jan 2025)

Controllable Protein Sequence Generation with LLM Preference Optimization
Xiangyu Liu, Yi Liu, Silei Chen, Wei Hu (28 Jan 2025)