Learning to summarize from human feedback
arXiv:2009.01325 (v3, latest) · 2 September 2020 · ALM
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
Papers citing "Learning to summarize from human feedback" (50 of 1,548 shown)
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Sunghwan Kim, Dongjin Kang, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, Jinyoung Yeo · ALM · 19 May 2025

WikiPersonas: What Can We Learn From Personalized Alignment to Famous People?
Zilu Tang, Afra Feyza Akyürek, Ekin Akyürek, Derry Wijaya · 19 May 2025

SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, Yulun Zhang · 18 May 2025

Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira · 17 May 2025

CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning
Hongbo Jin, Ruyang Liu, Wenhao Zhang, Guibo Luo, Ge Li · LRM · 17 May 2025

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng, Quan Wei, William Brown, Oana Frunza, Yuriy Nevmyvaka, Mingyi Hong · LRM · 17 May 2025

RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving
Zepeng Ding, Dixuan Wang, Ziqin Luo, Guochao Jiang, Deqing Yang, Jiaqing Liang · 17 May 2025

BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Li, Chris Tanner, Mohit Iyyer · ALM · 16 May 2025

A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara · 16 May 2025

Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano · 16 May 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen, Xiaopeng Li, Zhiyu Li, Xi Chen, Tianyi Lin · 16 May 2025

Ranked Voting based Self-Consistency of Large Language Models
Weiqin Wang, Yile Wang, Hui Huang · LRM · 16 May 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Ziyi Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev · 16 May 2025

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen, Wanqi Yin, Xiaofeng Yang, Cheng Chen, Chaoyue Song, Zhongang Cai, Lei Yang, Hao Wang, Guosheng Lin · 15 May 2025

WorldPM: Scaling Human Preference Modeling
Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zizhuo Zhang, ..., Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin · 15 May 2025

Detecting Prefix Bias in LLM-based Reward Models
Ashwin Kumar, Yuzi He, Aram H. Markosyan, Bobbie Chern, Imanol Arrieta-Ibarra · 13 May 2025

InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul Chilimbi · 13 May 2025

Improved Algorithms for Differentially Private Language Model Alignment
Keyu Chen, Hao Tang, Qinglin Liu, Yizhao Xu · 13 May 2025

On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong, Noah Lee, Eunki Kim, Guijin Son, Woojin Chung, Aman Gupta, Shao Tang, James Thorne · 12 May 2025

You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou, Zeyu Li, Xingyu Jiang, Haoyang Li, Lijun Yang, Wen Yao, Yue Deng · DiffM · 12 May 2025

Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi, Taiji Suzuki · 12 May 2025

DARLR: Dual-Agent Offline Reinforcement Learning for Recommender Systems with Dynamic Reward
Yi Zhang, Ruihong Qiu, Xuwei Xu, Jiajun Liu, Sen Wang · OffRL · 12 May 2025

Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
Fabrice Harel-Canada, Boran Erol, Connor Choi, J. Liu, Gary Jiarui Song, Nanyun Peng, Amit Sahai · WaLM · 11 May 2025

REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Aniruddha Roy, Pretam Ray, Abhilash Nandy, Somak Aditya, Pawan Goyal · ALM · 10 May 2025

Assessing Robustness to Spurious Correlations in Post-Training Language Models
Julia Shuieh, Prasann Singhal, Apaar Shanker, John Heyer, George Pu, Samuel Denton · LRM · 09 May 2025

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee · OffRL · 06 May 2025

Soft Best-of-n Sampling for Model Alignment
C. M. Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio du Pin Calmon · BDL · 06 May 2025

Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Da Zheng, Lun Du, Junwei Su, Yuchen Tian, Yuqi Zhu, Jintian Zhang, Lanning Wei, Xin Xu, Ningyu Zhang · LRM · 06 May 2025

RM-R1: Reward Modeling as Reasoning
Xiusi Chen, Gaotang Li, Zehua Wang, Bowen Jin, Cheng Qian, ..., Yu Zhang, D. Zhang, Tong Zhang, Hanghang Tong, Heng Ji · ReLM, OffRL, LRM · 05 May 2025

SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li, Daniel Khashabi · 05 May 2025

FairPO: Robust Preference Optimization for Fair Multi-Label Learning
Soumen Kumar Mondal, Akshit Varmora, Prateek Chanda, Ganesh Ramakrishnan · 05 May 2025

Semantic Probabilistic Control of Language Models
Kareem Ahmed, Catarina G Belém, Padhraic Smyth, Sameer Singh · 04 May 2025

CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
Mazal Bethany, Nishant Vishwamitra, Cho-Yu Chiang, Peyman Najafirad · AAML · 03 May 2025

Multi-agents based User Values Mining for Recommendation
Lawrence Yunliang Chen, Wei Yuan, Tong Chen, Xiangyu Zhao, Nguyen Quoc Viet Hung, Hongzhi Yin · OffRL · 02 May 2025

Real-World Gaps in AI Governance Research
Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat · 30 Apr 2025

From Precision to Perception: User-Centred Evaluation of Keyword Extraction Algorithms for Internet-Scale Contextual Advertising
Jingwen Cai, Sara Leckner, Johanna Björklund · 30 Apr 2025

BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models
Zhiting Fan, Ruizhe Chen, Zuozhu Liu · 30 Apr 2025

HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation
Cristina Garbacea, Chenhao Tan · 29 Apr 2025

Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam, Thomas L. Griffiths · LLMAG · 29 Apr 2025

Aligning Language Models for Icelandic Legal Text Summarization
Þórir Hrafn Harðarson, Hrafn Loftsson, Stefán Ólafsson · AILaw, AI4TS, ELM · 25 Apr 2025

ParetoHqD: Fast Offline Multiobjective Alignment of Large Language Models using Pareto High-quality Data
Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin · 23 Apr 2025

TTRL: Test-Time Reinforcement Learning
Yuxin Zuo, Kaiyan Zhang, Li Sheng, Xuekai Zhu, ..., Youbang Sun, Zhiyuan Ma, Lifan Yuan, Ning Ding, Bowen Zhou · OffRL · 22 Apr 2025

Establishing Reliability Metrics for Reward Models in Large Language Models
Yizhou Chen, Yawen Liu, Xuesi Wang, Qingtao Yu, Guangda Huzhang, Anxiang Zeng, Han Yu, Zhiming Zhou · 21 Apr 2025

In-context Ranking Preference Optimization
Junda Wu, Rohan Surana, Zhouhang Xie, Yiran Shen, Yu Xia, Tong Yu, Ryan Rossi, Prithviraj Ammanabrolu, Julian McAuley · 21 Apr 2025

Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi, Somtochukwu Oguchienti, Maheed H. Ahmed, Mahsa Ghasemi · OffRL · 20 Apr 2025

LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon S. Du, Lin Xiao, Maryam Fazel · 20 Apr 2025

Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey
Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Muhammad Awais Khan Bangash, Muhammad Ali Jamshed · LLMAG, LRM, AI4CE · 20 Apr 2025

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu, Yash Savani, Fei Fang, Zico Kolter · OffRL · 18 Apr 2025

Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie · MedIm · 17 Apr 2025

Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu · 17 Apr 2025