Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
2 September 2020 · arXiv 2009.01325 (v3, latest) · ALM

Papers citing "Learning to summarize from human feedback"
Showing 50 of 1,548 citing papers.

Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples
Soumya Suvra Ghosal, Vaibhav Singh, Akash Ghosh, Soumyabrata Pal, Subhadip Baidya, Sriparna Saha, Dinesh Manocha
19 Jun 2025

Reranking-based Generation for Unbiased Perspective Summarization
Narutatsu Ri, Nicholas Deas, Kathleen McKeown
19 Jun 2025 · OffRL

Modeling the One-to-Many Property in Open-Domain Dialogue with LLMs
Jing Yang Lee, Kong-Aik Lee, Woon-Seng Gan
18 Jun 2025

Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
Zongxia Li, Yapei Chang, Yuhang Zhou, Xiyang Wu, Zichao Liang, Yoo Yeon Sung, Jordan L. Boyd-Graber
18 Jun 2025

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Yujun Wang, Jinhe Bi, Yunpu Ma, Soeren Pirk
17 Jun 2025 · MLLM

GRAM: A Generative Foundation Reward Model for Reward Generalization
Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, ..., Bei Li, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu
17 Jun 2025 · ALM, OffRL, LRM

Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Emre Kıcıman, Songwu Lu, Ranveer Chandra
16 Jun 2025 · OffRL, LRM

Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
Jay Hyeon Cho, JunHyeok Oh, Myunsoo Kim, Byung-Jun Lee
15 Jun 2025

RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control
Junpeng Yue, Zepeng Wang, Yuxuan Wang, Weishuai Zeng, Jiangxing Wang, Xinrun Xu, Yu Zhang, Sipeng Zheng, Ziluo Ding, Zongqing Lu
15 Jun 2025 · AI4CE

Improving Large Language Model Safety with Contrastive Representation Learning
Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
13 Jun 2025 · AAML

RePO: Replay-Enhanced Policy Optimization
Siheng Li, Zhanhui Zhou, W. Lam, Chao Yang, Chaochao Lu
11 Jun 2025 · OffRL

Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, ..., Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong
11 Jun 2025 · AI4TS

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour
11 Jun 2025

Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan, Tengyang Xie
10 Jun 2025 · LRM

A Survey on Large Language Models for Mathematical Reasoning
Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, ..., Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu
10 Jun 2025 · LRM

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan
10 Jun 2025

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, S. Niekum, Philip S Thomas
09 Jun 2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu, Shengnan Ma, Bo Wang, Jiaheng Yu, Lewei Lu, Ziwei Liu
09 Jun 2025

Explicit Preference Optimization: No Need for an Implicit Reward Model
Xiangkun Hu, Lemin Kong, Tong He, David Wipf
09 Jun 2025

Improving Fairness of Large Language Models in Multi-document Summarization
Haoyuan Li, Yusen Zhang, Snigdha Chaturvedi
09 Jun 2025

AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization
Zixuan Jiang, Renjing Xu
08 Jun 2025

Reward Model Interpretability via Optimal and Pessimal Tokens
Brian Christian, Hannah Rose Kirk, Jessica A.F. Thompson, Christopher Summerfield, Tsvetomira Dumbalska
08 Jun 2025 · AAML

Robotic Policy Learning via Human-assisted Action Preference Optimization
Wenke Xia, Yichu Yang, Hongtao Wu, Xiao Ma, Tao Kong, Di Hu
08 Jun 2025

History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM
Andrew Kiruluta, Andreas Lemos, Priscilla Burity
08 Jun 2025 · LRM

From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp Metaprogramming Loop
Jordi de la Torre
08 Jun 2025 · LLMAG, KELM

Debiasing Online Preference Learning via Preference Feature Preservation
Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim
06 Jun 2025

A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
A. Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy
05 Jun 2025 · OffRL

Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo
05 Jun 2025

RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
Tianjiao Li, Mengran Yu, Chenyu Shi, Yanjun Zhao, Xiaojing Liu, Qiang Zhang, Qi Zhang, Xuanjing Huang, Jiayin Wang
05 Jun 2025

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi, Mark Yatskar
05 Jun 2025

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
05 Jun 2025

RewardAnything: Generalizable Principle-Following Reward Models
Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye
04 Jun 2025 · LRM

Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi
04 Jun 2025 · ReLM, LRM

Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement
Xiaofeng Zhou, Heyan Huang, Lizi Liao
04 Jun 2025 · LLMAG

Structured Pruning for Diverse Best-of-N Reasoning Optimization
Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen
04 Jun 2025 · LRM

From Anger to Joy: How Nationality Personas Shape Emotion Attribution in Large Language Models
M. Kamruzzaman, Abdullah Al Monsur, Gene Louis Kim, Anshuman Chhabra
03 Jun 2025

Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
Shenghua He, Tian Xia, Xuan Zhou, Hui Wei
03 Jun 2025 · OffRL

Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation
Dingwei Chen, Ziqiang Liu, Feiteng Fang, Chak Tou Leong, Shiwen Ni, A. Argha, Hamid Alinejad-Rokny, Min Yang, Chengming Li
03 Jun 2025 · KELM, HILM

Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models
Ram Potham, Max Harms
03 Jun 2025 · LRM

daDPO: Distribution-Aware DPO for Distilling Conversational Abilities
Zhengze Zhang, Shiqi Wang, Yiqun Shen, Simin Guo, Dahua Lin, Xiaoliang Wang, Nguyen Cam-Tu, Fei Tan
03 Jun 2025

Understanding the Impact of Sampling Quality in Direct Preference Optimization
Kyung Rok Kim, Yumo Bai, Chonghuan Wang, Guanting Chen
03 Jun 2025

Quantitative LLM Judges
Aishwarya Sahoo, Jeevana Kruthi Karnuthala, Tushar Parmanand Budhwani, Pranchal Agarwal, Sankaran Vaidyanathan, ..., Jennifer Healey, Nedim Lipka, Ryan Rossi, Uttaran Bhattacharya, Branislav Kveton
03 Jun 2025 · ELM

Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks
Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin Rothkopf, Kristian Kersting
02 Jun 2025

Towards Human-like Preference Profiling in Sequential Recommendation
Z. Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi
02 Jun 2025 · HAI

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, F. Durand, Phillip Isola
02 Jun 2025 · EGVM

Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov
01 Jun 2025 · OffRL

Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou, Luhan Zhu, Francesco Quinzan, Chengchun Shi
01 Jun 2025

Whispers of Many Shores: Cultural Alignment through Collaborative Cultural Expertise
Shuai Feng, Wei-Chuang Chan, Srishti Chouhan, Junior Francisco Garcia Ayala, Srujananjali Medicherla, Kyle Clark, Mingwei Shi
30 May 2025

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Boyao Wang, Tong Zhang, Han Zhao
30 May 2025

MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
Yunze Lin
30 May 2025 · LRM