arXiv:2009.01325
Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano. 2 September 2020. Tags: ALM.
Papers citing "Learning to summarize from human feedback" (showing 50 of 1,548)
Stochastic Trajectory Prediction under Unstructured Constraints
Hao Ma, Zhiqiang Pu, Shijie Wang, Boyin Liu, Huimu Wang, Yanyan Liang, Jianqiang Yi. 18 Mar 2025.

Augmented Adversarial Trigger Learning
Zhe Wang, Yanjun Qi. 16 Mar 2025.

Basic Category Usage in Vision Language Models
Hunter Sawyer, Jesse Roberts, Kyle Moore. 16 Mar 2025. Tags: VLM.

Empirical Privacy Variance
Yuzheng Hu, Fan Wu, Ruicheng Xian, Yuhang Liu, Lydia Zakynthinou, Pritish Kamath, Chiyuan Zhang, David A. Forsyth. 16 Mar 2025.

From Demonstrations to Rewards: Alignment Without Explicit Human Preferences
Siliang Zeng, Yao Liu, Huzefa Rangwala, George Karypis, Mingyi Hong, Rasool Fakoor. 15 Mar 2025.

OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč, Mateusz Lango, Ondrej Dusek. 14 Mar 2025. Tags: ELM.

Efficient Safety Alignment of Large Language Models via Preference Re-ranking and Representation-based Reward Modeling
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang. 13 Mar 2025. Tags: OffRL.

Take Off the Training Wheels! Progressive In-Context Learning for Effective Alignment
Zhenyu Liu, Dongfang Li, Xinshuo Hu, X. Zhao, Yibin Chen, Baotian Hu, Min Zhang. 13 Mar 2025.

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga, Wei Jie, Fan Wu, Vardan K. Voskanyan, Fateme Dinmohammadi, P. Brookes, Jingzhi Gong, Zheng Wang. 13 Mar 2025.

RankPO: Preference Optimization for Job-Talent Matching
Yize Zhang, Ming Wang, Yu Wang, Xiaohui Wang. 13 Mar 2025.

Robust Multi-Objective Controlled Decoding of Large Language Models
Seongho Son, William Bankes, Sangwoong Yoon, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic. 11 Mar 2025.

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing, Yuanchun Shi, Zongqing Lu, Deheng Ye. 11 Mar 2025. Tags: OffRL, LRM.

Combinatorial Optimization via LLM-driven Iterated Fine-tuning
Pranjal Awasthi, Sreenivas Gollapudi, Ravi Kumar, Kamesh Munagala. 10 Mar 2025.

UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality
Zelei Cheng, Xin-Qiang Cai, Yuting Tang, Pushi Zhang, Boming Yang, Masashi Sugiyama, Xinyu Xing. 10 Mar 2025.

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, Se-Young Yun. 10 Mar 2025. Tags: VLM.

Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations
Hari Shankar, Vedanta S P, Tejas Cavale, Ponnurangam Kumaraguru, Abhijnan Chakraborty. 10 Mar 2025.

LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model
Xinxin Zhao, Haoyang Li, Jing Zhang, Xinmei Huang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, Hong Chen. 10 Mar 2025.

Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta, Adam Fisch, Christoph Dann, Alekh Agarwal. 10 Mar 2025.

Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Lu Chen, Kai Yu. 09 Mar 2025.

Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting
Yufei Li, John Nham, Ganesh Jawahar, Lei Shu, David C. Uthus, Yun-hsuan Sung, Chengrun Yang, Itai Rolnick, Yi Qiao, Cong Liu. 09 Mar 2025. Tags: OffRL.

ROCM: RLHF on consistency models
Shivanshu Shekhar, Tong Zhang. 08 Mar 2025.

Language Model Personalization via Reward Factorization
Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano. 08 Mar 2025.

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang, Min-hwan Oh. 07 Mar 2025. Tags: OffRL.

DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen, Wenhao Chai, Zhifei Yang, Xiaotian Zhang, Qiufeng Wang, Tony Q.S. Quek, Soujanya Poria, Zuozhu Liu. 06 Mar 2025.

Mixed Likelihood Variational Gaussian Processes
Kaiwen Wu, Craig Sanders, Benjamin Letham, Phillip Guan. 06 Mar 2025.

An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Zhongfu Chen, Yingqian Min, Beichen Zhang, Jie Chen, Jinhao Jiang, ..., Xu Miao, Yaojie Lu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen. 06 Mar 2025. Tags: ReLM, OffRL, LRM.

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt. 06 Mar 2025.

L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo, Marin Soljacic. 06 Mar 2025.

LLMs Can Generate a Better Answer by Aggregating Their Own Responses
Zichong Li, Xinyu Feng, Yuheng Cai, Zixuan Zhang, Tianyi Liu, Chen Liang, Weizhu Chen, Haoyu Wang, Tiejun Zhao. 06 Mar 2025. Tags: LRM.

Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm
Haksub Kim, Kanghoon Lee, J. Park, Jiachen Li, Jinkyoo Park. 05 Mar 2025.

Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems
Mahfuz Ahmed Anik, Abdur Rahman, Azmine Toushik Wasi, Md Manjurul Ahsan. 05 Mar 2025.

Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models
Alessio Galatolo, Zhenbang Dai, Katie Winkle, Meriem Beloucif. 05 Mar 2025.

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu. 04 Mar 2025.

Alchemist: Towards the Design of Efficient Online Continual Learning System
Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang. 03 Mar 2025. Tags: CLL, OnRL.

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Gokul Swamy, Sanjiban Choudhury, Wen Sun, Zhiwei Steven Wu, J. Andrew Bagnell. 03 Mar 2025. Tags: OffRL.

Dynamic Search for Inference-Time Alignment in Diffusion Models
Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, Shuiwang Ji. 03 Mar 2025.

PABBO: Preferential Amortized Black-Box Optimization
Xinyu Zhang, Daolang Huang, Samuel Kaski, Julien Martinelli. 02 Mar 2025.

Distributionally Robust Reinforcement Learning with Human Feedback
Debmalya Mandal, Paulius Sasnauskas, Goran Radanović. 01 Mar 2025.

Robust Multi-Objective Preference Alignment with Online DPO
Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi. 01 Mar 2025.

Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yiming Zhang, Zongzhang Zhang, Yang Yu. 01 Mar 2025. Tags: ALM.

Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models
Kuang-Da Wang, Teng-Ruei Chen, Yu-Heng Hung, Shuoyang Ding, Yueh-Hua Wu, Yu-Chun Wang, Chao-Han Huck Yang, Wen-Chih Peng, Ping-Chun Hsieh. 28 Feb 2025.

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz, Sheila A. McIlraith, Yilun Du. 27 Feb 2025. Tags: LRM.

Multi-Turn Code Generation Through Single-Step Rewards
A. Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M. Rush, Wenting Zhao, Sanjiban Choudhury. 27 Feb 2025. Tags: LRM.

Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, ..., Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy. 27 Feb 2025.

OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, Guorui Zhou. 26 Feb 2025.

VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
Jiani Zheng, Lu Wang, Fangkai Yang, Chen Zhang, Lingrui Mei, Wenjie Yin, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang. 26 Feb 2025. Tags: OffRL.

Controlled Diversity: Length-optimized Natural Language Generation
Diana Marie Schenke, Timo Baumann. 26 Feb 2025.

When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier. 26 Feb 2025.

CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization
Shuming Shi, Ruobing Zuo, Gaolei He, Jianlin Wang, Chenyang Xu, Zhengfeng Yang. 25 Feb 2025.

What is the Alignment Objective of GRPO?
Milan Vojnovic, Se-Young Yun. 25 Feb 2025.