arXiv:2203.02155
Training language models to follow instructions with human feedback
4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
Papers citing "Training language models to follow instructions with human feedback" (50 of 6,380 papers shown)
Learning to Reason without External Rewards
Xuandong Zhao
Zhewei Kang
Aosong Feng
Sergey Levine
Dawn Song
OffRL
ReLM
LRM
135
8
0
26 May 2025
S2LPP: Small-to-Large Prompt Prediction across LLMs
Liang Cheng
Tianyi Li
Zhaowei Wang
Mark Steedman
LRM
26
0
0
26 May 2025
MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection
Yinuo Xue
Eric Spero
Yun Sing Koh
Giovanni Russello
AAML
30
1
0
26 May 2025
CPA-RAG: Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models
Chunyang Li
Junwei Zhang
Anda Cheng
Zhuo Ma
Xinghua Li
Jianfeng Ma
SILM
AAML
42
0
0
26 May 2025
Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
Makesh Narsimhan Sreedhar
Traian Rebedea
Christopher Parisien
LRM
97
0
0
26 May 2025
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Yi Liu
Dianqing Liu
Mingye Zhu
Junbo Guo
Yongdong Zhang
Zhendong Mao
102
0
0
26 May 2025
Proxy-Free GFlowNet
Ruishuo Chen
Xun Wang
Rui Hu
Zhuoran Li
Longbo Huang
74
0
0
26 May 2025
What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu
Feng Gao
Bingwen Wei
Xinlei Chen
Qingmin Liao
Yi Wu
Chao Yu
Yu Wang
OffRL
302
0
0
26 May 2025
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Geon-hyeong Kim
Youngsoo Jang
Yu Jin Kim
Byoungjip Kim
Honglak Lee
Kyunghoon Bae
Moontae Lee
28
2
0
26 May 2025
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
31
0
0
26 May 2025
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie
David Qiu
Deepak Gopinath
Dong Lin
Yanchao Sun
Chong-Jun Wang
Saloni Potdar
Bhuwan Dhingra
KELM
LRM
75
0
0
26 May 2025
Learning to Select In-Context Demonstration Preferred by Large Language Model
Zheng Zhang
Shaocheng Lan
Lei Song
Jiang Bian
Yexin Li
Kan Ren
29
0
0
26 May 2025
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Xiaoyuan Wu
Weiran Lin
Omer Akgul
Lujo Bauer
HILM
26
0
0
26 May 2025
FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement
Bingguang Hao
Maolin Wang
Zengzhuang Xu
Cunyin Peng
Yicheng Chen
Xiangyu Zhao
Jinjie Gu
Chenyi Zhuang
ReLM
LRM
113
0
0
26 May 2025
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Jiaxin Song
Yixu Wang
Jie Li
Rui Yu
Yan Teng
Xingjun Ma
Yingchun Wang
AAML
70
0
0
26 May 2025
Learning a Pessimistic Reward Model in RLHF
Yinglun Xu
Hangoo Kang
Tarun Suresh
Yuxuan Wan
Gagandeep Singh
OffRL
66
0
0
26 May 2025
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
Sangyeop Kim
Yohan Lee
Yongwoo Song
Kimin Lee
AAML
34
0
0
26 May 2025
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
Sahana Ramnath
Anurag Mudgil
Brihi Joshi
Skyler Hallinan
Xiang Ren
52
0
0
26 May 2025
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Rihui Xin
Han Liu
Zecheng Wang
Yupeng Zhang
Dianbo Sui
Xiaolin Hu
Bingning Wang
SyDa
73
1
0
26 May 2025
SCAR: Shapley Credit Assignment for More Efficient RLHF
Meng Cao
Shuyuan Zhang
Xiao-Wen Chang
Doina Precup
119
0
0
26 May 2025
Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Tao Wang
Ruipeng Zhang
Sicun Gao
OffRL
53
0
0
25 May 2025
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu
Fan Wu
Haotian Ye
David A. Forsyth
James Y. Zou
Nan Jiang
Jiaqi W. Ma
Han Zhao
OffRL
79
0
0
25 May 2025
Incentivizing High-Quality Human Annotations with Golden Questions
Shang Liu
Zhongze Cai
Hanzhao Wang
Zhongyao Ma
Xiaocheng Li
82
0
0
25 May 2025
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang
Yuxuan Chen
Min-Hsuan Yeh
Yixuan Li
LLMAG
AI4CE
64
0
0
25 May 2025
The Price of Format: Diversity Collapse in LLMs
Longfei Yun
Chenyang An
Zilong Wang
Letian Peng
Jingbo Shang
47
0
0
25 May 2025
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang
Suyuchen Wang
Yun Zhu
Bang Liu
ReLM
LRM
123
0
0
25 May 2025
An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Harethah Shairah
Hasan Hammoud
Bernard Ghanem
G. Turkiyyah
63
0
0
25 May 2025
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu
Jingcheng Yang
Jize Jiang
Meitang Li
Kaizhuo Yan
Hanchao Yu
Minjia Zhang
Chengxiang Zhai
Klara Nahrstedt
LRM
173
0
0
25 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
274
0
0
25 May 2025
When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas
Steffen Backmann
David Guzman Piedrahita
Emanuel Tewolde
Rada Mihalcea
Bernhard Schölkopf
Zhijing Jin
90
0
0
25 May 2025
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Xiaoqiang Lin
Arun Verma
Zhongxiang Dai
Daniela Rus
See-Kiong Ng
Bryan Kian Hsiang Low
275
0
0
25 May 2025
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu
Rongzhen Wang
Shen Nie
Xiaolu Zhang
Chunwei Wu
...
Jun Zhou
Jianfei Chen
Yankai Lin
Ji-Rong Wen
Chongxuan Li
195
2
0
25 May 2025
Mitigating Deceptive Alignment via Self-Monitoring
Jiaming Ji
Wenqi Chen
Kaile Wang
Donghai Hong
Sitong Fang
...
Jiayi Zhou
Juntao Dai
Sirui Han
Yike Guo
Yaodong Yang
LRM
57
2
0
24 May 2025
Synthesizing and Adapting Error Correction Data for Mobile Large Language Model Applications
Yanxiang Zhang
Zheng Xu
Shanshan Wu
Yuanbo Zhang
Daniel Ramage
KELM
46
0
0
24 May 2025
Unraveling Misinformation Propagation in LLM Reasoning
Yiyang Feng
Yichen Wang
Shaobo Cui
Boi Faltings
Mina Lee
Jiawei Zhou
LRM
90
0
0
24 May 2025
MOSLIM: Align with diverse preferences in prompts through reward classification
Yu Zhang
Wanli Jiang
Zhengyu Yang
25
1
0
24 May 2025
Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li
Guangda Huzhang
Haibo Zhang
Xiting Wang
Anxiang Zeng
42
0
0
24 May 2025
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Ruichen Zhang
Rana Muhammad Shahroz Khan
Zhen Tan
Dawei Li
Song Wang
Tianlong Chen
LRM
63
0
0
24 May 2025
AI-Driven Climate Policy Scenario Generation for Sub-Saharan Africa
Rafiu Adekoya Badekale
Adewale Akinfaderin
46
0
0
24 May 2025
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
Guanxing Lu
Wenkai Guo
Chubin Zhang
Yuheng Zhou
Haonan Jiang
Zifeng Gao
Yansong Tang
Ziwei Wang
OffRL
118
0
0
24 May 2025
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
Zhihao Zhang
Yiran Zhang
Xiyue Zhou
Liting Huang
Imran Razzak
Preslav Nakov
Usman Naseem
24
0
0
24 May 2025
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
29
0
0
24 May 2025
Using Large Language Models to Tackle Fundamental Challenges in Graph Learning: A Comprehensive Survey
Mengran Li
Pengyu Zhang
Wenbin Xing
Yijia Zheng
Klim Zaporojets
...
Jia Hu
Xiaolei Ma
Zhiyuan Liu
Paul Groth
Marcel Worring
AI4CE
151
0
0
24 May 2025
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
Mengqi Liao
Xiangyu Xi
Ruinian Chen
Jia Leng
Yangen Hu
Ke Zeng
Shuai Liu
Huaiyu Wan
LRM
53
0
0
24 May 2025
Benchmarking and Rethinking Knowledge Editing for Large Language Models
Guoxiu He
Xin Song
Futing Wang
Aixin Sun
KELM
48
0
0
24 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
152
0
0
24 May 2025
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue
Bowen Jin
Huimin Zeng
Honglei Zhuang
Zhen Qin
Jinsung Yoon
Lanyu Shang
Jiawei Han
Dong Wang
OffRL
BDL
LRM
70
0
0
24 May 2025
PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
Xiaoyan Hu
Lauren Pick
Ho-fung Leung
Farzan Farnia
40
1
0
24 May 2025
Safety Alignment via Constrained Knowledge Unlearning
Zesheng Shi
Yucheng Zhou
Jing Li
MU
KELM
AAML
84
2
0
24 May 2025
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
Haoyuan Sun
Jiaqi Wu
Bo Xia
Yifu Luo
Yifei Zhao
Kai Qin
Xufei Lv
Tiantian Zhang
Yongzhe Chang
Xueqian Wang
OffRL
LRM
209
0
0
24 May 2025