arXiv:2203.02155
Training language models to follow instructions with human feedback
4 March 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
Papers citing "Training language models to follow instructions with human feedback" (showing 50 of 6,370)
- Being Strong Progressively! Enhancing Knowledge Distillation of Large Language Models through a Curriculum Learning Framework. Lingyuan Liu, Mengxiang Zhang. (06 Jun 2025)
- A Systematic Review of Poisoning Attacks Against Large Language Models. Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan G. Drenkow. [AAML, PILM] (06 Jun 2025)
- Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance. Rudransh Agnihotri, Ananya Pandey. [OffRL, ALM] (06 Jun 2025)
- Saffron-1: Safety Inference Scaling. Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong. [LRM] (06 Jun 2025)
- CoMemo: LVLMs Need Image Context with Image Memory. Shi-Qi Liu, Weijie Su, Xizhou Zhu, Wenhai Wang, Jifeng Dai. [VLM] (06 Jun 2025)
- Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques. Adarsh Prasad Behera, J. Champati, Roberto Morabito, Sasu Tarkoma, J. Gross. (06 Jun 2025)
- WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management. Bowen Yuan, Selena Song, Javier Fernandez, Yadan Luo, Mahsa Baktashmotlagh, Zijian Wang. (06 Jun 2025)
- Proactive Assistant Dialogue Generation from Streaming Egocentric Videos. Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, J. Chai, Seungwhan Moon. (06 Jun 2025)
- Debiasing Online Preference Learning via Preference Feature Preservation. Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim. (06 Jun 2025)
- Cross-lingual Collapse: How Language-Centric Foundation Models Shape Reasoning in Large Language Models. Cheonbok Park, Jeonghoon Kim, J. H. Lee, Sanghwan Bae, Jaegul Choo, Kang Min Yoo. [LRM] (06 Jun 2025)
- Distillation Robustifies Unlearning. Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner. [MU] (06 Jun 2025)
- Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library. Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, ..., Lin Qu, Wenbo Su, Wei Wang, Jiamang Wang, Bo Zheng. [OffRL] (06 Jun 2025)
- Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach. Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo. (05 Jun 2025)
- A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search. A. Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy. [OffRL] (05 Jun 2025)
- Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering. Yi Ji, Runzhi Li, Baolei Mao. [AAML] (05 Jun 2025)
- GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval. Lingyuan Liu, Mengxiang Zhang. (05 Jun 2025)
- Improving Low-Resource Morphological Inflection via Self-Supervised Objectives. Adam Wiemerslage, Katharina von der Wense. (05 Jun 2025)
- TreeRPO: Tree Relative Policy Optimization. Zhicheng YANG, Zhijiang Guo, Yinya Huang, Xiaodan Liang, Yiwei Wang, Jing Tang. [LRM] (05 Jun 2025)
- Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets. Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang. (05 Jun 2025)
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models. Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi, Mark Yatskar. (05 Jun 2025)
- Truly Self-Improving Agents Require Intrinsic Metacognitive Learning. Tennison Liu, M. Schaar. [AIFin, LRM] (05 Jun 2025)
- RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation. Tianjiao Li, Mengran Yu, Chenyu Shi, Yanjun Zhao, Xiaojing Liu, Qiang Zhang, Qi Zhang, Xuanjing Huang, Jiayin Wang. (05 Jun 2025)
- SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat. Yuru Jiang, Wenxuan Ding, Shangbin Feng, Greg Durrett, Yulia Tsvetkov. (05 Jun 2025)
- DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning. Tanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang, Nanyun Peng. (05 Jun 2025)
- SECNEURON: Reliable and Flexible Abuse Control in Local LLMs via Hybrid Neuron Encryption. Zhiqiang Wang, Haohua Du, Junyang Wang, Haifeng Sun, Kaiwen Guo, Haikuo Yu, Chao Liu, Xiang-Yang Li. [AAML] (05 Jun 2025)
- UniRes: Universal Image Restoration for Complex Degradations. Mo Zhou, Keren Ye, M. Delbracio, P. Milanfar, Vishal M. Patel, Hossein Talebi. (05 Jun 2025)
- On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models. Xingwu Chen, Tianle Li, Difan Zou. [LRM] (05 Jun 2025)
- PPO in the Fisher-Rao geometry. Razvan-Andrei Lascu, David Siska, Łukasz Szpruch. (04 Jun 2025)
- Exchange of Perspective Prompting Enhances Reasoning in Large Language Models. Lin Sun, Can Zhang. [LRM] (04 Jun 2025)
- Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks. Lin Mu, Guowei Chu, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang. (04 Jun 2025)
- MiMo-VL Technical Report. Xiaomi LLM-Core Team, Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, ..., Di Zhang, Chong Ma, Chang Liu, Can Cai, Bingquan Xia. [OffRL, MoE, VLM, LRM] (04 Jun 2025)
- Aligning Large Language Models with Implicit Preferences from User-Generated Content. Zhaoxuan Tan, Zheng Li, Tianyi Liu, Haodong Wang, Hyokun Yun, ..., Yifan Gao, Ruijie Wang, Priyanka Nigam, Bing Yin, Meng Jiang. (04 Jun 2025)
- Misalignment or misuse? The AGI alignment tradeoff. Max Hellrigel-Holderbaum, Leonard Dung. (04 Jun 2025)
- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models. Y. Wu, Yushi Bai, Zhiqiang Hu, Juanzi Li, Roy Ka-wei Lee. (04 Jun 2025)
- EpiCoDe: Boosting Model Performance Beyond Training with Extrapolation and Contrastive Decoding. Mingxu Tao, Jie Hu, Mingchuan Yang, Yunhuai Liu, Dongyan Zhao, Yansong Feng. (04 Jun 2025)
- Leveraging Reward Models for Guiding Code Review Comment Generation. Oussama Ben Sghaier, Rosalia Tufano, Gabriele Bavota, Houari Sahraoui. (04 Jun 2025)
- Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales. Ayuto Tsutsumi, Yuu Jinnai. (04 Jun 2025)
- DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models. Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, Aliaksandr Siarohin. [DiffM, VGen] (04 Jun 2025)
- SAGE: Specification-Aware Grammar Extraction for Automated Test Case Generation with LLMs. Aditi, Hyunwoo Park, Sicheol Sung, Yo-Sub Han, Sang-Ki Ko. (04 Jun 2025)
- RewardAnything: Generalizable Principle-Following Reward Models. Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye. [LRM] (04 Jun 2025)
- Robust Preference Optimization via Dynamic Target Margins. Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang. (04 Jun 2025)
- Multimodal Tabular Reasoning with Privileged Structured Information. Jun-Peng Jiang, Yu Xia, Hai-Long Sun, Shiyin Lu, Qing-Guo Chen, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye. [LMTD, LRM] (04 Jun 2025)
- LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward. Yi Zhao, Siqi Wang, Jing Li. (04 Jun 2025)
- Crowd-SFT: Crowdsourcing for LLM Alignment. Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, J. Coleman, Bhaskar Krishnamachari. [ALM] (04 Jun 2025)
- Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation. Chaehun Shin, Jooyoung Choi, Johan Barthelemy, Jungbeom Lee, Sungroh Yoon. [DiffM] (04 Jun 2025)
- Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising. Zhenhui Liu, Chunyuan Yuan, Ming Pang, Zheng Fang, Li Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo, Jingping Shao. (04 Jun 2025)
- Minos: A Multimodal Evaluation Model for Bidirectional Generation Between Image and Text. Junzhe Zhang, Huixuan Zhang, Xinyu Hu, Li Lin, Mingqi Gao, Shi Qiu, Xiaojun Wan. [MLLM] (03 Jun 2025)
- EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing. Fan Gao, Dongyuan Li, Ding Xia, Fei Mi, Yasheng Wang, Lifeng Shang, Baojun Wang. [ELM] (03 Jun 2025)
- MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching. Liang Yue, Yihong Tang, Kehai Chen, Jie Liu, Min Zhang. [LLMAG] (03 Jun 2025)
- ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment. Martin JJ. Bucher, Iro Armeni. [DiffM] (03 Jun 2025)