arXiv: 2203.02155
Training language models to follow instructions with human feedback
4 March 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
Papers citing "Training language models to follow instructions with human feedback" (showing 50 of 6,370; entries listed as Title | Authors | Tags | Date):
- GFRIEND: Generative Few-shot Reward Inference through EfficieNt DPO | Yiyang Zhao, Huiyu Bai, Xuejiao Zhao | OffRL | 10 Jun 2025
- Intra-Trajectory Consistency for Reward Modeling | Chaoyang Zhou, Shunyu Liu, Zengmao Wang, Di Wang, Rong-Cheng Tu, Bo Du, Dacheng Tao | 10 Jun 2025
- Olica: Efficient Structured Pruning of Large Language Models without Retraining | Jiujun He, Huazhen Lin | 10 Jun 2025
- Learning to Reason Across Parallel Samples for LLM Reasoning | Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi | ReLM, LRM | 10 Jun 2025
- ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization | Hee Suk Yoon, Eunseop Yoon, Mark Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo | 10 Jun 2025
- Intention-Conditioned Flow Occupancy Models | Chongyi Zheng, S. Park, Sergey Levine, Benjamin Eysenbach | AI4TS, OffRL, AI4CE | 10 Jun 2025
- A Survey on Large Language Models for Mathematical Reasoning | Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, ..., Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu | LRM | 10 Jun 2025
- ORFS-agent: Tool-Using Agents for Chip Design Optimization | Amur Ghose, Andrew B. Kahng, Sayak Kundu, Zhiang Wang | AI4CE | 10 Jun 2025
- SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner | Lei Zhang, J. Yang, Min Yang, Jian Yang, Mouxiang Chen, Jiajun Zhang, Zeyu Cui, Binyuan Hui, Junyang Lin | 10 Jun 2025
- Evaluating LLMs Across Multi-Cognitive Levels: From Medical Knowledge Mastery to Scenario-Based Problem Solving | Yuxuan Zhou, Xien Liu, Chenwei Yan, Chen Ning, X. Zhang, ..., Xiangling Fu, Shijin Wang, Guoping Hu, Yu Wang, Ji Wu | ELM | 10 Jun 2025
- Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling | Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan | 10 Jun 2025
- AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin | Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, ..., Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan | 10 Jun 2025
- Reinforcement Learning via Implicit Imitation Guidance | Perry Dong, Alec M. Lessing, Annie S. Chen, Chelsea Finn | OffRL | 09 Jun 2025
- MiniCPM4: Ultra-Efficient LLMs on End Devices | MiniCPM Team, Chaojun Xiao, Yuxuan Li, Xu Han, Yuzhuo Bai, ..., Zhiyuan Liu, Guoyang Zeng, Chao Jia, Dahai Li, Maosong Sun | MLLM | 09 Jun 2025
- Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models | Mickel Liu, L. Jiang, Yancheng Liang, S. Du, Yejin Choi, Tim Althoff, Natasha Jaques | AAML, LRM | 09 Jun 2025
- Improving Fairness of Large Language Models in Multi-document Summarization | Haoyuan Li, Yusen Zhang, Snigdha Chaturvedi | 09 Jun 2025
- Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models | Kyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, Youngbin Kim | 09 Jun 2025
- Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding | Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang | 09 Jun 2025
- GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors | Wenlong Meng, Shuguo Fan, Chengkun Wei, Min Chen, Yuwei Li, Yuanchao Zhang, Zhikun Zhang, Wenzhi Chen | 09 Jun 2025
- Explicit Preference Optimization: No Need for an Implicit Reward Model | Xiangkun Hu, Lemin Kong, Tong He, David Wipf | 09 Jun 2025
- Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models | Jijie Li, Li Du, Hanyu Zhao, Bo Zhang, Liangdong Wang, Boyan Gao, Guang Liu, Yonghua Lin | ALM, SyDa | 09 Jun 2025
- Gradients: When Markets Meet Fine-tuning -- A Distributed Approach to Model Optimisation | Christopher Subia-Waud | 09 Jun 2025
- Improving Large Language Models with Concept-Aware Fine-Tuning | Michael K. Chen, Xikun Zhang, Jiaxing Huang, Dacheng Tao | 09 Jun 2025
- Training Superior Sparse Autoencoders for Instruct Models | Jiaming Li, Haoran Ye, Yukun Chen, Xinyue Li, Lei Zhang, Hamid Alinejad-Rokny, Jimmy Chih-Hsien Peng, Min Yang | SyDa | 09 Jun 2025
- Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures | Yukai Zhou, Sibei Yang, Wenjie Wang | AAML | 09 Jun 2025
- Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints | Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, S. Niekum, Philip S. Thomas | 09 Jun 2025
- Synthesis by Design: Controlled Data Generation via Structural Guidance | Lei Xu, Sirui Chen, Yuxuan Huang, Chaochao Lu | 09 Jun 2025
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO | Jinyoung Park, Jeehye Na, Jinyoung Kim, H. Kim | OffRL | 09 Jun 2025
- From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp Metaprogramming Loop | Jordi de la Torre | LLMAG, KELM | 08 Jun 2025
- Robotic Policy Learning via Human-assisted Action Preference Optimization | Wenke Xia, Yichu Yang, Hongtao Wu, Xiao Ma, Tao Kong, Di Hu | 08 Jun 2025
- Tokenized Bandit for LLM Decoding and Alignment | Suho Shin, Chenghao Yang, Haifeng Xu, Mohammad T. Hajiaghayi | 08 Jun 2025
- AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint | Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua | LLMSV | 08 Jun 2025
- Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment | Pengfei Zhao, Rongbo Luan, Wei Zhang, Peng Wu, Sifeng He | 08 Jun 2025
- AssertBench: A Benchmark for Evaluating Self-Assertion in Large Language Models | Jaeho Lee, Atharv Chowdhary | HILM | 08 Jun 2025
- GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization | Yikun Wang, Yibin Wang, Dianyi Wang, Zimian Peng, Qipeng Guo, Dacheng Tao, Jiaqi Wang | LRM | 08 Jun 2025
- Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions | Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang | AAML, ELM | 08 Jun 2025
- Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models | Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian | AAML | 08 Jun 2025
- History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM | Andrew Kiruluta, Andreas Lemos, Priscilla Burity | LRM | 08 Jun 2025
- Mathesis: Towards Formal Theorem Proving from Natural Languages | Yu Xuejun, Jianyuan Zhong, Zijin Feng, Pengyi Zhai, Roozbeh Yousefzadeh, ..., Dongcai Lu, Jiacheng Sun, Q. Xu, Shen Xin, Zhenguo Li | AIMat, OffRL, LRM | 08 Jun 2025
- HauntAttack: When Attack Follows Reasoning as a Shadow | Jingyuan Ma, Rui Li, Zheng Li, Junfeng Liu, Lei Sha, Zhifang Sui | AAML, LRM | 08 Jun 2025
- AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization | Zixuan Jiang, Renjing Xu | 08 Jun 2025
- Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth | Yichi Zhang, Jinlong Pang, Zhaowei Zhu, Yang Liu | 08 Jun 2025
- Less is More: some Computational Principles based on Parcimony, and Limitations of Natural Intelligence | Laura Cohen, Xavier Hinaut, Lilyana Petrova, Alexandre Pitti, Syd Reynal, Ichiro Tsuda | 08 Jun 2025
- OneSug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion | Xian Guo, Ben Chen, Siyuan Wang, Ying Yang, Chenyi Lei, Yuqing Ding, Han Li | 07 Jun 2025
- From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment | Kyubyung Chae, Hyunbin Jin, Taesup Kim | 07 Jun 2025
- Quantile Regression with Large Language Models for Price Prediction | Nikhita Vedula, Dushyanta Dhyani, Laleh Jalali, Boris Oreshkin, Mohsen Bayati, S. Malmasi | 07 Jun 2025
- Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning | Chaoyang Wang, Zeyu Zhang, Haiyun Jiang | OffRL, LRM | 07 Jun 2025
- What Makes a Good Natural Language Prompt? | Do Xuan Long, Duy Dinh, Ngoc-Hai Nguyen, Kenji Kawaguchi, Nancy F. Chen, Shafiq Joty, Min-Yen Kan | 07 Jun 2025
- Benchmarking Misuse Mitigation Against Covert Adversaries | Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George Pappas, Eric Wong, Hamed Hassani | 06 Jun 2025
- Distillation Robustifies Unlearning | Bruce W. Lee, Addie Foote, Alex Infanger, Leni Shor, Harish Kamath, Jacob Goldman-Wetzler, Bryce Woodworth, Alex Cloud, Alexander Matt Turner | MU | 06 Jun 2025