arXiv:2203.02155
Training language models to follow instructions with human feedback
4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
Papers citing "Training language models to follow instructions with human feedback"
50 / 6,392 papers shown
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
Avinandan Bose
Laurent Lessard
Maryam Fazel
Krishnamurthy Dvijotham
AAML
71
0
0
23 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM
AuLLM
CLIP
VLM
93
1
0
23 Feb 2025
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Max Lamparth
Declan Grabb
Amy Franks
Scott Gershan
Kaitlyn N. Kunstman
...
Monika Drummond Roots
Manu Sharma
Aryan Shrivastava
N. Vasan
Colleen Waickman
144
2
0
22 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
476
1
0
22 Feb 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
277
2
0
22 Feb 2025
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Rui Li
Peiyi Wang
Jingyuan Ma
Di Zhang
Lei Sha
Zhifang Sui
LLMAG
161
0
0
22 Feb 2025
A Generative Approach to LLM Harmfulness Detection with Special Red Flag Tokens
Sophie Xhonneux
David Dobre
Mehrnaz Mohfakhami
Leo Schwinn
Gauthier Gidel
191
2
0
22 Feb 2025
Towards User-level Private Reinforcement Learning with Human Feedback
Jing Zhang
Mingxi Lei
Meng Ding
Mengdi Li
Zihang Xiang
Difei Xu
Jinhui Xu
Di Wang
117
3
0
22 Feb 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
Jinpei Guo
...
Zengmao Wang
Ziqiang Liu
Di Lin
Xianglong Liu
Dacheng Tao
LRM
128
11
0
22 Feb 2025
BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking
Yuxuan Liu
Hongda Sun
Wenya Guo
Xinyan Xiao
Cunli Mao
Zhengtao Yu
Rui Yan
162
3
0
22 Feb 2025
Do LLMs Understand the Safety of Their Inputs? Training-Free Moderation via Latent Prototypes
Maciej Chrabąszcz
Filip Szatkowski
Bartosz Wójcik
Jan Dubiński
Tomasz Trzciński
Sebastian Cygert
88
0
0
22 Feb 2025
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale
Axel Højmark
Jérémy Scheurer
Marius Hobbhahn
LLMAG
ELM
112
2
0
21 Feb 2025
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System
Yintao He
Haiyu Mao
Christina Giannoula
Mohammad Sadrosadati
Juan Gómez Luna
Huawei Li
Xiaowei Li
Ying Wang
O. Mutlu
91
8
0
21 Feb 2025
IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector
Zheng Chen
Yushi Feng
Changyang He
Yue Deng
Hongxi Pu
Yue Liu
DeLMO
91
1
0
21 Feb 2025
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
Yue Zhou
Yi-Ju Chang
Yuan Wu
MoMe
122
3
0
21 Feb 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Ziyang Chen
Mingxiao Li
Shangsong Liang
Zhaochun Ren
V. Honavar
277
11
0
21 Feb 2025
C3AI: Crafting and Evaluating Constitutions for Constitutional AI
Yara Kyrychenko
Ke Zhou
Edyta Bogucka
Daniele Quercia
ELM
95
5
0
21 Feb 2025
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
Jesus Rios
Pierre Dognin
Ronny Luss
Karthikeyan N. Ramamurthy
211
1
0
21 Feb 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
183
4
0
21 Feb 2025
Standard Benchmarks Fail - Auditing LLM Agents in Finance Must Prioritize Risk
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
163
1
0
21 Feb 2025
Problem-Solving Logic Guided Curriculum In-Context Learning for LLMs Complex Reasoning
Xuetao Ma
Wenbin Jiang
Hua Huang
LRM
217
4
0
21 Feb 2025
WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents
Xinhang Liu
Chi-Keung Tang
Yu-Wing Tai
VGen
218
1
0
21 Feb 2025
MILE: Model-based Intervention Learning
Yigit Korkmaz
Erdem Bıyık
153
2
0
21 Feb 2025
Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting
Gyanendra Shrestha
Chutain Jiang
Sai Akula
Vivek Yannam
Anna Pyayt
Michael Gubanov
LMTD
164
0
0
20 Feb 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
152
37
0
20 Feb 2025
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
Vaidehi Patil
Elias Stengel-Eskin
Joey Tianyi Zhou
MU
CLL
119
4
0
20 Feb 2025
Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
Emre Can Acikgoz
Jeremiah Greer
Akul Datta
Ze Yang
William Zeng
Oussama Elachqar
Emmanouil Koukoumidis
Dilek Hakkani-Tur
Gokhan Tur
LLMAG
197
3
0
20 Feb 2025
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Yi Liu
Changran Xu
Yunhao Zhou
Zhiyu Li
Qiang Xu
VLM
134
7
0
20 Feb 2025
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment
Tong Yang
Jincheng Mei
H. Dai
Zixin Wen
Shicong Cen
Dale Schuurmans
Yuejie Chi
Bo Dai
122
4
0
20 Feb 2025
Pragmatic Reasoning improves LLM Code Generation
Zhuchen Cao
Sven Apel
Adish Singla
Vera Demberg
LRM
125
0
0
20 Feb 2025
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
Masatoshi Uehara
Xingyu Su
Yulai Zhao
Xiner Li
Aviv Regev
Shuiwang Ji
Sergey Levine
Tommaso Biancalani
131
3
0
20 Feb 2025
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Seonil Son
Ju-Min Oh
Heegon Jin
Cheolhun Jang
Jeongbeom Jeong
Kuntae Kim
157
1
0
20 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktäschel
Robert Kirk
120
3
0
19 Feb 2025
Local Differences, Global Lessons: Insights from Organisation Policies for International Legislation
Lucie-Aimée Kaffee
Pepa Atanasova
Anna Rogers
91
1
0
19 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
97
3
0
19 Feb 2025
Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
Shuo Xing
Yuping Wang
Peiran Li
Ruizheng Bai
Yansen Wang
Chan-wei Hu
Chengxuan Qian
Huaxiu Yao
Zhengzhong Tu
195
8
0
18 Feb 2025
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Seanie Lee
Dong Bok Lee
Dominik Wagner
Minki Kang
Haebin Seong
Tobias Bocklet
Juho Lee
Sung Ju Hwang
136
2
0
18 Feb 2025
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
Longxu Dou
Qian Liu
Fan Zhou
Changyu Chen
Zili Wang
...
Tianyu Pang
Chao Du
Xinyi Wan
Wei Lu
Min Lin
251
3
0
18 Feb 2025
Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models
Daiki Chijiwa
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Susumu Takeuchi
137
0
0
18 Feb 2025
Computing Voting Rules with Improvement Feedback
Evi Micha
Vasilis Varsamis
89
0
0
18 Feb 2025
EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness
Yunxiao Zhang
Guanming Xiong
Haochen Li
Wen Zhao
LLMAG
106
0
0
18 Feb 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
Jiaqi Zhao
Miao Zhang
Ming Wang
Yuzhang Shang
Kaihao Zhang
Weili Guan
Yaowei Wang
Min Zhang
MQ
114
1
0
18 Feb 2025
Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards
Xinyi Yang
Liang Zeng
Heng Dong
Chao Yu
Xiaojun Wu
H. Yang
Yu Wang
Milind Tambe
Tonghan Wang
143
4
0
18 Feb 2025
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu
Yuvan Sharma
Haoru Xue
Giscard Biamby
Junyi Zhang
Ziteng Ji
Trevor Darrell
Roei Herzig
175
2
0
18 Feb 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Junda Zhu
Lingyong Yan
Shuaiqiang Wang
Dawei Yin
Lei Sha
AAML
LRM
107
6
0
18 Feb 2025
Multi-Attribute Steering of Language Models via Targeted Intervention
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
LLMSV
185
2
0
18 Feb 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Feng Luo
Rui Yang
Hao Sun
Chunyuan Deng
Jiarui Yao
Jingyan Shen
Huan Zhang
Hanjie Chen
33
1
0
18 Feb 2025
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Artyom Kharinaev
Viktor Moskvoretskii
Egor Shvetsov
Kseniia Studenikina
Bykov Mikhail
Evgeny Burnaev
MQ
111
0
0
18 Feb 2025
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu
Luca Viano
Yihang Chen
Zhenyu Zhu
Kimon Antonakopoulos
Quanquan Gu
Volkan Cevher
180
1
0
18 Feb 2025
Computational Safety for Generative AI: A Signal Processing Perspective
Pin-Yu Chen
130
1
0
18 Feb 2025