arXiv:2203.02155
Training language models to follow instructions with human feedback
4 March 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
Papers citing "Training language models to follow instructions with human feedback" (showing 50 of 6,392)
On Synthetic Data Strategies for Domain-Specific Generative Retrieval
Haoyang Wen, Jiang Guo, Yi Zhang, Jiarong Jiang, Ziyi Wang · Tags: SyDa · 25 Feb 2025

Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training
Yihang Yao, Zhepeng Cen, Miao Li, William Jongwon Han, Yuyou Zhang, Emerson Liu, Zuxin Liu, Chuang Gan, Ding Zhao · Tags: ReLM, LRM · 25 Feb 2025

AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan · 25 Feb 2025

Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang · 25 Feb 2025
Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? (Large Language Models: From Word Prediction to Comprehension?)
Carlos Gómez-Rodríguez · Tags: SyDa, AILaw, ELM, VLM · 25 Feb 2025
MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang · 25 Feb 2025

Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization
Ru Wang, Wei Huang, Selena Song, Haoyu Zhang, Yusuke Iwasawa, Y. Matsuo, Jiaxian Guo · Tags: OODD, LRM · 25 Feb 2025

Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models
Xu Chu, Zhixin Zhang, Tianyu Jia, Yujie Jin · 25 Feb 2025

Comparing Native and Non-native English Speakers' Behaviors in Collaborative Writing through Visual Analytics
Yuexi Chen, Yimin Xiao, Kazi Tasnim Zinat, Naomi Yamashita, G. Gao, Zhicheng Liu · 25 Feb 2025

Language Models' Factuality Depends on the Language of Inquiry
Tushar Aggarwal, Kumar Tanmay, Ayush Agrawal, Kumar Ayush, Hamid Palangi, Paul Pu Liang · Tags: HILM, KELM · 25 Feb 2025

LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers
Zhuocheng Zhang, Yang Feng, Min Zhang · 25 Feb 2025

Scalable Best-of-N Selection for Large Language Models via Self-Certainty
Zhewei Kang, Xuandong Zhao, Dawn Song · Tags: LRM · 25 Feb 2025
IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts
Eric Xue, Zeyi Huang, Haohan Wang, Yong Jae Lee · 25 Feb 2025
What is the Alignment Objective of GRPO?
Milan Vojnovic, Se-Young Yun · 25 Feb 2025

PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan · Tags: ALM · 24 Feb 2025

Policy Learning with a Natural Language Action Space: A Causal Approach
Bohan Zhang, Yixin Wang, Paramveer S. Dhillon · Tags: CML · 24 Feb 2025

CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
Chenlu Guo, Nuo Xu, Yi-Ju Chang, Yuan Wu · Tags: AI4MH, LM&MA · 24 Feb 2025

Large Language Models and Mathematical Reasoning Failures
Johan Boye, Birger Moell · Tags: ELM, LRM · 24 Feb 2025

UrduLLaMA 1.0: Dataset Curation, Preprocessing, and Evaluation in Low-Resource Settings
Layba Fiaz, Munief Hassan Tahir, Sana Shams, Sarmad Hussain · 24 Feb 2025

Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang, Yi Zhou, Danushka Bollegala · 24 Feb 2025
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu · 24 Feb 2025

ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models
Simeng Han, Frank Palma Gomez, Tu Vu, Zefei Li, Daniel Cer, Hansi Zeng, Chris Tar, Arman Cohan, Gustavo Hernández Ábrego · 24 Feb 2025

Post-edits Are Preferences Too
Nathaniel Berger, Stefan Riezler, M. Exel, Matthias Huck · 24 Feb 2025

GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang · Tags: ALM, ELM · 24 Feb 2025

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang, Lu Wang, Fangkai Yang, Pu Zhao, Hao Sun, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang · Tags: OffRL · 24 Feb 2025

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction
Michal Bravansky, Vaclav Kubon, Suhas Hariharan, Robert Kirk · 24 Feb 2025

VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Hao Fei, Zhefei Gong, Shuanghao Bai, Han Zhao, Donglin Wang · 24 Feb 2025
Scale-Free Graph-Language Models
Jianglin Lu, Yixuan Liu, Yitian Zhang, Y. Fu · 24 Feb 2025

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans · Tags: AAML · 24 Feb 2025

Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang · Tags: ALM, LM&MA · 24 Feb 2025

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney · Tags: AAML · 24 Feb 2025

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao · Tags: OffRL · 24 Feb 2025

Model Lakes
Koyena Pal, David Bau, Renée J. Miller · 24 Feb 2025

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Boxuan Zhang, Ruqi Zhang · Tags: LRM · 24 Feb 2025

On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou · 24 Feb 2025

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao · Tags: MoMe · 24 Feb 2025

Aligning Compound AI Systems via System-level DPO
Xiangwen Wang, Yibo Jacky Zhang, Zhoujie Ding, Katherine Tsai, Haolun Wu, Sanmi Koyejo · 24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer, Punit Singh Koura, Binh Tang, R. Subramanian, Aaditya K. Singh, ..., Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang · Tags: ALM · 24 Feb 2025

Streaming Looking Ahead with Token-level Self-reward
Han Zhang, Ruixin Hong, Dong Yu · 24 Feb 2025

AlphaAgent: LLM-Driven Alpha Mining with Regularized Exploration to Counteract Alpha Decay
Ziyi Tang, Zhenpeng Chen, Jiarui Yang, Jiayao Mai, Yongsen Zheng, Keze Wang, Jinrui Chen, Liang Lin · Tags: AIFin · 24 Feb 2025

Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo, Jingyang Zhang, Jianyi Zhang, Minxue Tang, Louis DiValentin, ..., William Chen, Amin Hass, Tianlong Chen, Yuxiao Chen, Haoyang Li · Tags: MU, KELM · 24 Feb 2025

Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala · Tags: MoMe · 24 Feb 2025

RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, ..., Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra · 24 Feb 2025
Fully automatic extraction of morphological traits from the Web: utopia or reality?
Diego Marcos, Robert van de Vlasakker, Ioannis Athanasiadis, P. Bonnet, Hervé Goëau, Alexis Joly, W. Daniel Kissling, César Leblanc, André S. J. van Proosdij, Konstantinos P. Panousis · 24 Feb 2025

Navigation-GPT: A Robust and Adaptive Framework Utilizing Large Language Models for Navigation Applications
Feng Ma, Xiang Wang, Chen Chen, Xiao-bin Xu, Xin-ping Yan · 23 Feb 2025

Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
Saikat Barua, Mostafizur Rahman, Md Jafor Sadek, Rafiul Islam, Shehnaz Khaled, Ahmedul Kabir · Tags: LLMAG · 23 Feb 2025

Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models
Yuyi Huang, Runzhe Zhan, Derek F. Wong, Lidia S. Chao, Ailin Tao · Tags: AAML, SyDa, ELM · 23 Feb 2025

Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
Heegyu Kim, Taeyang Jeon, Seungtaek Choi, Jihoon Hong, Dongwon Jeon, ..., Jisu Bae, Chihoon Lee, Yunseo Kim, Jinsung Park, Hyunsouk Cho · Tags: ELM · 23 Feb 2025

Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu, Tongkun Guan, Zining Wang, Zhentao Guo, Chen Duan, ..., Boming Chen, Jiayao Ma, Qianyi Jiang, Kai Zhou, Junfeng Luo · Tags: VLM · 23 Feb 2025

Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
Avinandan Bose, Laurent Lessard, Maryam Fazel, Krishnamurthy Dvijotham · Tags: AAML · 23 Feb 2025