Training language models to follow instructions with human feedback
arXiv 2203.02155 · 4 March 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM · ALM
Papers citing "Training language models to follow instructions with human feedback" (50 of 6,395 shown)
Trustworthy AI: Safety, Bias, and Privacy -- A Survey
Xingli Fang, Jianwei Li, Varun Mulchandani, Jung-Eun Kim
93 · 0 · 0 · 11 Feb 2025

BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
Xu Huang, Wenhao Zhu, Hanxu Hu, Zeang Sheng, Lei Li, Shujian Huang, Fei Yuan
ELM · 192 · 4 · 0 · 11 Feb 2025

JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation
Shenyi Zhang, Yuchen Zhai, Keyan Guo, Hongxin Hu, Shengnan Guo, Zheng Fang, Lingchen Zhao, Chao Shen, Cong Wang, Qian Wang
AAML · 153 · 4 · 0 · 11 Feb 2025

Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Hai-Tao Zheng, Haojing Huang, Jiayi Kuang, Yangning Li, Shu Guo, Chao Qu, Jue Chen, Hai-Tao Zheng, Ying Shen, Philip S. Yu
CLL · 111 · 5 · 0 · 11 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang, Zhaochen Yu, Tengjiao Wang, Mengdi Wang
ReLM · LRM · AI4CE · 194 · 18 · 0 · 10 Feb 2025

LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo
241 · 0 · 0 · 10 Feb 2025

InSTA: Towards Internet-Scale Training For Agents
Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov
ALM · 208 · 4 · 0 · 10 Feb 2025

C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation
Guoxin Chen, Minpeng Liao, Peiying Yu, Dingmin Wang, Zile Qiao, Chao Yang, Xin Zhao, Kai Fan
108 · 1 · 0 · 10 Feb 2025

KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
Yuxing Lu, Jinzhuo Wang
79 · 2 · 0 · 10 Feb 2025

Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng, Fengbin Zhu, Tong Ye, Haoyan Liu, Fuli Feng, Tat-Seng Chua
RALM · 172 · 2 · 0 · 10 Feb 2025

AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon
116 · 1 · 0 · 10 Feb 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li, Zhangyang Wang, Qiang Liu
OffRL · 198 · 0 · 0 · 09 Feb 2025

Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, Pasquale Minervini
LLMAG · 156 · 2 · 0 · 09 Feb 2025

Towards a Sharp Analysis of Offline Policy Learning for f-Divergence-Regularized Contextual Bandits
Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Q. Gu
OffRL · 113 · 0 · 0 · 09 Feb 2025

Delta - Contrastive Decoding Mitigates Text Hallucinations in Large Language Models
Cheng Peng Huang, Hao-Yuan Chen
HILM · 152 · 1 · 0 · 09 Feb 2025

Learning to Substitute Words with Model-based Score Ranking
Hongye Liu, Ricardo Henao
170 · 0 · 0 · 09 Feb 2025
Few-shot LLM Synthetic Data with Distribution Matching
Jiyuan Ren, Zhaocheng Du, Zhihao Wen, Qinglin Jia, Sunhao Dai, Chuhan Wu, Zhenhua Dong
SyDa · 225 · 0 · 0 · 09 Feb 2025

Dual Caption Preference Optimization for Diffusion Models
Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral
DiffM · 111 · 0 · 0 · 09 Feb 2025

Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails
Yijun Yang, L. Wang, Xiao Yang, Lanqing Hong, Jun Zhu
AAML · 77 · 0 · 0 · 09 Feb 2025

MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
Wanqi Yang, Yongqian Li, Meng Fang, Lawrence Yunliang Chen
160 · 1 · 0 · 09 Feb 2025
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao
OffRL · 233 · 13 · 0 · 08 Feb 2025

Design Considerations in Offline Preference-based RL
Alekh Agarwal, Christoph Dann, T. V. Marinov
OffRL · 110 · 1 · 0 · 08 Feb 2025

DeepThink: Aligning Language Models with Domain-Specific User Intents
Yang Li, Mingxuan Luo, Yeyun Gong, Chen Lin, Jian Jiao, Yi Liu, Kaili Huang
LRM · ALM · ELM · 140 · 0 · 0 · 08 Feb 2025

Mol-MoE: Training Preference-Guided Routers for Molecule Generation
Diego Calanzone, P. D'Oro, Pierre-Luc Bacon
106 · 1 · 0 · 08 Feb 2025

Language Models Largely Exhibit Human-like Constituent Ordering Preferences
Ada Defne Tur, Gaurav Kamath, Siva Reddy
168 · 0 · 0 · 08 Feb 2025

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions
Jingxin Xu, Guoshun Nan, Sheng Guan, Sicong Leng, Yang Liu, Zixiao Wang, Yuyang Ma, Zhili Zhou, Yanzhao Hou, Xiaofeng Tao
LM&MA · 123 · 0 · 0 · 08 Feb 2025
Extracting and Understanding the Superficial Knowledge in Alignment
Runjin Chen, Gabriel Jacob Perin, Xuxi Chen, Xilun Chen, Y. Han, Nina S. T. Hirata, Junyuan Hong, B. Kailkhura
81 · 1 · 0 · 07 Feb 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Chun Lok Li
153 · 5 · 0 · 07 Feb 2025

Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics
Hussam Ghanem, C. Cruz
139 · 0 · 0 · 07 Feb 2025
PsyPlay: Personality-Infused Role-Playing Conversational Agents
Tao Yang, Yuhua Zhu, Xiaojun Quan, Cong Liu, Qifan Wang
188 · 1 · 0 · 06 Feb 2025

MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers
Nicole Cho, William Watson
AAML · HILM · 294 · 0 · 0 · 06 Feb 2025

Safety Reasoning with Guidelines
Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao
191 · 4 · 0 · 06 Feb 2025

The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
ALM · 226 · 7 · 0 · 06 Feb 2025
CTR-Driven Advertising Image Generation with Multimodal Large Language Models
Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yuxiao Chen, ..., Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang
OffRL · 126 · 2 · 0 · 05 Feb 2025

Aero-LLM: A Distributed Framework for Secure UAV Communication and Intelligent Decision-Making
Balakrishnan Dharmalingam, Rajdeep Mukherjee, Brett Piggott, Guohuan Feng, Anyi Liu
75 · 1 · 0 · 05 Feb 2025

Learning from Active Human Involvement through Proxy Value Propagation
Zhenghao Peng, Wenjie Mo, Chenda Duan, Quanyi Li, Bolei Zhou
193 · 16 · 0 · 05 Feb 2025

Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas, Muneeza Azmat, R. Horesh, Mikhail Yurochkin
187 · 2 · 0 · 05 Feb 2025
Prompt-based Depth Pruning of Large Language Models
Juyun Wee, Minjae Park, Jaeho Lee
VLM · 199 · 0 · 0 · 04 Feb 2025

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, G. Wornell, Subhro Das, David D. Cox, Chuang Gan
LRM · LLMAG · 571 · 18 · 0 · 04 Feb 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando, Jie Zhang, Nicholas Carlini, F. Tramèr
AAML · ELM · 148 · 9 · 0 · 04 Feb 2025

IPO: Iterative Preference Optimization for Text-to-Video Generation
Xiaomeng Yang, Zhiyu Tan, Xuecheng Nie
VGen · 177 · 3 · 0 · 04 Feb 2025
Why human-AI relationships need socioaffective alignment
Hannah Rose Kirk, Iason Gabriel, Chris Summerfield, Bertie Vidgen, Scott A. Hale
104 · 10 · 0 · 04 Feb 2025

Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas
KELM · 193 · 5 · 0 · 04 Feb 2025

STAIR: Improving Safety Alignment with Introspective Reasoning
Yuanhang Zhang, Siyuan Zhang, Yao Huang, Zeyu Xia, Zhengwei Fang, Xiao Yang, Ranjie Duan, Dong Yan, Yinpeng Dong, Jun Zhu
LRM · LLMSV · 179 · 7 · 0 · 04 Feb 2025

Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models
Haoran Ye, Tianze Zhang, Yuhang Xie, Liyuan Zhang, Yuanyi Ren, Xin Zhang, Guojie Song
PILM · 199 · 0 · 0 · 04 Feb 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang, D. Yao, Wenpin Tang
154 · 1 · 0 · 03 Feb 2025

Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit K. Roy-Chowdhury
VLM · 131 · 0 · 0 · 03 Feb 2025

Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang, En Yu, Yi Shao, Shuai Li
181 · 0 · 0 · 03 Feb 2025

Process Reinforcement through Implicit Rewards
Ganqu Cui, Lifan Yuan, Ziyi Wang, Hanbin Wang, Wendi Li, ..., Yu Cheng, Zhiyuan Liu, Maosong Sun, Bowen Zhou, Ning Ding
OffRL · LRM · 199 · 103 · 0 · 03 Feb 2025

Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization
Minttu Alakuijala, Ya Gao, Georgy Ananov, Samuel Kaski, Pekka Marttinen, Alexander Ilin, Harri Valpola
LLMAG · CLL · 144 · 2 · 0 · 03 Feb 2025