Training language models to follow instructions with human feedback
arXiv:2203.02155, 4 March 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
Papers citing "Training language models to follow instructions with human feedback" (showing 50 of 6,392)
Contrastive Instruction Tuning (17 Feb 2024)
Tianyi Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen [ALM]

BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering (16 Feb 2024)
Haoyu Wang, Ruirui Li, Haoming Jiang, Jinjin Tian, Zhengyang Wang, Chen Luo, Xianfeng Tang, Monica Cheng, Tuo Zhao, Jing Gao [RALM, KELM]

Whose Emotions and Moral Sentiments Do Language Models Reflect? (16 Feb 2024)
Zihao He, Siyi Guo, Ashwin Rao, Kristina Lerman

Proving membership in LLM pretraining data via data watermarks (16 Feb 2024)
Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia [WaLM]

Multi-modal preference alignment remedies regression of visual instruction tuning on language model (16 Feb 2024)
Shengzhi Li, Rongyu Lin, Shichao Pei

Universal Prompt Optimizer for Safe Text-to-Image Generation (16 Feb 2024)
Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, Suhang Wang [EGVM]

ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages (16 Feb 2024)
Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong Wu, Qi Zhang, Tao Gui, Xuanjing Huang [LLMAG]

Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning (16 Feb 2024)
Yinpeng Liu, Jiawei Liu, Xiang Shi, Qikai Cheng, Yong Huang, Wei Lu

Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm (16 Feb 2024)
Yuanzhen Xie, Xinzhou Jin, Tao Xie, Mingxiong Lin, Liang Chen, Chenyun Yu, Lei Cheng, Chengxiang Zhuo, Bo Hu, Zang Li

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models (16 Feb 2024)
Yuxuan Kuang, Hai Lin, Meng Jiang [LM&Ro]

Humans or LLMs as the Judge? A Study on Judgement Biases (16 Feb 2024)
Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, Benyou Wang

AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation (16 Feb 2024)
Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Wong, Simon See

Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements (16 Feb 2024)
Ming Li, Jiuhai Chen, Lichang Chen, Dinesh Manocha

Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs (16 Feb 2024)
Zae Myung Kim, Kwang Hee Lee, Preston Zhu, Vipul Raheja, Dongyeop Kang [DeLMO]

Direct Preference Optimization with an Offset (16 Feb 2024)
Afra Amini, Tim Vieira, Ryan Cotterell

Properties and Challenges of LLM-Generated Explanations (16 Feb 2024)
Jenny Kunz, Marco Kuhlmann

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models (16 Feb 2024)
Wenchao Dong, Assem Zhunis, Hyojin Chin, Jiyoung Han, Meeyoung Cha

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection (16 Feb 2024)
Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, Minnan Luo

Measuring and Reducing LLM Hallucination without Gold-Standard Answers (16 Feb 2024)
Jiaheng Wei, Yuanshun Yao, Jean-François Ton, Hongyi Guo, Andrew Estornell, Yang Liu [HILM]

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows (16 Feb 2024)
Ajay Patel, Colin Raffel, Chris Callison-Burch [SyDa, AI4CE]

Can We Verify Step by Step for Incorrect Answer Detection? (16 Feb 2024)
Xin Xu, Shizhe Diao, Can Yang, Yang Wang [LRM]

Active Preference Optimization for Sample Efficient RLHF (16 Feb 2024)
Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury

Can we Soft Prompt LLMs for Graph Learning Tasks? (15 Feb 2024)
Zheyuan Liu, Xiaoxin He, Yijun Tian, Nitesh Chawla

ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback (15 Feb 2024)
Henry W Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, Sutanay Choudhury [LRM]

Recovering the Pre-Fine-Tuning Weights of Generative Models (15 Feb 2024)
Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment (15 Feb 2024)
Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

BitDelta: Your Fine-Tune May Only Be Worth One Bit (15 Feb 2024)
James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Reward Generalization in RLHF: A Topological Perspective (15 Feb 2024)
Tianyi Qiu, Fanzhi Zeng, Jiaming Ji, Dong Yan, Kaile Wang, Jiayi Zhou, Yang Han, Josef Dai, Xuehai Pan, Yaodong Yang [AI4CE]

TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation (15 Feb 2024)
Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su [LLMAG]

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (15 Feb 2024)
Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Dinesh Manocha

Towards Safer Large Language Models through Machine Unlearning (15 Feb 2024)
Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang [KELM, MU]

RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models (15 Feb 2024)
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra

Self-Augmented In-Context Learning for Unsupervised Word Translation (15 Feb 2024)
Yaoyiran Li, Anna Korhonen, Ivan Vulić

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition (15 Feb 2024)
Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent (15 Feb 2024)
Quentin Gallouedec, E. Beeching, Clément Romac, Emmanuel Dellandrea

Aligning Crowd Feedback via Distributional Preference Reward Modeling (15 Feb 2024)
Dexun Li, Cong Zhang, Kuicai Dong, Derrick-Goh-Xin Deik, Ruiming Tang, Yong Liu

Model Compression and Efficient Inference for Large Language Models: A Survey (15 Feb 2024)
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He [MQ]

PAL: Proxy-Guided Black-Box Attack on Large Language Models (15 Feb 2024)
Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo [ELM]

Answer is All You Need: Instruction-following Text Embedding via Answering the Question (15 Feb 2024)
Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang

Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering (15 Feb 2024)
Jiaxiang Liu, Tong Zhou, Yubo Chen, Kang Liu, Jun Zhao [KELM]

Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking (14 Feb 2024)
Yi R. Fung, Ruining Zhao, Jae Doo, Chenkai Sun, Heng Ji

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling (14 Feb 2024)
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization (14 Feb 2024)
Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (14 Feb 2024)
Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao [ELM]

Personalized Large Language Models (14 Feb 2024)
Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, P. Kazienko, Jan Kocoń

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation (14 Feb 2024)
Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng [HILM]

Instruction Tuning for Secure Code Generation (14 Feb 2024)
Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev

Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models (14 Feb 2024)
Goutham Rajendran, Simon Buchholz, Bryon Aragam, Bernhard Schölkopf, Pradeep Ravikumar [AI4CE]

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM (14 Feb 2024)
Yutao Hu, Tian-Xin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo [ELM, LM&MA]

Instruction Backdoor Attacks Against Customized LLMs (14 Feb 2024)
Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang [AAML, SILM]