ResearchTrend.AI
Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,395 papers shown
Enhance Reasoning for Large Language Models in the Game Werewolf
Shuang Wu
Liwen Zhu
Tao Yang
Shiwei Xu
Qiang Fu
Yang Wei
Haobo Fu
LRM, LLMAG
143
24
0
04 Feb 2024
Diversity Measurement and Subset Selection for Instruction Tuning Datasets
Peiqi Wang
Songlin Yang
Zhen Guo
Matt Stallone
Yoon Kim
Polina Golland
Yikang Shen
85
12
0
04 Feb 2024
Jailbreaking Attack against Multimodal Large Language Model
Zhenxing Niu
Haoxuan Ji
Xinbo Gao
Gang Hua
Rong Jin
97
76
0
04 Feb 2024
A Survey on Data Selection for LLM Instruction Tuning
Bolin Zhang
Jiahao Wang
Qianlong Du
Jiajun Zhang
Zhiying Tu
Dianhui Chu
106
48
0
04 Feb 2024
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
Florian E. Dorner
Moritz Hardt
79
4
0
03 Feb 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
98
7
0
03 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
209
164
0
03 Feb 2024
Panacea: Pareto Alignment via Preference Adaptation for LLMs
Yifan Zhong
Chengdong Ma
Xiaoyuan Zhang
Ziran Yang
Haojun Chen
Qingfu Zhang
Siyuan Qi
Yaodong Yang
137
38
0
03 Feb 2024
A Survey of Constraint Formulations in Safe Reinforcement Learning
Akifumi Wachi
Xun Shen
Yanan Sui
90
16
0
03 Feb 2024
SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks
Gourab Dey
Adithya Ganesan
Yash Kumar Lal
Manal Shah
Shreyashee Sinha
Matthew Matero
Salvatore Giorgi
Vivek Kulkarni
H. Andrew Schwartz
ALM
122
9
0
03 Feb 2024
Bringing Generative AI to Adaptive Learning in Education
Hang Li
Tianlong Xu
Chaoli Zhang
Eason Chen
Jing Liang
Xing Fan
Haoyang Li
Jiliang Tang
Qingsong Wen
102
25
0
02 Feb 2024
Preference Poisoning Attacks on Reward Model Learning
Junlin Wu
Jiong Wang
Chaowei Xiao
Chenguang Wang
Ning Zhang
Yevgeniy Vorobeychik
AAML
83
6
0
02 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
84
10
0
02 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Ming-Yu Liu
Rafael Valle
Bryan Catanzaro
AuLLM, LM&MA, MLLM
172
94
0
02 Feb 2024
Homogenization Effects of Large Language Models on Human Creative Ideation
Barrett R Anderson
Jash Hemant Shah
Max Kreminski
100
90
0
02 Feb 2024
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
Subbarao Kambhampati
Karthik Valmeekam
L. Guan
Mudit Verma
Kaya Stechly
Siddhant Bhambri
Lucas Saldyt
Anil Murthy
LRM
195
126
0
02 Feb 2024
Distilling LLMs' Decomposition Abilities into Compact Language Models
Denis Tarasov
Kumar Shridhar
SyDa, OffRL, LRM
96
2
0
02 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Shihan Dou
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Tao Ji
Rui Zheng
Qi Zhang
Xuanjing Huang
Tao Gui
LLMAG
133
45
0
02 Feb 2024
Rethinking the Role of Proxy Rewards in Language Model Alignment
Sungdong Kim
Minjoon Seo
SyDa, ALM
67
2
0
02 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
325
570
0
02 Feb 2024
Efficient Prompt Caching via Embedding Similarity
Hanlin Zhu
Banghua Zhu
Jiantao Jiao
RALM
86
9
0
02 Feb 2024
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models
Sihao Hu
Tiansheng Huang
Ling Liu
LM&Ro, LLMAG
76
9
0
02 Feb 2024
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
Mohammadreza Pourreza
Davood Rafiei
78
30
0
02 Feb 2024
Vaccine: Perturbation-aware Alignment for Large Language Model
Tiansheng Huang
Sihao Hu
Ling Liu
125
49
0
02 Feb 2024
Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
Pouya Pezeshkpour
Eser Kandogan
Nikita Bhutani
Sajjadur Rahman
Tom Mitchell
Estevam R. Hruschka
LLMAG, LRM
86
8
0
02 Feb 2024
A Survey for Foundation Models in Autonomous Driving
Haoxiang Gao
Yaqian Li
Kaiwen Long
Ming Yang
Yiqing Shen
VLM, LRM
109
32
0
02 Feb 2024
Reading Between the Tweets: Deciphering Ideological Stances of Interconnected Mixed-Ideology Communities
Zihao He
Ashwin Rao
Siyi Guo
Negar Mokhberian
Kristina Lerman
57
6
0
02 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM, LM&MA
224
41
0
02 Feb 2024
Sample, estimate, aggregate: A recipe for causal discovery foundation models
Menghua Wu
Yujia Bao
Regina Barzilay
Tommi Jaakkola
CML
137
7
0
02 Feb 2024
Plan-Grounded Large Language Models for Dual Goal Conversational Settings
Diogo Glória-Silva
Rafael Ferreira
Diogo Tavares
David Semedo
João Magalhães
LLMAG
85
4
0
01 Feb 2024
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang
Yangyi Chen
Lifan Yuan
Yizhe Zhang
Yunzhu Li
Hao Peng
Heng Ji
ELM, LLMAG, LM&Ro
164
167
0
01 Feb 2024
Can Large Language Models Understand Context?
Yilun Zhu
Joel Ruben Antony Moniz
Shruti Bhargava
Jiarui Lu
Dhivya Piraviperumal
Site Li
Yuan-kang Zhang
Hong-ye Yu
Bo-Hsiang Tseng
95
26
0
01 Feb 2024
Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji
Cheng Lu
Yilin Niu
Pei Ke
Hongning Wang
Jun Zhu
Jie Tang
Minlie Huang
97
20
0
01 Feb 2024
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Marius-Constantin Dinu
Claudiu Leoveanu-Condrei
Markus Holzleitner
Werner Zellinger
Sepp Hochreiter
88
11
0
01 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
219
413
0
01 Feb 2024
Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement
Xin Quan
Marco Valentino
Louise A. Dennis
André Freitas
LRM
71
12
0
01 Feb 2024
Transforming and Combining Rewards for Aligning Large Language Models
Zihao Wang
Chirag Nagpal
Jonathan Berant
Jacob Eisenstein
Alex D'Amour
Oluwasanmi Koyejo
Victor Veitch
97
16
0
01 Feb 2024
Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning
Jitao Sang
Yuhang Wang
Jing Zhang
Yanxu Zhu
Chao Kong
Junhong Ye
Shuyu Wei
Jinlin Xiao
115
12
0
01 Feb 2024
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
Fangkai Jiao
Chengwei Qin
Zhengyuan Liu
Nancy F. Chen
Shafiq Joty
LRM
107
35
0
01 Feb 2024
Efficient Exploration for LLMs
Vikranth Dwaracherla
S. Asghari
Botao Hao
Benjamin Van Roy
LLMAG
100
22
0
01 Feb 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Shangbin Feng
Herun Wan
Ningnan Wang
Zhaoxuan Tan
Minnan Luo
Yulia Tsvetkov
AAML, DeLMO
112
18
0
01 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
142
104
0
01 Feb 2024
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
Qun Ma
Xiao Xue
Deyu Zhou
Xiangning Yu
Donghua Liu
...
Yifan Shen
Peilin Ji
Juanjuan Li
Gang Wang
Wanpeng Ma
AI4CE, LM&Ro, LLMAG
92
9
0
01 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
158
146
0
31 Jan 2024
Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?
Andreas Opedal
Alessandro Stolfo
Haruki Shirakami
Ying Jiao
Ryan Cotterell
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
130
16
0
31 Jan 2024
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Yushi Bai
Xin Lv
Jiajie Zhang
Yuze He
Ji Qi
Lei Hou
Jie Tang
Yuxiao Dong
Juanzi Li
ALM
100
53
0
31 Jan 2024
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
Yuan Li
Yue Huang
Yuli Lin
Siyuan Wu
Yao Wan
Lichao Sun
LLMAG, ELM
77
8
0
31 Jan 2024
Global-Liar: Factuality of LLMs over Time and Geographic Regions
Shujaat Mirza
Bruno Coelho
Yuyuan Cui
Christina Pöpper
Damon McCoy
HILM
59
6
0
31 Jan 2024
CauESC: A Causal Aware Model for Emotional Support Conversation
Wei Chen
Hengxu Lin
Qun Zhang
Xiaojin Zhang
Xiang Bai
Xuanjing Huang
Zhongyu Wei
93
1
0
31 Jan 2024
SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models
Xiao Shao
Weifu Jiang
Fei Zuo
Mengqing Liu
LLMAG
95
7
0
31 Jan 2024