arXiv: 2203.02155
Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,374 papers shown
Select to Perfect: Imitating desired behavior from large multi-agent data
Tim Franzmeyer
Edith Elkind
Philip Torr
Jakob N. Foerster
Joao Henriques
92
3
0
06 May 2024
AlphaMath Almost Zero: process Supervision without process
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
AIMat, LRM
86
113
0
06 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM, LRM
100
101
0
06 May 2024
Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom
Bo Wang
Jing Ma
Hongzhan Lin
Zhiwei Yang
Ruichao Yang
Yuan Tian
Yi-Ju Chang
AAML
100
39
0
06 May 2024
CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
Zhizhao Duan
Hao Cheng
Duo Xu
Xi Wu
Xiangxie Zhang
Xi Ye
Zhen Xie
61
8
0
06 May 2024
The Role of Predictive Uncertainty and Diversity in Embodied AI and Robot Learning
Ransalu Senanayake
96
9
0
06 May 2024
Quantifying the Capabilities of LLMs across Scale and Precision
Sher Badshah
Hassan Sajjad
76
14
0
06 May 2024
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Ziqi Gao
Qichao Wang
Aochuan Chen
Zijing Liu
Bingzhe Wu
Liang Chen
Jia Li
103
35
0
05 May 2024
ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces
Libing Yang
Yang Li
Long Chen
73
3
0
05 May 2024
Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation
Jinyu Cai
Jialong Li
Mingyue Zhang
Munan Li
Chen-Shu Wang
Kenji Tei
LLMAG
99
6
0
05 May 2024
Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs
Feiyang Kang
H. Just
Yifan Sun
Himanshu Jahagirdar
Yuanzhi Zhang
Rongxing Du
Anit Kumar Sahu
Ruoxi Jia
102
22
0
05 May 2024
CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
Hanchong Zhang
Ruisheng Cao
Hongshen Xu
Lu Chen
Kai Yu
ReLM, LRM
97
7
0
04 May 2024
Learning Linear Utility Functions From Pairwise Comparison Queries
Luise Ge
Brendan Juba
Yevgeniy Vorobeychik
58
3
0
04 May 2024
PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation
Ye Liu
Yue Xue
Daoyuan Wu
Yuqiang Sun
Yi Li
Miaolei Shi
Yang Liu
98
27
0
04 May 2024
PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Hyeong Kyu Choi
Yixuan Li
127
19
0
03 May 2024
Position: Understanding LLMs Requires More Than Statistical Generalization
Patrik Reizinger
Szilvia Ujváry
Anna Mészáros
A. Kerekes
Wieland Brendel
Ferenc Huszár
136
16
0
03 May 2024
CRCL at SemEval-2024 Task 2: Simple prompt optimizations
Clément Brutti-Mairesse
L. Verlingue
67
2
0
03 May 2024
ModelShield: Adaptive and Robust Watermark against Model Extraction Attack
Kaiyi Pang
Tao Qi
Chuhan Wu
Minhao Bai
Minghu Jiang
Yongfeng Huang
AAML, WaLM
168
5
0
03 May 2024
SUKHSANDESH: An Avatar Therapeutic Question Answering Platform for Sexual Education in Rural India
Salam Michael Singh
Shubhmoy Kumar Garg
Amitesh Misra
Aaditeshwar Seth
Tanmoy Chakraborty
68
0
0
03 May 2024
Reinforcement Learning-Guided Semi-Supervised Learning
Marzi Heidari
Hanping Zhang
Yuhong Guo
OffRL
92
1
0
02 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
95
20
0
02 May 2024
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal
Nathan Lambert
S. Niekum
Tanya Goyal
Greg Durrett
OffRL, EGVM
74
6
0
02 May 2024
Controllable Text Generation in the Instruction-Tuning Era
D. Ashok
Barnabas Poczos
105
6
0
02 May 2024
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen
Zhilin Wang
Olivier Delalleau
Jiaqi Zeng
Yi Dong
...
Sahil Jain
Ali Taghibakhshi
Markel Sanz Ausin
Ashwath Aithal
Oleksii Kuchaiev
134
15
0
02 May 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
107
234
0
02 May 2024
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning
Tianle Xia
Liang Ding
Guojia Wan
Yibing Zhan
Bo Du
Dacheng Tao
LRM
72
1
0
02 May 2024
The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation
Maja Pavlovic
Massimo Poesio
119
21
0
02 May 2024
Self-Play Preference Optimization for Language Model Alignment
Yue Wu
Zhiqing Sun
Huizhuo Yuan
Kaixuan Ji
Yiming Yang
Quanquan Gu
147
145
0
01 May 2024
Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Yida Mu
Peizhen Bai
Kalina Bontcheva
Xingyi Song
75
6
0
01 May 2024
The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Guanying Jiang
Lingyong Yan
Haibo Shi
D. Yin
84
2
0
01 May 2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu
Yiming Hao
Manyuan Zhang
Keqiang Sun
Zhaoyang Huang
Guanglu Song
Yu Liu
Hongsheng Li
EGVM
127
25
0
01 May 2024
CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions
Donghee Choi
Mogan Gim
Donghyeon Park
Mujeen Sung
Hyunjae Kim
Jaewoo Kang
Jihun Choi
77
1
0
01 May 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLM, LRM
143
126
0
01 May 2024
MetaRM: Shifted Distributions Alignment via Meta-Learning
Shihan Dou
Yan Liu
Enyu Zhou
Changze Lv
Haoxiang Jia
...
Junjie Ye
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OOD
160
2
0
01 May 2024
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
Leonardo Ranaldi
André Freitas
LRM, ReLM
88
16
0
01 May 2024
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li
Haoke Xiao
Lv Tang
105
10
0
01 May 2024
EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Deng Li
Xin Liu
Bohao Xing
Baiqiang Xia
Yuan Zong
Bihan Wen
Heikki Kälviäinen
130
6
0
01 May 2024
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Jiahui Gao
...
Yu Zhang
Zhenguo Li
Xin Jiang
Qiang Liu
James T. Kwok
MoE
243
10
0
01 May 2024
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park
Mingyang Liu
Dingwen Kong
Kaiqing Zhang
Asuman Ozdaglar
151
41
0
30 Apr 2024
Soft Preference Optimization: Aligning Language Models to Expert Distributions
Arsalan Sharifnassab
Sina Ghiassian
Saber Salehkaleybar
Surya Kanoria
Dale Schuurmans
95
3
0
30 Apr 2024
StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation
Xiaoming Liu
Chen Liu
Zhaohan Zhang
Chengzhengxu Li
Longtian Wang
Y. Lan
Chao Shen
VLM
87
4
0
30 Apr 2024
Multi-hop Question Answering over Knowledge Graphs using Large Language Models
Abir Chakraborty
KELM, RALM
79
6
0
30 Apr 2024
Game-MUG: Multimodal Oriented Game Situation Understanding and Commentary Generation Dataset
Zhihao Zhang
Feiqi Cao
Yingbin Mo
Yiran Zhang
Josiah Poon
S. Han
65
1
0
30 Apr 2024
Stylus: Automatic Adapter Selection for Diffusion Models
Michael Luo
Justin Wong
Brandon Trabucco
Yanping Huang
Joseph E. Gonzalez
Zhifeng Chen
Ruslan Salakhutdinov
Ion Stoica
DiffM
79
7
0
29 Apr 2024
Performance-Aligned LLMs for Generating Fast Code
Daniel Nichols
Pranav Polasam
Harshitha Menon
Aniruddha Marathe
T. Gamblin
A. Bhatele
90
10
0
29 Apr 2024
ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization
Hong Nguyen
H. Nguyen
Melinda Y. Chang
Hieu H. Pham
Shrikanth Narayanan
Michael Pazzani
59
1
0
29 Apr 2024
AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering
Wenxiang Zhao
Juntao Wu
Zhaoyi Meng
AAML
56
14
0
29 Apr 2024
Reinforcement Learning Problem Solving with Large Language Models
Sina Gholamian
Domingo Huh
87
0
0
29 Apr 2024
Evaluating and Mitigating Linguistic Discrimination in Large Language Models
Guoliang Dong
Haoyu Wang
Jun Sun
Xinyu Wang
82
4
0
29 Apr 2024
HFT: Half Fine-Tuning for Large Language Models
Tingfeng Hui
Zhenyu Zhang
Shuohuan Wang
Weiran Xu
Yu Sun
Hua Wu
CLL
101
7
0
29 Apr 2024