Training language models to follow instructions with human feedback


4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
    OSLM ALM

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,384 papers shown
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
Yisong Xiao
Aishan Liu
QianJia Cheng
Zhenfei Yin
Siyuan Liang
Jiapeng Li
Jing Shao
Xianglong Liu
Dacheng Tao
124
8
0
30 Jun 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
228
18
0
30 Jun 2024
Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs
Tamzeed Mahfuz
Satak Kumar Dey
Ruwad Naswan
Hasnaen Adil
Khondker Salman Sayeed
Haz Sameen Shahgir
78
1
0
29 Jun 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He
Yongliang Shen
Wenqi Zhang
Zeqi Tan
Weiming Lu
LRM
80
7
0
29 Jun 2024
LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods
Zhenhua Wang
Guang Xu
Ming Ren
87
5
0
29 Jun 2024
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
Ruochen Wang
Sohyun An
Minhao Cheng
Tianyi Zhou
Sung Ju Hwang
Cho-Jui Hsieh
73
9
0
28 Jun 2024
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALM LRM
85
83
0
28 Jun 2024
MetaKP: On-Demand Keyphrase Generation
Di Wu
Xiaoxian Shen
Kai-Wei Chang
63
0
0
28 Jun 2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu
Yang Zhang
Xuchuan Huang
Jasmine Xinze Li
Yalan Qin
Yaodong Yang
AI4TS
110
7
0
28 Jun 2024
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs
Sujan Dutta
Sayantan Mahinder
R. Anantha
Bortik Bandyopadhyay
ALM
75
7
0
28 Jun 2024
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi
Alexander Wei
Eric Wallace
Tony T. Wang
Nika Haghtalab
Jacob Steinhardt
SILM AAML
103
36
0
28 Jun 2024
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
Xinyi Chen
Baohao Liao
Jirui Qi
Panagiotis Eustratiadis
Christof Monz
Arianna Bisazza
Maarten de Rijke
ALM ELM LRM
84
7
0
28 Jun 2024
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring
Jiazheng Li
Hainiu Xu
Zhaoyue Sun
Yuxiang Zhou
David West
Cesare Aloisi
Yulan He
LRM
74
4
0
28 Jun 2024
Paraphrase Types Elicit Prompt Engineering Capabilities
Jan Philip Wahle
Terry Ruas
Yang Xu
Bela Gipp
145
10
0
28 Jun 2024
BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering
Zheng Chu
Jingchang Chen
Qianglong Chen
Haotian Wang
Kun Zhu
Xiyuan Du
Weijiang Yu
Ming Liu
Bing Qin
LRM
151
5
0
28 Jun 2024
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
Jihao Liu
Xin Huang
Jinliang Zheng
Boxiao Liu
Jia Wang
Osamu Yoshie
Yu Liu
Hongsheng Li
MLLM SyDa
63
4
0
28 Jun 2024
MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?
Jinming Li
Yichen Zhu
Zhiyuan Xu
Jindong Gu
Minjie Zhu
Xin Liu
Ning Liu
Yaxin Peng
Feifei Feng
Jian Tang
LRM LM&Ro
105
8
0
28 Jun 2024
PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
Shufan Li
Harkanwar Singh
Aditya Grover
EGVM
136
2
0
28 Jun 2024
DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting
Xuanming Zhang
Anthony Diaz
Zixun Chen
Qingyang Wu
Kun Qian
Erik Voss
Zhou Yu
57
1
0
28 Jun 2024
When Search Engine Services meet Large Language Models: Visions and Challenges
Haoyi Xiong
Jiang Bian
Yuchen Li
Xuhong Li
Jundong Li
Shuaiqiang Wang
D. Yin
Sumi Helal
141
36
0
28 Jun 2024
ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation
Peiyang Wu
Nan Guo
Xiao Xiao
Wenming Li
Mingyu Yan
Xiaochun Ye
126
2
0
28 Jun 2024
Direct Preference Knowledge Distillation for Large Language Models
Yixing Li
Yuxian Gu
Li Dong
Dequan Wang
Yu Cheng
Furu Wei
121
8
0
28 Jun 2024
Rethinking harmless refusals when fine-tuning foundation models
Florin Pop
Judd Rosenblatt
Diogo Schwerz de Lucena
Michael Vaiana
30
0
0
27 Jun 2024
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham
Simeng Sun
Mohit Iyyer
ALM LRM
129
23
0
27 Jun 2024
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks
Ibrahim Abdelaziz
Kinjal Basu
Mayank Agarwal
Yara Rizk
Matthew Stallone
...
Merve Unuvar
David D. Cox
Salim Roukos
Luis A. Lastras
Pavan Kapanipathi
LLMAG
92
24
0
27 Jun 2024
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew Lan
AAML
95
4
0
27 Jun 2024
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Peter Hase
Thomas Hofweber
Xiang Zhou
Elias Stengel-Eskin
Joey Tianyi Zhou
KELM LRM
101
17
0
27 Jun 2024
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement
Chengwen Zhang
Yun-Hai Liu
Ruofan Xing
Bingda Tang
Li Yi
80
12
0
27 Jun 2024
Diminishing Stereotype Bias in Image Generation Model using Reinforcemenlent Learning Feedback
Xin Chen
Virgile Foussereau
EGVM
83
0
0
27 Jun 2024
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
Jia Fu
Xiaoting Qin
Fangkai Yang
Lu Wang
Jue Zhang
Qingwei Lin
Yubo Chen
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
71
6
0
27 Jun 2024
Towards Temporal Change Explanations from Bi-Temporal Satellite Images
Ryo Tsujimoto
Hiroki Ouchi
Hidetaka Kamigaito
Taro Watanabe
40
2
0
27 Jun 2024
CELLO: Causal Evaluation of Large Vision-Language Models
Meiqi Chen
Bo Peng
Yan Zhang
Chaochao Lu
LRM ELM
77
0
0
27 Jun 2024
CHEW: A Dataset of CHanging Events in Wikipedia
Hsuvas Borkakoty
Luis Espinosa-Anke
89
1
0
27 Jun 2024
Alignment For Performance Improvement in Conversation Bots
Raghav Garg
Kapil Sharma
Shrey Singla
74
0
0
27 Jun 2024
LLM-based Frameworks for API Argument Filling in Task-Oriented Conversational Systems
J. Mok
Mohammad Kachuee
Shuyang Dai
Shayan Ray
Tara Taghavi
Sungroh Yoon
LLMAG
41
3
0
27 Jun 2024
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering
Wen Zhang
Long Jin
Yushan Zhu
Jiaoyan Chen
Zhiwei Huang
Junjie Wang
Yin Hua
Lei Liang
Huajun Chen
LMTD
89
3
0
27 Jun 2024
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction
Zhongxiang Fan
Zhaocheng Liu
Jian Liang
Dongying Kong
Han Li
Peng Jiang
Shuang Li
Kun Gai
102
0
0
27 Jun 2024
Efficacy of Language Model Self-Play in Non-Zero-Sum Games
Austen Liao
Nicholas Tomlin
Dan Klein
105
1
0
27 Jun 2024
On Discrete Prompt Optimization for Diffusion Models
Ruochen Wang
Ting Liu
Cho-Jui Hsieh
Boqing Gong
DiffM
89
8
0
27 Jun 2024
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi
Yifang Chen
Yushi Hu
Alisa Liu
Hannaneh Hajishirzi
Noah A. Smith
Simon Du
140
43
0
27 Jun 2024
Learning Retrieval Augmentation for Personalized Dialogue Generation
Qiushi Huang
Shuai Fu
Xubo Liu
Wenwu Wang
Tom Ko
Yu Zhang
Lilian H. Y. Tang
RALM
119
18
0
27 Jun 2024
Can Large Language Models Generate High-quality Patent Claims?
Lekang Jiang
Caiqi Zhang
Pascal A Scherz
Stephan Goetz
ELM
127
7
0
27 Jun 2024
Aligning Model Properties via Conformal Risk Control
William Overman
Jacqueline Jil Vallon
Mohsen Bayati
66
3
0
26 Jun 2024
Jailbreaking LLMs with Arabic Transliteration and Arabizi
Mansour Al Ghanim
Saleh Almohaimeed
Mengxin Zheng
Yan Solihin
Qian Lou
67
4
0
26 Jun 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha
Arash Ahmadian
Beyza Ermis
Seraphina Goldfarb-Tarrant
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
124
39
0
26 Jun 2024
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation
Guanting Dong
Yutao Zhu
Chenghao Zhang
Zechen Wang
Zhicheng Dou
Ji-Rong Wen
RALM
110
13
0
26 Jun 2024
Symbolic Learning Enables Self-Evolving Agents
Wangchunshu Zhou
Yixin Ou
Shengwei Ding
Long Li
Jialong Wu
...
Shuai Wang
Xiaohua Xu
Xin Xu
Huajun Chen
Yuchen Eleanor Jiang
AI4CE LM&Ro LLMAG
110
38
0
26 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
85
9
0
26 Jun 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
177
126
0
26 Jun 2024
Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation
Ahmed Njifenjou
Virgile Sucal
Bassam Jabaian
Fabrice Lefèvre
AI4CE ALM
63
3
0
26 Jun 2024