ResearchTrend.AI
Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
arXiv: 2203.02155

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,390 papers shown
Audio-visual training for improved grounding in video-text LLMs
Shivprasad Sagare
Hemachandran S
Kinshuk Sarabhai
Prashant Ullegaddi
SA Rajeshkumar
62
0
0
21 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MA, OffRL
116
28
0
20 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
129
43
0
20 Jul 2024
Improving Context-Aware Preference Modeling for Language Models
Silviu Pitis
Ziang Xiao
Nicolas Le Roux
Alessandro Sordoni
99
12
0
20 Jul 2024
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
Xinyu Xu
Shengcheng Luo
Yanchao Yang
Yong-Lu Li
Cewu Lu
LM&Ro
109
1
0
20 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
129
15
0
20 Jul 2024
Value Internalization: Learning and Generalizing from Social Reward
Frieda Rong
Max Kleiman-Weiner
68
1
0
19 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM, LRM
162
30
0
19 Jul 2024
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
103
7
0
19 Jul 2024
The Vision of Autonomic Computing: Can LLMs Make It a Reality?
Zhiyang Zhang
Fangkai Yang
Xiaoting Qin
Jue Zhang
Qingwei Lin
Gong Cheng
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
49
3
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
92
10
0
19 Jul 2024
Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Md Sultan al Nahian
R. Kavuluru
MedIm, AI4CE
61
0
0
19 Jul 2024
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering
Rujun Han
Yuhao Zhang
Peng Qi
Yumo Xu
Jenyuan Wang
Lan Liu
William Yang Wang
Bonan Min
Vittorio Castelli
RALM
78
29
0
19 Jul 2024
Decomposed Direct Preference Optimization for Structure-Based Drug Design
Xiwei Cheng
Xiangxin Zhou
Yuwei Yang
Yu Bao
Quanquan Gu
67
3
0
19 Jul 2024
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
202
6
0
19 Jul 2024
Data-Centric Human Preference with Rationales for Direct Preference Alignment
H. Just
Ming Jin
Anit Kumar Sahu
Huy Phan
Ruoxi Jia
90
3
0
19 Jul 2024
FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking
Zhuoer Wang
Leonardo F. R. Ribeiro
Alexandros Papangelis
Rohan Mukherjee
Tzu-Yen Wang
Xinyan Zhao
Arijit Biswas
James Caverlee
A. Metallinou
74
0
0
18 Jul 2024
BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization
Ahmed Allam
94
10
0
18 Jul 2024
Learning Goal-Conditioned Representations for Language Reward Models
Vaskar Nath
Dylan Slack
Jeff Da
Yuntao Ma
Hugh Zhang
Spencer Whitehead
Sean Hendryx
58
0
0
18 Jul 2024
LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
David Schlangen
78
1
0
18 Jul 2024
Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review
Masatoshi Uehara
Yulai Zhao
Tommaso Biancalani
Sergey Levine
147
32
0
18 Jul 2024
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu
Pengfei Liu
Arman Cohan
73
11
0
18 Jul 2024
Prover-Verifier Games improve legibility of LLM outputs
Jan Hendrik Kirchner
Yining Chen
Harri Edwards
Jan Leike
Nat McAleese
Yuri Burda
LRM, AAML
82
32
0
18 Jul 2024
Weak-to-Strong Reasoning
Yuqing Yang
Yan Ma
Pengfei Liu
LRM
80
19
0
18 Jul 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min Lin
Ngai Wong
PILM
132
54
0
18 Jul 2024
Research on Tibetan Tourism Viewpoints information generation system based on LLM
Jinhu Qi
Shuai Yan
Wentao Zhang
Yibo Zhang
Zirui Liu
Ke Wang
53
1
0
18 Jul 2024
DeepClair: Utilizing Market Forecasts for Effective Portfolio Selection
Donghee Choi
Jinkyu Kim
Mogan Gim
Jinho Lee
Jaewoo Kang
80
0
0
18 Jul 2024
From Words to Worlds: Compositionality for Cognitive Architectures
Ruchira Dhar
Anders Sogaard
98
0
0
18 Jul 2024
Learning-From-Mistakes Prompting for Indigenous Language Translation
You-Cheng Liao
Chen-Jui Yu
Chi-Yi Lin
He-Feng Yun
Yen-Hsiang Wang
Hsiao-Min Li
Yao-Chung Fan
100
1
0
18 Jul 2024
Multimodal Label Relevance Ranking via Reinforcement Learning
Taian Guo
Taolin Zhang
Haoqian Wu
Hanjun Li
Ruizhi Qiao
Xing Sun
OffRL
50
0
0
18 Jul 2024
MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking
Ting-Chih Chen
Chia-Wei Tang
Chris Thomas
96
5
0
18 Jul 2024
Establishing Knowledge Preference in Language Models
Sizhe Zhou
Sha Li
Yu Meng
Yizhu Jiao
Heng Ji
Jiawei Han
KELM
135
0
0
17 Jul 2024
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
To Eun Kim
Alireza Salemi
Andrew Drozdov
Fernando Diaz
Hamed Zamani
126
8
0
17 Jul 2024
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions
Jinsung Yoon
Raj Sinha
Sercan O. Arik
Tomas Pfister
73
1
0
17 Jul 2024
MERLIN: Multimodal Embedding Refinement via LLM-based Iterative Navigation for Text-Video Retrieval-Rerank Pipeline
D. Han
Eunhwan Park
Gisang Lee
Adam Lee
Nojun Kwak
128
4
0
17 Jul 2024
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
Jie Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong Liu
Yu Qiao
Jing Shao
LLMAG, PILM
101
12
0
17 Jul 2024
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu
Xian Tao
Mukesh Prasad
Fei Shen
Zhengtao Zhang
Xinyi Gong
Guiguang Ding
VLM
108
16
0
17 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
110
6
0
17 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffM, VGen
200
17
0
17 Jul 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
Haotian Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
139
2
0
17 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
143
2
0
17 Jul 2024
Satisficing Exploration for Deep Reinforcement Learning
Dilip Arumugam
Saurabh Kumar
Ramki Gummadi
Benjamin Van Roy
67
1
0
16 Jul 2024
Exploration Unbound
Dilip Arumugam
Wanqiao Xu
Benjamin Van Roy
80
0
0
16 Jul 2024
Continuous Embedding Attacks via Clipped Inputs in Jailbreaking Large Language Models
Zihao Xu
Yi Liu
Gelei Deng
Kailong Wang
Yuekang Li
Ling Shi
S. Picek
KELM
80
0
0
16 Jul 2024
Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning
Yanting Miao
William Loh
Suraj Kothawade
Pascal Poupart
Abdullah Rashwan
Yeqing Li
EGVM
67
5
0
16 Jul 2024
What's Wrong? Refining Meeting Summaries with LLM Feedback
Frederic Kirstein
Terry Ruas
Bela Gipp
111
6
0
16 Jul 2024
Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach
Yuzhu Mao
Siqi Ping
Zihao Zhao
Yang Liu
Wenbo Ding
103
1
0
16 Jul 2024
SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models
Xinbo Wu
Max Hartman
Vidhata Arjun Jayaraman
Lav Varshney
CLL, LRM
110
1
0
16 Jul 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
70
3
0
16 Jul 2024
Do LLMs have Consistent Values?
Naama Rozen
G. Elidan
Amir Globerson
Ella Daniel
132
4
0
16 Jul 2024