ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.02155
  4. Cited By
Training language models to follow instructions with human feedback

Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
    OSLM
    ALM
ArXivPDFHTML

Papers citing "Training language models to follow instructions with human feedback"

50 / 4,682 papers shown
Title
Learning to Instruct for Visual Instruction Tuning
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Yuyao Zhang
Yanfeng Wang
VLM
66
0
0
28 Mar 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
53
1
0
28 Mar 2025
SocialGen: Modeling Multi-Human Social Interaction with Language Models
SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu
Juze Zhang
Changan Chen
Tiange Xiang
Yusu Fang
Juan Carlos Niebles
Ehsan Adeli
VGen
49
0
0
28 Mar 2025
Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering
Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering
Magdalena Kaiser
Gerhard Weikum
42
0
0
28 Mar 2025
Learning to Reason for Long-Form Story Generation
Learning to Reason for Long-Form Story Generation
Alexander Gurung
Mirella Lapata
ReLM
OffRL
LRM
63
1
0
28 Mar 2025
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li
Chuanxia Zheng
Christian Rupprecht
Andrea Vedaldi
42
1
0
28 Mar 2025
Controlling Large Language Model with Latent Actions
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
56
0
0
27 Mar 2025
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang
Xusen Ma
Xianxu Hou
Meidan Ding
Yudong Li
Junliang Chen
Wenting Chen
Xiaoyang Peng
LinLin Shen
CVBM
73
0
0
27 Mar 2025
SWI: Speaking with Intent in Large Language Models
SWI: Speaking with Intent in Large Language Models
Yuwei Yin
EunJeong Hwang
Giuseppe Carenini
LRM
51
0
0
27 Mar 2025
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
Yuyao Zhang
Mengchen Zhang
Tong Wu
Tengfei Wang
Gordon Wetzstein
Dahua Lin
Ziwei Liu
ELM
79
0
0
27 Mar 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun
Ge Yan
Tsui-Wei Weng
KELM
LRM
65
0
0
27 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
53
0
0
27 Mar 2025
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian Sun
Wei Ma
72
1
0
27 Mar 2025
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
Zhouhong Gu
Xingzhou Chen
Xiaoran Shi
Tao Wang
Suhang Zheng
Tianyu Li
Hongwei Feng
Yanghua Xiao
78
0
0
26 Mar 2025
Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
Xuying Li
Zhuo Li
Yuji Kosuga
Victor Bian
53
3
0
26 Mar 2025
A multi-agentic framework for real-time, autonomous freeform metasurface design
A multi-agentic framework for real-time, autonomous freeform metasurface design
Robert Lupoiu
Yixuan Shao
Tianxiang Dai
Chenkai Mao
Kofi Edee
Jonathan A. Fan
AI4CE
73
0
0
26 Mar 2025
Cyborg Data: Merging Human with AI Generated Training Data
Cyborg Data: Merging Human with AI Generated Training Data
Kai North
Christopher Ormerod
37
0
0
26 Mar 2025
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy
Yinan Sun
Xiongkuo Min
Zicheng Zhang
Yixuan Gao
Yuhang Cao
Guangtao Zhai
VLM
64
0
0
26 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
75
0
0
26 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
Multi-head Reward Aggregation Guided by Entropy
Multi-head Reward Aggregation Guided by Entropy
Xiaomin Li
Xupeng Chen
Jingxuan Fan
Eric Hanchen Jiang
Mingye Gao
AAML
60
2
0
26 Mar 2025
Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead
Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead
Viktor Schlegel
Anil A Bharath
Zilong Zhao
Kevin Yee
71
0
0
26 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
167
2
0
26 Mar 2025
ASGO: Adaptive Structured Gradient Optimization
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
97
2
0
26 Mar 2025
TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design
TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design
Xiuyuan Hu
Guoqing Liu
Can Chen
Yang Zhao
Hao Zhang
Xue Liu
64
2
0
26 Mar 2025
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong
Seyun Bae
Yeonsung Jung
Jaeryong Hwang
Eunho Yang
AAML
45
1
0
26 Mar 2025
Linguistic Blind Spots of Large Language Models
Linguistic Blind Spots of Large Language Models
Jiali Cheng
Hadi Amiri
60
1
0
25 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
44
2
0
25 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
46
0
0
25 Mar 2025
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung
Seungyoon Lee
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
AAML
ALM
ELM
58
0
0
25 Mar 2025
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
Ran Tian
Kratarth Goel
46
0
0
25 Mar 2025
CubeRobot: Grounding Language in Rubik's Cube Manipulation via Vision-Language Model
CubeRobot: Grounding Language in Rubik's Cube Manipulation via Vision-Language Model
Feiyang Wang
Xiaomin Yu
Wangyu Wu
LM&Ro
65
0
0
25 Mar 2025
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Sophie Hao
ELM
AI4CE
54
0
0
25 Mar 2025
OmniNova:A General Multimodal Agent Framework
OmniNova:A General Multimodal Agent Framework
Pengfei Du
LLMAG
47
0
0
25 Mar 2025
SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling
SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling
Yaofei Wang
Gang Pei
Kejiang Chen
Jinyang Ding
Chao Pan
Weilong Pang
Donghui Hu
Weixi Zhang
49
1
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
53
2
0
25 Mar 2025
Learning to chain-of-thought with Jensen's evidence lower bound
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang
Sid Wang
Rémi Munos
BDL
OffRL
LRM
55
0
0
25 Mar 2025
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
Yunhao Tang
Kunhao Zheng
Gabriel Synnaeve
Rémi Munos
41
1
0
25 Mar 2025
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
Wenhao You
Bryan Hooi
Yiwei Wang
Yibo Wang
Zong Ke
Ming Yang
Zi Huang
Yujun Cai
AAML
61
0
0
24 Mar 2025
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages
Patrick Diehl
Nojoud Nader
Maxim Moraru
Steven R. Brandt
39
1
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
53
0
0
24 Mar 2025
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunfan LU
Qichao Wang
H. Cao
Xierui Wang
Xiaoyin Xu
Min Zhang
64
0
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
62
2
0
24 Mar 2025
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners
Wen Zheng Terence Ng
Jianda Chen
Yuan Xu
Tianwei Zhang
41
0
0
24 Mar 2025
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
Pin-Yu Chen
Han Shen
Payel Das
Tianyi Chen
50
0
0
24 Mar 2025
Evolutionary Policy Optimization
Evolutionary Policy Optimization
Jianren Wang
Yifan Su
Abhinav Gupta
Deepak Pathak
50
0
0
24 Mar 2025
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
Yufei Zhan
Yousong Zhu
Shurong Zheng
Hongyin Zhao
Fan Yang
Ming Tang
Jize Wang
VLM
67
3
0
23 Mar 2025
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Weixiang Zhao
Xingyu Sui
Jiahe Guo
Yulin Hu
Yang Deng
Yanyan Zhao
Bing Qin
Wanxiang Che
Tat-Seng Chua
Ting Liu
ELM
LRM
64
4
0
23 Mar 2025
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang
Hengzhu Tang
Shuaiyi Nie
Zhenyu Zhang
Yiming Ren
Zhenyang Li
Dawei Yin
Duohe Ma
Tingwen Liu
52
0
0
23 Mar 2025
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality
Ruijia Zhang
Siliang Zeng
Chenliang Li
Alfredo García
Mingyi Hong
67
0
0
22 Mar 2025
Previous
123...8910...929394
Next