Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM ALM
arXiv: 2203.02155

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,372 papers shown
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Aya Kayal
Sattar Vakili
Laura Toni
Da-shan Shiu
A. Bernacchia
39
0
0
29 May 2025
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Chenyu Yang
Shiqian Su
Shi-Qi Liu
Xuan Dong
Yue Yu
...
Hao Li
Wenhai Wang
Yu Qiao
Xizhou Zhu
Jifeng Dai
OffRL
144
0
0
29 May 2025
Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration
Yilong Li
Chen Qian
Yu Xia
Ruijie Shi
Yufan Dang
...
Ye Tian
Xuantang Xiong
Lei Han
Zhiyuan Liu
Maosong Sun
LLMAG
76
0
0
29 May 2025
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
Kunlun Zhu
Jiaxun Zhang
Ziheng Qi
Nuoxing Shang
Zijia Liu
Peixuan Han
Yue Su
Haofei Yu
Jiaxuan You
59
0
0
29 May 2025
Learning Parametric Distributions from Samples and Preferences
Marc Jourdan
Gizem Yüce
Nicolas Flammarion
32
0
0
29 May 2025
Fortune: Formula-Driven Reinforcement Learning for Symbolic Table Reasoning in Language Models
Lang Cao
Jingxian Xu
Hanbing Liu
Jinyu Wang
Mengyu Zhou
Haoyu Dong
Shi Han
Dongmei Zhang
LRM OffRL LMTD ReLM
61
0
0
29 May 2025
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Kaiyang Guo
Yinchuan Li
Zhitang Chen
69
0
0
29 May 2025
Differential Information: An Information-Theoretic Perspective on Preference Optimization
Yunjae Won
Hyunji Lee
Hyeonbin Hwang
Minjoon Seo
27
0
0
29 May 2025
Identity resolution of software metadata using Large Language Models
Eva Martín del Pico
Josep Lluís Gelpí
Salvador Capella-Gutiérrez
32
0
0
29 May 2025
OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities
Sahil Verma
Keegan E. Hines
J. Bilmes
Charlotte Siska
Luke Zettlemoyer
Hila Gonen
Chandan Singh
AAML
24
0
0
29 May 2025
MenTeR: A fully-automated Multi-agenT workflow for end-to-end RF/Analog Circuits Netlist Design
Pin-Han Chen
Y. Lin
Wei-Cheng Lee
Tin-Yu Leu
Po-Hsiang Hsu
Anjana Dissanayake
Sungjin Oh
Chinq-Shiun Chiu
50
0
0
29 May 2025
Understanding Refusal in Language Models with Sparse Autoencoders
Wei Jie Yeo
Nirmalendu Prakash
Clement Neo
Roy Ka-wei Lee
Erik Cambria
Ranjan Satapathy
18
0
0
29 May 2025
Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors
Harish Tayyar Madabushi
Melissa Torgbi
C. Bonial
68
0
0
29 May 2025
Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation
Caiqi Zhang
Xiaochen Zhu
Chengzu Li
Nigel Collier
Andreas Vlachos
OffRL HILM
53
1
0
29 May 2025
Are Reasoning Models More Prone to Hallucination?
Zijun Yao
Y. Liu
Yanxu Chen
Jianhui Chen
Junfeng Fang
Lei Hou
Juanzi Li
Tat-Seng Chua
ReLM HILM LRM
130
0
0
29 May 2025
MAP: Revisiting Weight Decomposition for Low-Rank Adaptation
Chongjie Si
Zhiyi Shi
Yadao Wang
Xiaokang Yang
Susanto Rahardja
Wei Shen
64
0
0
29 May 2025
Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport
Yuu Jinnai
OT
50
0
0
29 May 2025
PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents
Haoyu Chen
Keda Tao
Yizao Wang
Xinlei Wang
Lei Zhu
Jinjin Gu
KELM
52
0
0
29 May 2025
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Paul Gölz
Nika Haghtalab
Kunhe Yang
51
0
0
29 May 2025
SC-LoRA: Balancing Efficient Fine-tuning and Knowledge Preservation via Subspace-Constrained LoRA
Minrui Luo
Fuhang Kuang
Yu Wang
Zirui Liu
Tianxing He
CLL
62
0
0
29 May 2025
Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
Sheng Ouyang
Yulan Hu
Ge Chen
Qingyang Li
Fuzheng Zhang
Yong Liu
35
0
0
29 May 2025
The End Of Universal Lifelong Identifiers: Identity Systems For The AI Era
Shriphani Palakodety
27
0
0
29 May 2025
Mis-prompt: Benchmarking Large Language Models for Proactive Error Handling
Jiayi Zeng
Yizhe Feng
Mengliang He
Wenhui Lei
Wei Zhang
Zeming Liu
Xiaoming Shi
Aimin Zhou
LRM
28
0
0
29 May 2025
DIP-R1: Deep Inspection and Perception with RL Looking Through and Understanding Complex Scenes
Sungjune Park
Hyunjun Kim
Junho Kim
S. T. Kim
Y. Ro
LRM
123
0
0
29 May 2025
Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation
Zeyu Liu
Zhitian Hou
Yining Di
Kejing Yang
Zhijie Sang
...
Siyuan Liu
Jialu Wang
Chunming Li
Ming Li
Hongxia Yang
LM&MA LRM
20
0
0
29 May 2025
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models
Zeyu Liu
Y. Liu
Guanghao Zhu
C. Xie
Zhen Li
...
Qing Li
Shing-Chi Cheung
Shengyu Zhang
Fei Wu
Hongxia Yang
ReLM LRM
87
0
0
29 May 2025
Xinyu AI Search: Enhanced Relevance and Comprehensive Results with Rich Answer Presentations
Bo Tang
Junyi Zhu
Chenyang Xi
Yunhang Ge
Jiahao Wu
...
Yebin Yang
Jiajia Wang
Zhiyu Li
Feiyu Xiong
Jingrun Chen
58
0
0
28 May 2025
Reverse Preference Optimization for Complex Instruction Following
Xiang Huang
Ting-En Lin
Feiteng Fang
Yuchuan Wu
Hangyu Li
Yuzhong Qu
Fei Huang
Yongbin Li
46
0
0
28 May 2025
LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents
Taro Yano
Yoichi Ishibashi
Masafumi Oyamada
LM&Ro
64
1
0
28 May 2025
When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
Jirui Qi
Shan Chen
Zidi Xiong
Raquel Fernández
Danielle S. Bitterman
Arianna Bisazza
LRM
97
0
0
28 May 2025
Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments
Marc Feger
Katarina Boland
Stefan Dietze
36
0
0
28 May 2025
Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
Saleh Afzoon
Zahra Jahanandish
Phuong Thao Huynh
Amin Beheshti
Usman Naseem
54
0
0
28 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
64
0
0
28 May 2025
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators
John Mendonça
A. Lavie
Isabel Trancoso
53
0
0
28 May 2025
Reinforced Reasoning for Embodied Planning
Di Wu
Jiaxin Fan
Junzhe Zang
G. Wang
Wei Yin
Wenhao Li
Bo Jin
LRM
122
0
0
28 May 2025
Fostering Video Reasoning via Next-Event Prediction
Haonan Wang
Hongfu Liu
Xiangyan Liu
C. Du
Kenji Kawaguchi
Ye Wang
Tianyu Pang
AI4TS LRM
82
0
0
28 May 2025
Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment
Krti Tallam
Emma Miller
42
0
0
28 May 2025
ArgInstruct: Specialized Instruction Fine-Tuning for Computational Argumentation
Maja Stahl
Timon Ziegenbein
Joonsuk Park
Henning Wachsmuth
ALM LRM
36
0
0
28 May 2025
Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Hanting Chen
Yasheng Wang
Kai Han
Dong Li
Lin Li
...
Hailin Hu
Yehui Tang
Dacheng Tao
Xinghao Chen
Yunhe Wang
LRM
98
0
0
28 May 2025
360-LLaMA-Factory: Plug & Play Sequence Parallelism for Long Post-Training
Haosheng Zou
Xiaowei Lv
Shousheng Jia
Xiangzheng Zhang
SyDa LRM
36
0
0
28 May 2025
BiasFilter: An Inference-Time Debiasing Framework for Large Language Models
Xiaoqing Cheng
Ruizhe Chen
Hongying Zan
Yuxiang Jia
Min Peng
36
1
0
28 May 2025
MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models
Zhiyu Li
Shichao Song
Hanyu Wang
Simin Niu
Ding Chen
...
Qingchen Yu
Bo Tang
Hongkang Yang
Zhi-hai Xu
Feiyu Xiong
RALM
38
0
0
28 May 2025
Stratified Selective Sampling for Instruction Tuning with Dedicated Scoring Strategy
Paramita Mirza
Lucas Weber
Fabian Küch
51
0
0
28 May 2025
EvolveSearch: An Iterative Self-Evolving Search Agent
Dingchu Zhang
Yida Zhao
Jialong Wu
Baixuan Li
Wenbiao Yin
...
Yong Jiang
Yufeng Li
Kewei Tu
Pengjun Xie
Fei Huang
LLMAG KELM
68
0
0
28 May 2025
Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO
Ran Li
Shimin Di
Yuchen Liu
Chen Jing
Yu Qiu
Lei Chen
LRM
79
0
0
28 May 2025
SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
Xiaomeng Yang
Zhiyu Tan
Junyan Wang
Zhijian Zhou
Hao Li
75
0
0
28 May 2025
Text2Grad: Reinforcement Learning from Natural Language Feedback
Hanyang Wang
Lu Wang
Chaoyun Zhang
Tianjun Mao
Si Qin
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
80
0
0
28 May 2025
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications
Feibo Jiang
Cunhua Pan
Li Dong
Kezhi Wang
O. Dobre
Mérouane Debbah
LLMAG AI4TS
175
1
0
28 May 2025
Zero-Shot 3D Visual Grounding from Vision-Language Models
Rong Li
Shijie Li
Lingdong Kong
Xulei Yang
Junwei Liang
VGen
48
1
0
28 May 2025
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Hongyi Zhou
Josiah P. Hanna
Jin Zhu
Ying Yang
Chengchun Shi
OffRL
64
0
0
28 May 2025