Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
arXiv: 2203.02155

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,388 papers shown
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM, OffRL, LRM
111
18
0
10 Apr 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
208
1
0
09 Apr 2025
PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems
Anirudhan Badrinath
Prabhat Agarwal
Laksh Bhasin
Jaewon Yang
Jiajing Xu
Charles R. Rosenberg
LRM
122
1
0
09 Apr 2025
Perception in Reflection
Yana Wei
Liang Zhao
Kangheng Lin
En Yu
Yuang Peng
...
Jianjian Sun
Haoran Wei
Zheng Ge
Xiangyu Zhang
Vishal M. Patel
135
1
0
09 Apr 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Yan Li
Di Zhang
Long Chen
MLLM
189
0
0
09 Apr 2025
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang
Fengqi Liu
Yixing Lu
Qin Zhao
Pingyu Wu
...
Ran Yi
Yang Cao
Lizhuang Ma
Zheng-jun Zha
Junting Dong
3DGS
101
1
0
09 Apr 2025
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
Jing Yao
Xiaoyuan Yi
Jindong Wang
Zhicheng Dou
Xing Xie
64
2
0
09 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
83
0
0
09 Apr 2025
Integrating Cognitive Processing Signals into Language Models: A Review of Advances, Applications and Future Directions
Angela Lopez-Cardona
Sebastian Idesis
Ioannis Arapakis
74
0
0
09 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Yashar Deldjoo
Nikhil Mehta
M. Sathiamoorthy
Shuai Zhang
Pablo Castells
Julian McAuley
EGVM, ELM
147
2
0
09 Apr 2025
AssistanceZero: Scalably Solving Assistance Games
Cassidy Laidlaw
Eli Bronstein
Timothy Guo
Dylan Feng
Lukas Berglund
Justin Svegliato
Stuart J. Russell
Anca Dragan
86
1
0
09 Apr 2025
Bridging the Gap Between Preference Alignment and Machine Unlearning
Xiaohua Feng
Yuyuan Li
Huwei Ji
Jiaming Zhang
Lulu Zhang
Tianyu Du
Chaochao Chen
MU
95
0
0
09 Apr 2025
Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning
Sanchit Kabra
Akshita Jha
Chandan K. Reddy
LRM
174
1
0
08 Apr 2025
FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction
Qian Zhang
Fang Li
Jie Wang
Lingfeng Qiao
Yifei Yu
Di Yin
Xingwu Sun
RALM
131
0
0
08 Apr 2025
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li
Haoran Wang
Jiang Wu
Ting Liu
AAML
65
0
0
08 Apr 2025
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao
Haoran Xu
Amy Zhang
Weinan Zhang
Chenjia Bai
119
0
0
08 Apr 2025
Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems
Chengzhi Lin
Annan Xie
Shuchang Liu
Wuhong Wang
Chuyuan Wang
Yongqi Liu
OffRL
61
0
0
08 Apr 2025
Adversarial Training of Reward Models
Alexander Bukharin
Haifeng Qian
Shengyang Sun
Adithya Renduchintala
Soumye Singhal
Ziyi Wang
Oleksii Kuchaiev
Olivier Delalleau
T. Zhao
AAML
174
2
0
08 Apr 2025
On the Suitability of Reinforcement Fine-Tuning to Visual Tasks
X. Chen
Wei Li
Chunxu Liu
Chi Xie
Xiaoyan Hu
Chengqian Ma
Feng Zhu
Rui Zhao
ReLM, LRM
159
2
0
08 Apr 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
C. Xu
Ming-Yu Liu
Peng Xu
Ziwei Liu
Wei Ping
Mohammad Shoeybi
Bo Li
Bryan Catanzaro
133
4
0
08 Apr 2025
Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi
Monojit Choudhury
Shivam Chauhan
Rocktim Jyoti Das
Dhruv Sahnan
Xudong Han
...
Rituraj Joshi
Gurpreet Gosal
Avraham Sheinin
Natalia Vassilieva
Preslav Nakov
101
1
0
08 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge Belongie
Ryan Cotterell
Nico Lang
Stella Frank
92
2
0
07 Apr 2025
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling
Benjamin Lipkin
Benjamin LeBrun
Jacob Hoover Vigly
João Loula
David R. MacIver
...
Ryan Cotterell
Vikash K. Mansinghka
Timothy J. O'Donnell
Alexander K. Lew
Tim Vieira
94
0
0
07 Apr 2025
User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems
Jianling Wang
Yifan Liu
Yinghao Sun
Xuejian Ma
Yueqi Wang
...
Onkar Dalal
Ed Chi
Lichan Hong
Ningren Han
Haokai Lu
114
0
0
07 Apr 2025
Not All Data Are Unlearned Equally
Aravind Krishnan
Siva Reddy
Marius Mosbach
MU
406
2
0
07 Apr 2025
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu
W. Shi
Yuchen Zhuang
Yue Yu
Joyce C. Ho
Haoyu Wang
Carl Yang
66
3
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa, OffRL, ReLM, LRM
196
11
0
07 Apr 2025
Generative Large Language Model usage in Smart Contract Vulnerability Detection
Peter Ince
Jiangshan Yu
Joseph K. Liu
Xiaoning Du
94
0
0
07 Apr 2025
LLM-based Automated Grading with Human-in-the-Loop
Hang Li
Yucheng Chu
Kaiqi Yang
Yasemin Copur-Gencturk
Jiliang Tang
AI4Ed, ELM
149
3
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM, LRM
118
4
0
07 Apr 2025
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Ferreira
Wilker Aziz
Ivan Titov
LRM
97
0
0
07 Apr 2025
Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds
Qian Zuo
Fengxiang He
102
0
0
07 Apr 2025
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
Anja Surina
Amin Mansouri
Lars Quaedvlieg
Amal Seddas
Maryna Viazovska
Emmanuel Abbe
Çağlar Gülçehre
120
3
0
07 Apr 2025
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
Yifei Yu
Qian Zhang
Lingfeng Qiao
Di Yin
Fang Li
Jie Wang
Zheyu Chen
Suncong Zheng
Xiaolong Liang
Xingwu Sun
107
0
0
07 Apr 2025
The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
Tianshi Zheng
Yixiang Chen
Chengxi Li
Chunyang Li
Qing Zong
Haochen Shi
Baixuan Xu
Yangqiu Song
Ginny Wong
Simon See
LRM
118
5
0
07 Apr 2025
A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models
Carlos Peláez-González
Andrés Herrera-Poyatos
Cristina Zuheros
David Herrera-Poyatos
Virilo Tejedor
F. Herrera
AAML
80
0
0
07 Apr 2025
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval
Kidist Amde Mekonnen
Yubao Tang
Maarten de Rijke
119
0
0
07 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
145
0
0
07 Apr 2025
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra
Kate Saenko
Venkatesh Saligrama
CoGe, LRM
75
0
0
07 Apr 2025
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
...
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
OffRL, LRM
143
39
0
07 Apr 2025
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
Xuerui Su
Shufang Xie
Guoqing Liu
Yingce Xia
Renqian Luo
Peiran Jin
Zhiming Ma
Yue Wang
Zun Wang
Yuting Liu
LRM
101
5
0
06 Apr 2025
Dynamic Hedging Strategies in Derivatives Markets with LLM-Driven Sentiment and News Analytics
Jie Yang
Yiqiu Tang
Yongjie Li
L. Zhang
Haoyang Zhang
AIFin
61
0
0
05 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
105
0
0
05 Apr 2025
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin
Zixu Lin
Haoyu Chen
Panwang Pan
C. Li
Sixiang Chen
Yeying Jin
Wenbo Li
Xinghao Ding
125
2
0
05 Apr 2025
FISH-Tuning: Enhancing PEFT Methods with Fisher Information
Kang Xue
Ming Dong
Xinhui Tu
Tingting He
205
0
0
05 Apr 2025
A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models
Aviv Brokman
Xuguang Ai
Yuhang Jiang
Shashank Gupta
Ramakanth Kavuluru
SyDa, LM&MA
69
1
0
05 Apr 2025
Cross-Asset Risk Management: Integrating LLMs for Real-Time Monitoring of Equity, Fixed Income, and Currency Markets
Jie Yang
Yiqiu Tang
Yongjie Li
L. Zhang
Haoyang Zhang
68
0
0
05 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Zhongfu Chen
Zhaochun Ren
Maarten de Rijke
268
0
0
05 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
99
2
0
05 Apr 2025
Can ChatGPT Learn My Life From a Week of First-Person Video?
Keegan Harris
43
0
0
04 Apr 2025