ResearchTrend.AI
Training language models to follow instructions with human feedback


4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,397 papers shown
Title
Scopes of Alignment
Kush R. Varshney
Zahra Ashktorab
Djallel Bouneffouf
Matthew D Riemer
Justin D. Weisz
83
0
0
15 Jan 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
165
0
0
14 Jan 2025
CWEval: Outcome-driven Evaluation on Functionality and Security of LLM Code Generation
Jinjun Peng
Leyi Cui
Kele Huang
Junfeng Yang
Baishakhi Ray
ELM
143
13
0
14 Jan 2025
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Yaowen Ye
Cassidy Laidlaw
Jacob Steinhardt
ALM
84
2
0
14 Jan 2025
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding
Haomiao Xiong
Yunzhi Zhuge
Jiawen Zhu
Lu Zhang
Huchuan Lu
86
3
0
14 Jan 2025
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu
Wenbiao Yin
Yong Jiang
Zhenglin Wang
Zekun Xi
...
Linhai Zhang
Yulan He
Deyu Zhou
Pengjun Xie
Fei Huang
128
14
0
13 Jan 2025
Improving DeFi Accessibility through Efficient Liquidity Provisioning with Deep Reinforcement Learning
Haonan Xu
Alessio Brini
68
3
0
13 Jan 2025
Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications
Sze Ming Lee
Yunxiao Chen
95
0
0
13 Jan 2025
PoAct: Policy and Action Dual-Control Agent for Generalized Applications
Guozhi Yuan
Yang Liu
Jingli Yang
Wei Jia
Kai Lin
Yansong Gao
Shan He
Zilin Ding
Haoyang Li
LLMAG
67
0
0
13 Jan 2025
Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives
Xinyao Ma
Rui Zhu
Zihao Wang
Jingwei Xiong
Qingyu Chen
Haixu Tang
L. Jean Camp
Lucila Ohno-Machado
LM&MA
95
0
0
12 Jan 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee
Jongha Kim
Jeehye Na
Jinyoung Park
H. Kim
VGen
58
2
0
12 Jan 2025
Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR)
Stephanie Eckman
Bolei Ma
Christoph Kern
Rob Chew
Yun Xue
Frauke Kreuter
96
0
0
12 Jan 2025
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing
Avinab Saha
Junfeng He
Susan Hao
Paul Vicol
...
Sahil Singla
Sarah Young
Yinxiao Li
Feng Yang
Deepak Ramachandran
DiffM
125
1
0
11 Jan 2025
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
Tong Liu
Xiao Yu
Wenxuan Zhou
Jindong Gu
Volker Tresp
82
1
0
11 Jan 2025
On the Partial Identifiability in Reward Learning: Choosing the Best Reward
Filippo Lazzati
Alberto Maria Metelli
72
0
0
10 Jan 2025
On The Statistical Complexity of Offline Decision-Making
Thanh Nguyen-Tang
R. Arora
OffRL
227
1
0
10 Jan 2025
Safeguarding System Prompts for LLMs
Zhifeng Jiang
Zhihua Jin
Guoliang He
AAML, SILM
183
2
0
10 Jan 2025
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Tongshuang Wu
Haiyi Zhu
Maya Albayrak
Alexis Axon
Amanda Bertsch
...
Ying-Jui Tseng
Patricia Vaidos
Zhijin Wu
Wei Wu
Chenyang Yang
182
34
0
10 Jan 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
Muhammad Awais
Ali Husain Salem Abdulla Alharthi
Amandeep Kumar
Hisham Cholakkal
Rao Muhammad Anwer
VLM
128
5
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
91
6
0
08 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Lefei Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM, SyDa, ReLM
154
133
0
08 Jan 2025
Integrating LLMs with ITS: Recent Advances, Potentials, Challenges, and Future Directions
Doaa Mahmud
Hadeel Hajmohamed
Shamma Almentheri
Shamma Alqaydi
Lameya Aldhaheri
R. A. Khalil
Nasir Saeed
AI4TS
111
12
0
08 Jan 2025
Predictable Artificial Intelligence
Lexin Zhou
Pablo Antonio Moreno Casares
Fernando Martínez-Plumed
John Burden
Ryan Burnell
...
Seán Ó hÉigeartaigh
Danaja Rutar
Wout Schellaert
Konstantinos Voudouris
José Hernández-Orallo
150
3
0
08 Jan 2025
A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving
Yi Zhang
Guangyou Zhou
Zhiwen Xie
Jinjin Ma
Jimmy Xiangji Huang
AIMat
72
4
0
08 Jan 2025
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models
Hao Li
Cor-Paul Bezemer
Ahmed E. Hassan
82
4
0
08 Jan 2025
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization
Jie Cao
Dian Jiao
Qiang Yan
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
90
1
0
08 Jan 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
90
1
0
08 Jan 2025
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Patrice Béchard
Orlando Marquez Ayala
118
0
0
08 Jan 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
SILM
133
11
0
08 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
137
2
0
07 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yang Liu
130
0
0
06 Jan 2025
Improving GenIR Systems Based on User Feedback
Qingyao Ai
Zhicheng Dou
Min Zhang
422
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
179
15
0
06 Jan 2025
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Stefan Pasch
177
0
0
04 Jan 2025
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Yachao Zhao
Bo Wang
Yan Wang
Dongming Zhao
Ruifang He
Yuexian Hou
154
4
0
04 Jan 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li
Wai Man Si
Michael Backes
Yang Zhang
Yisen Wang
135
19
0
03 Jan 2025
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
Shengbin Yue
Siyuan Wang
Wei Chen
Xuanjing Huang
Zhongyu Wei
LLMAG
163
11
0
03 Jan 2025
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
Benjamin Laufer
Jon M. Kleinberg
Hoda Heidari
161
11
0
03 Jan 2025
Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li
Yuyang Zhang
Tongzheng Ren
Claire Liang
Na Li
J. Shah
183
1
0
03 Jan 2025
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
117
10
0
03 Jan 2025
CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang
Zhengran Ji
Boyuan Chen
140
4
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG, ALM
194
102
0
03 Jan 2025
Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web
Hiroki Furuta
Yutaka Matsuo
Aleksandra Faust
Izzeddin Gur
CLL
224
16
0
03 Jan 2025
Decoding Knowledge in Large Language Models: A Framework for Categorization and Comprehension
Yanbo Fang
Ruixiang Tang
ELM
83
0
0
03 Jan 2025
Text Clustering as Classification with LLMs
Chen Huang
Guoxiu He
125
4
0
03 Jan 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Shuangtao Li
Shuaihao Dong
Kexin Luan
Xinhan Di
Chaofan Ding
LRM
116
4
0
02 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL, ELM
471
0
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
288
5
0
31 Dec 2024
Natural Language Fine-Tuning
Qingbin Liu
Yue Wang
Zhiqi Lin
Min Chen
Yixue Hao
Long Hu
96
1
0
31 Dec 2024
Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
Weiwei Sun
Lingyong Yan
Xinyu Ma
Shuaiqiang Wang
Fajie Yuan
Zhumin Chen
D. Yin
Zhaochun Ren
RALM, ALM, ELM, LRM, LM&MA
236
315
0
31 Dec 2024