ResearchTrend.AI
Training language models to follow instructions with human feedback

4 March 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
Pamela Mishkin
Chong Zhang
Sandhini Agarwal
Katarina Slama
Alex Ray
John Schulman
Jacob Hilton
Fraser Kelton
Luke E. Miller
Maddie Simens
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM ALM

Papers citing "Training language models to follow instructions with human feedback"

50 / 6,399 papers shown
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi
Kyubyung Chae
Jiwoo Song
Yohan Jo
Taesup Kim
70
2
0
27 Sep 2024
Data Analysis in the Era of Generative AI
J. Inala
Chenglong Wang
Steven Drucker
Gonzalo Ramos
Victor C. Dibia
N. Riche
Dave Brown
Dan Marshall
Jianfeng Gao
102
9
0
27 Sep 2024
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
Guoxi Zhang
Jiuding Duan
71
1
0
27 Sep 2024
Multimodal Pragmatic Jailbreak on Text-to-image Models
Tong Liu
Zhixin Lai
Jiawen Wang
Gengyuan Zhang
Shuo Chen
Philip Torr
Vera Demberg
Volker Tresp
Jindong Gu
77
5
0
27 Sep 2024
FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation
Yuki Imajuku
Yoko Yamakata
Kiyoharu Aizawa
88
1
0
27 Sep 2024
Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
Zhenghao Peng
Wenjie Luo
Yiren Lu
Tianyi Shen
Cole Gulino
Ari Seff
Justin Fu
62
9
0
26 Sep 2024
DisGeM: Distractor Generation for Multiple Choice Questions with Span Masking
Devrim Cavusoglu
Secil Sen
Ulas Sert
63
0
0
26 Sep 2024
AI Policy Projector: Grounding LLM Policy Design in Iterative Mapmaking
Michelle S. Lam
Fred Hohman
Dominik Moritz
Jeffrey P. Bigham
Kenneth Holstein
Mary Beth Kery
78
1
0
26 Sep 2024
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Ling Liu
AAML
144
47
0
26 Sep 2024
Graph Reasoning with Large Language Models via Pseudo-code Prompting
Konstantinos Skianis
Giannis Nikolentzos
Michalis Vazirgiannis
LRM ReLM
81
5
0
26 Sep 2024
Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy
Owen Henkel
Hannah Horne-Robinson
Maria Dyshel
Nabil Ch
Baptiste Moreau-Pernet
Ralph Abood
77
0
0
26 Sep 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
101
4
0
26 Sep 2024
Few-shot Prompting for Pairwise Ranking: An Effective Non-Parametric Retrieval Model
Nilanjan Sinhababu
Andrew Parry
Debasis Ganguly
D. Samanta
Pabitra Mitra
109
4
0
26 Sep 2024
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
Giandomenico Cornacchia
Giulio Zizzo
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Mark Purcell
75
3
0
26 Sep 2024
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
61
0
0
26 Sep 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Ruijie Xu
Zhihan Liu
Yongfei Liu
Shipeng Yan
Zhaoran Wang
Zhi-Li Zhang
Xuming He
ALM
88
1
0
26 Sep 2024
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
Heejin Do
Sangwon Ryu
Gary Geunbae Lee
83
2
0
26 Sep 2024
Navigating the Shortcut Maze: A Comprehensive Analysis of Shortcut Learning in Text Classification by Language Models
Yuqing Zhou
Ruixiang Tang
Ziyu Yao
Ziwei Zhu
109
4
0
26 Sep 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
72
2
0
26 Sep 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu
Wenjiang Xu
Weizhe Huang
Yuting Zeng
Jiaxing Wang
Hailong Yang
Jing Li
LRM ReLM
131
10
0
26 Sep 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU AAML
220
53
0
26 Sep 2024
AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure
Xi Chen
Zhiyang Zhang
Fangkai Yang
Xiaoting Qin
Chao Du
Xi Cheng
Hangxin Liu
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
48
1
0
26 Sep 2024
Enhancing elusive clues in knowledge learning by contrasting attention of language models
Jian Gao
Xiao Zhang
Ji Wu
Miao Li
112
0
0
26 Sep 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
94
6
0
25 Sep 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
90
5
0
25 Sep 2024
Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset
Andrew Goldberg
Kavish Kondap
Tianshuang Qiu
Zehan Ma
Letian Fu
Justin Kerr
Huang Huang
Kaiyuan Chen
Kuan Fang
Ken Goldberg
79
4
0
25 Sep 2024
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
P Aditya Sreekar
Sahil Verma
Suransh Chopra
Sarik Ghazarian
Abhishek Persad
Narayanan Sadagopan
LRM
54
1
0
25 Sep 2024
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization
Rafael Mendoza
Isabella Cruz
Richard Liu
Aarav Deshmukh
David Williams
Jesscia Peng
Rohan Iyer
87
1
0
25 Sep 2024
Pruning Multilingual Large Language Models for Multilingual Inference
Hwichan Kim
Jun Suzuki
Tosho Hirasawa
Mamoru Komachi
98
0
0
25 Sep 2024
Enhancing Temporal Sensitivity and Reasoning for Time-Sensitive Question Answering
Wanqi Yang
Yanda Li
Meng Fang
Ling Chen
96
8
0
25 Sep 2024
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ
Marc-Antoine Allard
Matin Ansaripour
Maria Yuffa
Paul Teiletche
LRM
42
0
0
25 Sep 2024
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL
Hasan Alp Caferoğlu
Özgür Ulusoy
131
22
0
25 Sep 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
157
4
0
25 Sep 2024
Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval
Qiuhai Zeng
Zimeng Qiu
Dae Yon Hwang
Xin He
William M. Campbell
RALM
52
0
0
24 Sep 2024
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data
Qian-Wen Zhang
Haochen Wang
Fang Li
Siyu An
Lingfeng Qiao
Liangcai Gao
Di Yin
Xing Sun
ELM AI4Ed
69
0
0
24 Sep 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Haoran Que
Feiyu Duan
Liqun He
Yutao Mou
Wangchunshu Zhou
...
Ge Zhang
Junran Peng
Zhaoxiang Zhang
Songyang Zhang
Kai Chen
LM&MA ELM VLM
106
16
0
24 Sep 2024
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework
Lu Chen
Ruqing Zhang
Jiafeng Guo
Yixing Fan
Xueqi Cheng
56
5
0
24 Sep 2024
Finetuning LLMs for Comparative Assessment Tasks
Vatsal Raina
Adian Liusie
Mark Gales
77
1
0
24 Sep 2024
AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support
Adil Bahaj
Mounir Ghogho
139
2
0
24 Sep 2024
SYNERGAI: Perception Alignment for Human-Robot Collaboration
Yixin Chen
Guoxi Zhang
Yaowei Zhang
Hongming Xu
Peiyuan Zhi
Qing Li
Siyuan Huang
75
0
0
24 Sep 2024
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM VLM LRM
105
19
0
24 Sep 2024
Steward: Natural Language Web Automation
Brian Tang
Kang G. Shin
LLMAG
66
1
0
23 Sep 2024
GenAI Advertising: Risks of Personalizing Ads with LLMs
Brian Tang
Kaiwen Sun
Noah T. Curran
F. Schaub
Kang G. Shin
SILM
70
2
0
23 Sep 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Yunfei Xie
Juncheng Wu
Haoqin Tu
Siwei Yang
Bingchen Zhao
Yongshuo Zong
Qiao Jin
Cihang Xie
Yuyin Zhou
LM&MA ELM LRM
114
26
0
23 Sep 2024
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
Jiachi Chen
Qingyuan Zhong
Yanlin Wang
Kaiwen Ning
Yongkun Liu
Zenan Xu
Zhe Zhao
Ting Chen
Zibin Zheng
AAML
40
9
0
23 Sep 2024
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely
Siyun Zhao
Yuqing Yang
Zilong Wang
Zhiyuan He
Luna Qiu
Lili Qiu
SyDa RALM 3DV
122
42
0
23 Sep 2024
With Ears to See and Eyes to Hear: Sound Symbolism Experiments with Multimodal Large Language Models
Tyler Loakman
Yucheng Li
Chenghua Lin
VLM
56
1
0
23 Sep 2024
Orthogonal Finetuning for Direct Preference Optimization
Chenxu Yang
Ruipeng Jia
Naibin Gu
Zheng Lin
Siyuan Chen
Chao Pang
Weichong Yin
Yu Sun
Hua Wu
Weiping Wang
90
0
0
23 Sep 2024
Phantom of Latent for Large Language and Vision Models
Byung-Kwan Lee
Sangyun Chung
Chae Won Kim
Beomchan Park
Yong Man Ro
VLM LRM
103
7
0
23 Sep 2024
ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
Yihong Tang
Jiao Ou
Che Liu
Fuzheng Zhang
Di Zhang
Kun Gai
79
2
0
23 Sep 2024