Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova DasSarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
Jackson Kernion
Tom Conerly
Sheer El-Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott R. Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan

Papers citing "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

50 / 1,808 papers shown
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Amita Gajewar
Paul Vicol
G. Bansal
David J Fleet
30
150
0
29 Sep 2023
LoRA ensembles for large language model fine-tuning
Xi Wang
Laurence Aitchison
Maja Rudolph
UQCV
37
35
0
29 Sep 2023
Building Privacy-Preserving and Secure Geospatial Artificial Intelligence Foundation Models
Jinmeng Rao
Song Gao
Gengchen Mai
Joanna M. Wardlaw
37
20
0
29 Sep 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
108
1,622
0
28 Sep 2023
Language Models as a Service: Overview of a New Paradigm and its Challenges
Emanuele La Malfa
Aleksandar Petrov
Simon Frieder
Christoph Weinhuber
Ryan Burnell
Raza Nazar
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
ALM
ELM
35
3
0
28 Sep 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
42
82
0
28 Sep 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
42
21
0
28 Sep 2023
Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding
Jiacheng Liu
Andrew Cohen
Ramakanth Pasunuru
Yejin Choi
Hannaneh Hajishirzi
Asli Celikyilmaz
26
24
0
26 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
29
179
0
26 Sep 2023
Aligning Large Multimodal Models with Factually Augmented RLHF
Zhiqing Sun
Sheng Shen
Shengcao Cao
Haotian Liu
Chunyuan Li
...
Liangyan Gui
Yu-xiong Wang
Yiming Yang
Kurt Keutzer
Trevor Darrell
VLM
52
324
0
25 Sep 2023
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Yangjun Ruan
Honghua Dong
Andrew Wang
Silviu Pitis
Yongchao Zhou
Jimmy Ba
Yann Dubois
Chris J. Maddison
Tatsunori Hashimoto
LLMAG
ELM
25
98
0
25 Sep 2023
Can LLM-Generated Misinformation Be Detected?
Canyu Chen
Kai Shu
DeLMO
41
159
0
25 Sep 2023
Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers
Tuhin Chakrabarty
Vishakh Padmakumar
Faeze Brahman
Smaranda Muresan
60
37
0
22 Sep 2023
AceGPT, Localizing Large Language Models in Arabic
Huang Huang
Fei Yu
Jianqing Zhu
Xuening Sun
Hao Cheng
...
Lian Zhang
Ruoyu Sun
Xiang Wan
Haizhou Li
Jinchao Xu
32
48
0
21 Sep 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Tianle Li
Siyuan Zhuang
...
Zi Lin
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
38
181
0
21 Sep 2023
SCREWS: A Modular Framework for Reasoning with Revisions
K. Shridhar
Harsh Jhamtani
Hao Fang
Benjamin Van Durme
Jason Eisner
Patrick Xia
KELM
LRM
30
14
0
20 Sep 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
27
233
0
20 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
48
9
0
20 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang
Guozheng Ma
Cong Yu
Ning Gui
Linrui Zhang
...
Sen Zhang
Li Shen
Xueqian Wang
Peilin Zhao
Dacheng Tao
KELM
31
22
0
20 Sep 2023
XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates
Haopeng Zhang
Hayate Iso
Sairam Gurajada
Nikita Bhutani
46
6
0
20 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
119
309
0
19 Sep 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Zenan Zhou
Zhiying Wu
ELM
LRM
77
712
0
19 Sep 2023
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
40
17
0
18 Sep 2023
SYNDICOM: Improving Conversational Commonsense with Error-Injection and Natural Language Feedback
Christopher Richardson
Anirudh S. Sundar
Larry Heck
LRM
30
4
0
18 Sep 2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Simeng Sun
Dhawal Gupta
Mohit Iyyer
29
17
0
16 Sep 2023
ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer
Arkadiy Saakyan
Smaranda Muresan
31
3
0
15 Sep 2023
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
Federico Bianchi
Mirac Suzgun
Giuseppe Attanasio
Paul Röttger
Dan Jurafsky
Tatsunori Hashimoto
James Zou
ALM
LM&MA
LRM
34
183
0
14 Sep 2023
ChatGPT v Bard v Bing v Claude 2 v Aria v human-expert. How good are AI chatbots at scientific writing?
Edisa Lozić
Benjamin Štular
42
30
0
14 Sep 2023
VerilogEval: Evaluating Large Language Models for Verilog Code Generation
Mingjie Liu
N. Pinckney
Brucek Khailany
Haoxing Ren
29
157
0
14 Sep 2023
An Interactive Framework for Profiling News Media Sources
Nikhil Mehta
Dan Goldwasser
27
4
0
14 Sep 2023
Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
Chenghao Li
Dake Chen
Yuke Zhang
Peter A. Beerel
DiffM
41
7
0
13 Sep 2023
RAIN: Your Language Models Can Align Themselves without Finetuning
Yuhui Li
Fangyun Wei
Jinjing Zhao
Chao Zhang
Hongyang R. Zhang
SILM
44
108
0
13 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
53
77
0
13 Sep 2023
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
Hao Sun
Alihan Huyuk
M. Schaar
OffRL
LRM
23
28
0
13 Sep 2023
Statistical Rejection Sampling Improves Preference Optimization
Tianqi Liu
Yao-Min Zhao
Rishabh Joshi
Misha Khalman
Mohammad Saleh
Peter J. Liu
Jialu Liu
66
215
0
13 Sep 2023
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe
CLL
34
69
0
12 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
29
9
0
12 Sep 2023
Does Writing with Language Models Reduce Content Diversity?
Vishakh Padmakumar
He He
50
83
0
11 Sep 2023
Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese
Hao Wang
Sendong Zhao
Zewen Qiang
Zijian Li
Nuwa Xi
...
Haoqiang Guo
Yuhan Chen
Haoming Xu
Bing Qin
Ting Liu
LM&MA
AI4MH
34
17
0
08 Sep 2023
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
Patrick Haller
Ansar Aynetdinov
Alan Akbik
38
24
0
07 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
22
0
07 Sep 2023
Everyone Deserves A Reward: Learning Customized Human Preferences
Pengyu Cheng
Jiawen Xie
Ke Bai
Yong Dai
Nan Du
19
30
0
06 Sep 2023
Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity
A. Amirova
T. Fteropoulli
Nafiso Ahmed
Martin R. Cowie
Joel Z. Leibo
26
5
0
06 Sep 2023
Deep Reinforcement Learning from Hierarchical Preference Design
Alexander Bukharin
Yixiao Li
Pengcheng He
Tuo Zhao
25
0
0
06 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
39
30
0
05 Sep 2023
INTAGS: Interactive Agent-Guided Simulation
Song Wei
Andrea Coletta
Svitlana Vyetrenko
T. Balch
24
1
0
04 Sep 2023
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Yue Zhang
Yafu Li
Leyang Cui
Deng Cai
Lemao Liu
...
Longyue Wang
Anh Tuan Luu
Wei Bi
Freda Shi
Shuming Shi
RALM
LRM
HILM
53
523
0
03 Sep 2023
Studying the impacts of pre-training using ChatGPT-generated text on downstream tasks
Sarthak Anand
32
0
0
02 Sep 2023
Efficient RLHF: Reducing the Memory Usage of PPO
Michael Santacroce
Yadong Lu
Han Yu
Yuan-Fang Li
Yelong Shen
35
27
0
01 Sep 2023
Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence
Daniel Scalena
Gabriele Sarti
Malvina Nissim
Elisabetta Fersini
34
0
0
01 Sep 2023