ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.05862
  4. Cited By
Training a Helpful and Harmless Assistant with Reinforcement Learning
  from Human Feedback

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

12 April 2022
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
Nova Dassarma
Dawn Drain
Stanislav Fort
Deep Ganguli
T. Henighan
Nicholas Joseph
Saurav Kadavath
John Kernion
Tom Conerly
S. E. Showk
Nelson Elhage
Zac Hatfield-Dodds
Danny Hernandez
Tristan Hume
Scott R. Johnston
Shauna Kravec
Liane Lovitt
Neel Nanda
Catherine Olsson
Dario Amodei
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
ArXivPDFHTML

Papers citing "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

50 / 1,806 papers shown
Title
Semantic Role Labeling Guided Out-of-distribution Detection
Semantic Role Labeling Guided Out-of-distribution Detection
Jinan Zou
Maihao Guo
Yu Tian
Yuhao Lin
Hai Cao
Lingqiao Liu
Ehsan Abbasnejad
Javen Qinfeng Shi
OODD
31
1
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in
  Vision-Language Models
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
34
22
0
29 May 2023
Taming AI Bots: Controllability of Neural States in Large Language
  Models
Taming AI Bots: Controllability of Neural States in Large Language Models
Stefano Soatto
Paulo Tabuada
Pratik Chaudhari
Tianwei Liu
LLMAG
LM&Ro
18
13
0
29 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
  Language Model Application
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
40
29
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable
  Responses Created Through Human-Machine Collaboration
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice Oh
San-hee Park
Jung-Woo Ha
46
16
0
28 May 2023
Reward Collapse in Aligning Large Language Models
Reward Collapse in Aligning Large Language Models
Ziang Song
Tianle Cai
Jason D. Lee
Weijie J. Su
ALM
33
22
0
28 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language
  Models' Reasoning Performance
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
33
109
0
26 May 2023
CONA: A novel CONtext-Aware instruction paradigm for communication using
  large language model
CONA: A novel CONtext-Aware instruction paradigm for communication using large language model
Nan Zhou
Xinghui Tao
Xi Chen
21
0
0
26 May 2023
The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in
  Open-domain Conversational Question Answering
The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering
Sabrina Chiesurin
Dimitris Dimakopoulos
Marco Antonio Sobrevilla Cabezudo
Arash Eshghi
Ioannis V. Papaioannou
Verena Rieser
Ioannis Konstas
HILM
27
25
0
25 May 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion
  Models
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan
Olivia Watkins
Yuqing Du
Hao Liu
Moonkyung Ryu
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
Kangwook Lee
Kimin Lee
53
137
0
25 May 2023
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu
Liangchen Luo
Jayakumar Hoskere
Yun Zhu
Canoee Liu
Simon Tong
Jindong Chen
Lei Meng
KELM
LRM
40
44
0
25 May 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence
  Scores from Language Models Fine-Tuned with Human Feedback
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
66
291
0
24 May 2023
Provable Offline Preference-Based Reinforcement Learning
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
43
24
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
  Large Language Models
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
43
54
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference
  Judgments via GPT-4
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
31
12
0
24 May 2023
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts
Benfeng Xu
An Yang
Junyang Lin
Quang Wang
Chang Zhou
Yongdong Zhang
Zhendong Mao
ALM
47
133
0
24 May 2023
QLoRA: Efficient Finetuning of Quantized LLMs
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers
Artidoro Pagnoni
Ari Holtzman
Luke Zettlemoyer
ALM
73
2,373
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long
  Form Text Generation
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
86
611
0
23 May 2023
DetGPT: Detect What You Need via Reasoning
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Rui Pan
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong Tong Zhang
Tong Zhang
LRM
LM&Ro
27
92
0
23 May 2023
Learning from Mistakes via Cooperative Study Assistant for Large
  Language Models
Learning from Mistakes via Cooperative Study Assistant for Large Language Models
Danqing Wang
Lei Li
37
6
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
81
67
0
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain
  Conversations with Large Language Models
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin
Yun-Nung Chen
30
91
0
23 May 2023
Training Priors Predict Text-To-Image Model Performance
Training Priors Predict Text-To-Image Model Performance
Charles Lovering
Ellie Pavlick
CoGe
43
3
0
23 May 2023
The Knowledge Alignment Problem: Bridging Human and External Knowledge
  for Large Language Models
The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models
Shuo Zhang
Liangming Pan
Junzhou Zhao
Wenjie Wang
HILM
28
0
0
23 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as
  Conversational Agents
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Gotze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM
ALM
LLMAG
32
32
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
44
320
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human
  Feedback
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
45
549
0
22 May 2023
On the Limitations of Simulating Active Learning
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
31
11
0
21 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large
  Language Models
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
41
6
0
21 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive
  Critiquing
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
360
0
19 May 2023
Introspective Tips: Large Language Model for In-Context Decision Making
Introspective Tips: Large Language Model for In-Context Decision Making
Liting Chen
Lu Wang
Hang Dong
Yali Du
Jie Yan
...
Pu Zhao
Si Qin
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang
LLMAG
LRM
42
24
0
19 May 2023
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive
  Language Models
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models
Wanqiao Xu
Shi Dong
Dilip Arumugam
Benjamin Van Roy
43
8
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through
  the Lens of Verification and Validation
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
52
83
0
19 May 2023
LIMA: Less Is More for Alignment
LIMA: Less Is More for Alignment
Chunting Zhou
Pengfei Liu
Puxin Xu
Srini Iyer
Jiao Sun
...
Susan Zhang
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
Omer Levy
ALM
36
783
0
18 May 2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao-Min Zhao
Rishabh Joshi
Tianqi Liu
Misha Khalman
Mohammad Saleh
Peter J. Liu
40
273
0
17 May 2023
PaLM 2 Technical Report
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
128
1,152
0
17 May 2023
Prompt-Tuning Decision Transformer with Preference Ranking
Prompt-Tuning Decision Transformer with Preference Ranking
Shengchao Hu
Li Shen
Ya Zhang
Dacheng Tao
OffRL
34
14
0
16 May 2023
Make Prompt-based Black-Box Tuning Colorful: Boosting Model
  Generalization from Three Orthogonal Perspectives
Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives
Qiushi Sun
Chengcheng Han
Nuo Chen
Renyu Zhu
Jing Gong
Xiang Li
Ming Gao
VLM
27
8
0
14 May 2023
Leveraging Large Language Models in Conversational Recommender Systems
Leveraging Large Language Models in Conversational Recommender Systems
Luke Friedman
Sameer Ahuja
David Allen
Zhenning Tan
Hakim Sidahmed
...
Ajay Patel
Harsh Lara
Brian Chu
Zexiang Chen
Manoj Kumar Tiwari
32
104
0
13 May 2023
Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation
Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation
Jin-Fang Gao
Xiao Ding
Bing Qin
Ting Liu
ELM
LRM
AI4MH
41
59
0
12 May 2023
Taking Advice from ChatGPT
Taking Advice from ChatGPT
Peter Zhang
45
5
0
11 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with
  Minimal Human Supervision
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
27
315
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
579
0
03 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image
  Generation
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
171
358
0
02 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
40
86
0
02 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural
  Language Generation
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
117
56
0
01 May 2023
Origin Tracing and Detecting of LLMs
Origin Tracing and Detecting of LLMs
Linyang Li
Pengyu Wang
Kerong Ren
Tianxiang Sun
Xipeng Qiu
LLMAG
94
31
0
27 Apr 2023
Towards ethical multimodal systems
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
40
3
0
26 Apr 2023
SCM: Enhancing Large Language Model with Self-Controlled Memory Framework
SCM: Enhancing Large Language Model with Self-Controlled Memory Framework
Bin Wang
Xinnian Liang
Jian Yang
Huijia Huang
Shuangzhi Wu
Peihao Wu
Lu Lu
Zejun Ma
Zhoujun Li
LLMAG
KELM
RALM
98
26
0
26 Apr 2023
AGI: Artificial General Intelligence for Education
AGI: Artificial General Intelligence for Education
Ehsan Latif
Gengchen Mai
Matthew Nyaaba
Xuansheng Wu
Ninghao Liu
Guoyu Lu
Sheng Li
Tianming Liu
Xiaoming Zhai
ELM
AI4CE
40
23
0
24 Apr 2023
Previous
123...3334353637
Next