Learning to summarize from human feedback
arXiv:2009.01325 v3 (latest) · 2 September 2020
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
ALM
Papers citing "Learning to summarize from human feedback" (showing 50 of 1,548)
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer, Claudia Shi, Amir Feder, David M. Blei
91 · 140 · 0 · 26 Jul 2023

Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston
115 · 8 · 0 · 26 Jul 2023

Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions
S. Sohail, Faiza Farhat, Yassine Himeur, Mohammad Nadeem, D. Madsen, Yashbir Singh, Shadi Atalla, W. Mansoor
106 · 124 · 0 · 26 Jul 2023

RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Kevin Kaichuang Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
ALM
114 · 29 · 0 · 24 Jul 2023

On the Effectiveness of Offline RL for Dialogue Response Generation
Paloma Sodhi, Felix Wu, Ethan R. Elenberg, Kilian Q. Weinberger, Ryan T. McDonald
OffRL
84 · 5 · 0 · 23 Jul 2023

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors
Kolby Nottingham, Yasaman Razeghi, Kyungmin Kim, JB Lanier, Pierre Baldi, Roy Fox, Sameer Singh
92 · 10 · 0 · 21 Jul 2023

Kernelized Offline Contextual Dueling Bandits
Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, J. Schneider, Willie Neiswanger
OffRL
78 · 4 · 0 · 21 Jul 2023

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
Shriyash Upadhyay, Etan Ginsberg
SyDa, LRM
62 · 0 · 0 · 20 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh, Ashutosh Singh, Prateek R. Agarwal, Zixuan Huang, Arpita Singh, ..., Ryan Rossi, Puneet Mathur, Erik Learned-Miller, Franck Dernoncourt, Ryan Rossi
108 · 8 · 0 · 20 Jul 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
AI4MH, ALM
595 · 12,141 · 0 · 18 Jul 2023

Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models
Huachuan Qiu, Shuai Zhang, Anqi Li, Hongliang He, Zhenzhong Lan
ALM
110 · 53 · 0 · 17 Jul 2023

On the application of Large Language Models for language teaching and assessment technology
Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, ..., Marek Rei, H. Yannakoudakis, Andrew Mullooly, D. Nicholls, P. Buttery
ELM
77 · 48 · 0 · 17 Jul 2023

Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson E. Denison, ..., Zac Hatfield-Dodds, Jared Kaplan, J. Brauner, Sam Bowman, Ethan Perez
ReLM, LRM
82 · 194 · 0 · 17 Jul 2023

Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems
Shivani Kumar, S. Bhatia, Milan Aggarwal, Tanmoy Chakraborty
99 · 1 · 0 · 14 Jul 2023

Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, ..., Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
ALM, OffRL
132 · 177 · 0 · 11 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
ALM
100 · 506 · 0 · 10 Jul 2023

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning
Ethan Chern, Zhiruo Wang, Sanjan Das, Bhavuk Sharma, Pengfei Liu, Graham Neubig
HILM
83 · 14 · 0 · 10 Jul 2023

TIM: Teaching Large Language Models to Translate with Comparison
Jiali Zeng, Fandong Meng, Yongjing Yin, Jie Zhou
118 · 57 · 0 · 10 Jul 2023

Advancements in Scientific Controllable Text Generation Methods
Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, R. Shah
82 · 7 · 0 · 08 Jul 2023

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback
Yu Chen, Yihan Du, Pihe Hu, Si-Yi Wang, De-hui Wu, Longbo Huang
88 · 8 · 0 · 06 Jul 2023

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback
Taeho Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu
92 · 8 · 0 · 06 Jul 2023

Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei, Nika Haghtalab, Jacob Steinhardt
249 · 1,006 · 0 · 05 Jul 2023

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong, Shangxin Guo, Ching Nam Hang, Siu-Wai Ho, C. Tan
99 · 88 · 0 · 04 Jul 2023

The Inner Sentiments of a Thought
Christian Gagné, Peter Dayan
61 · 4 · 0 · 04 Jul 2023

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut
160 · 4 · 0 · 01 Jul 2023

Preference Ranking Optimization for Human Alignment
Feifan Song, Yu Bowen, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, Houfeng Wang
ALM
97 · 272 · 0 · 30 Jun 2023
On the Exploitability of Instruction Tuning
Manli Shu, Jiong Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein
SILM
147 · 99 · 0 · 28 Jun 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nyugen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, ..., Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
156 · 246 · 0 · 28 Jun 2023

System-Level Natural Language Feedback
Weizhe Yuan, Kyunghyun Cho, Jason Weston
119 · 5 · 0 · 23 Jun 2023

A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
MLLM, LRM
154 · 615 · 0 · 23 Jun 2023

Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T. Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill
101 · 30 · 0 · 20 Jun 2023

Learning to Generate Better Than Your LLM
Jonathan D. Chang, Kianté Brantley, Rajkumar Ramamurthy, Dipendra Kumar Misra, Wen Sun
82 · 49 · 0 · 20 Jun 2023

Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning
Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, N. Yuan, Jian Yin, Hongyang Chao, Qi Zhang
EGVM
160 · 10 · 0 · 20 Jun 2023

Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Jiaan Wang, Jianfeng Qu, Yunlong Liang, Zhixu Li, An Liu, Guanfeng Liu, Xin Zheng
87 · 2 · 0 · 17 Jun 2023

Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback
Shenghuan Sun, Gregory M. Goldgof, A. Butte, Ahmed Alaa
MedIm
78 · 14 · 0 · 16 Jun 2023
Fairness in Preference-based Reinforcement Learning
Umer Siddique, Abhinav Sinha, Yongcan Cao
69 · 5 · 0 · 16 Jun 2023

ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation
Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, Xiaohu Li
LM&MA, ELM, AI4MH
74 · 88 · 0 · 16 Jun 2023

Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, ..., Yuhui Zhang, Zhengping Zhou, Najoung Kim, Sam Bowman, Ethan Perez
110 · 140 · 0 · 15 Jun 2023

WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, Jie Tang
80 · 105 · 0 · 13 Jun 2023

A Markovian Formalism for Active Querying
Sid Ijju
63 · 1 · 0 · 13 Jun 2023

When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset
Jiaxin Pei, David Jurgens
78 · 34 · 0 · 12 Jun 2023

Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering
Hai Ye, Qizhe Xie, Hwee Tou Ng
90 · 8 · 0 · 11 Jun 2023

Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording
Aisha Khatun, Daniel Brown
KELM
61 · 12 · 0 · 09 Jun 2023

Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?
Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah
DeLMO
85 · 33 · 0 · 09 Jun 2023
Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning
Jaehyung Kim, Jinwoo Shin, Dongyeop Kang
69 · 2 · 0 · 08 Jun 2023

Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization
M. Trabelsi, H. Uzunalioglu
81 · 2 · 0 · 07 Jun 2023

Cross-attention learning enables real-time nonuniform rotational distortion correction in OCT
Haoran Zhang, Jianlong Yang, Jingqian Zhang, Shiqing Zhao, Aili Zhang
392 · 8 · 0 · 07 Jun 2023

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord
MoMe
123 · 158 · 0 · 07 Jun 2023

GPT Self-Supervision for a Better Data Annotator
Xiaohuan Pei, Yanxi Li, Chang Xu
70 · 7 · 0 · 07 Jun 2023

PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge
Laura Cabello, Jiaang Li, Ilias Chalkidis
ELM, AI4MH, LRM
41 · 2 · 0 · 05 Jun 2023