arXiv: 2206.05802 (v2, latest)
Self-critiquing models for assisting human evaluators
12 June 2022
William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike
Tags: ALM, ELM
Papers citing "Self-critiquing models for assisting human evaluators" (38 of 238 papers shown)
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
Tags: KELM, HILM · 195 / 584 / 0 · 06 Jun 2023

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation
H. Wang, Ming Shan Hee, Rabiul Awal, K. T. W. Choo, Roy Ka-wei Lee
106 / 45 / 0 · 28 May 2023

Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi
Tags: ALM · 82 / 56 / 0 · 26 May 2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu, Faeze Brahman, Peter West, Jaehun Jang, Khyathi Chandu, ..., Bill Yuchen Lin, Skyler Hallinan, Xiang Ren, Sean Welleck, Yejin Choi
133 / 29 / 0 · 24 May 2023

Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa, Jifan Chen, Junyi Jessy Li, Greg Durrett
88 / 8 / 0 · 24 May 2023

Learning from Mistakes via Cooperative Study Assistant for Large Language Models
Danqing Wang, Lei Li
80 / 8 / 0 · 23 May 2023

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
Tags: ALM · 156 / 608 / 0 · 22 May 2023

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Tags: KELM, LRM · 156 / 399 / 0 · 19 May 2023

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon
Tags: ALM, KELM · 114 / 102 / 0 · 15 May 2023

Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Brihi Joshi, Ziyi Liu, Sahana Ramnath, Aaron Chan, Zhewei Tong, Shaoliang Nie, Qifan Wang, Yejin Choi, Xiang Ren
Tags: HAI, LRM · 95 / 35 / 0 · 11 May 2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin, Julian Michael, Ethan Perez, Sam Bowman
Tags: ReLM, LRM · 122 / 445 / 0 · 07 May 2023

An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
Konstantin Hebenstreit, Robert Praas, Louis P. Kiesewetter, Matthias Samwald
Tags: LLMAG, LRM, AI4CE, ReLM · 102 / 10 / 0 · 04 May 2023

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, ..., José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins
Tags: ALM · 192 / 59 / 0 · 01 May 2023

Learning to Plan with Natural Language
Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan
Tags: LLMAG, LRM · 84 / 5 / 0 · 20 Apr 2023

Teaching Large Language Models to Self-Debug
Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou
Tags: LRM · 164 / 711 / 0 · 11 Apr 2023

REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings
Tags: ReLM, LRM · 134 / 182 / 0 · 04 Apr 2023

Eight Things to Know about Large Language Models
Sam Bowman
Tags: ALM · 103 / 117 / 0 · 02 Apr 2023

CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Guohao Li
Tags: SyDa, ALM · 186 / 521 / 0 · 31 Mar 2023

Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, ..., Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Tags: ReLM, LRM, DiffM · 256 / 1,690 / 0 · 30 Mar 2023

Language Models can Solve Computer Tasks
Geunwoo Kim, Pierre Baldi, Stephen Marcus McAleer
Tags: LLMAG, LM&Ro · 170 / 374 / 0 · 30 Mar 2023

Training Language Models with Language Feedback at Scale
Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez
Tags: ALM · 111 / 107 / 0 · 28 Mar 2023

Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman
109 / 76 / 0 · 16 Feb 2023

Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
Justin Reppert, Ben Rachbach, Charlie George, Luke Stebbing, Ju-Seung Byun, Maggie Appleton, Andreas Stuhlmüller
Tags: ReLM, LRM · 143 / 17 / 0 · 04 Jan 2023

Methodological reflections for AI alignment research using human feedback
Thilo Hagendorff, Sarah Fabi
78 / 6 / 0 · 22 Dec 2022

Large Language Models Meet NL2Code: A Survey
Daoguang Zan, B. Chen, Fengji Zhang, Di Lu, Bingchao Wu, Bei Guan, Yongji Wang, Jian-Guang Lou
Tags: ELM, ALM · 95 / 183 / 0 · 19 Dec 2022

Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez, Sam Ringer, Kamilė Lukošiūtė, Karina Nguyen, Edwin Chen, ..., Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, Jared Kaplan
Tags: ALM · 107 / 407 / 0 · 19 Dec 2022

Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, ..., Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan
Tags: SyDa, MoMe · 311 / 1,651 / 0 · 15 Dec 2022

Prompted Opinion Summarization with GPT-3.5
Adithya Bhaskar, Alexander R. Fabbri, Greg Durrett
Tags: ELM · 65 / 56 / 0 · 29 Nov 2022

Measuring Progress on Scalable Oversight for Large Language Models
Sam Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, ..., Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Benjamin Mann, Jared Kaplan
Tags: ALM, ELM · 113 / 132 / 0 · 04 Nov 2022

Generating Sequences by Learning to Self-Correct
Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, T. Shen, Daniel Khashabi, Yejin Choi
Tags: LRM · 118 / 238 / 0 · 31 Oct 2022

When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu
116 / 20 / 0 · 28 Oct 2022

Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning
Louis Castricato, Alexander Havrilla, Shahbuland Matiana, Michael Pieler, Anbang Ye, Ian Yang, Spencer Frazier, Mark O. Riedl
95 / 13 / 0 · 14 Oct 2022

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
Tags: ALM, AAML · 329 / 538 / 0 · 28 Sep 2022

News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal, Junyi Jessy Li, Greg Durrett
Tags: ELM · 136 / 412 / 0 · 26 Sep 2022

The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
147 / 193 / 0 · 30 Aug 2022

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
Kurt Shuster, Jing Xu, M. Komeili, Da Ju, Eric Michael Smith, ..., Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston
Tags: LM&Ro, KELM · 128 / 243 / 0 · 05 Aug 2022

Language Model Cascades
David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, ..., Henryk Michalewski, Rif A. Saurous, Jascha Narain Sohl-Dickstein, Kevin Patrick Murphy, Charles Sutton
Tags: ReLM, LRM · 130 / 102 / 0 · 21 Jul 2022

Language models show human-like content effects on reasoning tasks
Ishita Dasgupta, Andrew Kyle Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, D. Kumaran, James L. McClelland, Felix Hill
Tags: ReLM, LRM · 141 / 188 / 0 · 14 Jul 2022