2209.14375
Cited By
Improving alignment of dialogue agents via targeted human judgements
28 September 2022
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
Timo Ewalds
Maribeth Rauh
Laura Weidinger
Martin Chadwick
Phoebe Thacker
Lucy Campbell-Gillingham
J. Uesato
Po-Sen Huang
Ramona Comanescu
Fan Yang
A. See
Sumanth Dathathri
Rory Greig
Charlie Chen
Doug Fritz
Jaume Sanchez Elias
Richard Green
Soňa Mokrá
Nicholas Fernando
Boxi Wu
Rachel Foley
Susannah Young
Iason Gabriel
William S. Isaac
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
Papers citing
"Improving alignment of dialogue agents via targeted human judgements"
50 / 117 papers shown
Title
RLHF and IIA: Perverse Incentives
Wanqiao Xu
Shi Dong
Xiuyuan Lu
Grace Lam
Zheng Wen
Benjamin Van Roy
86
2
0
02 Dec 2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
Bertie Vidgen
Nino Scherrer
Hannah Rose Kirk
Rebecca Qian
Anand Kannappan
Scott A. Hale
Paul Röttger
ALM
ELM
120
29
0
14 Nov 2023
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong
Sergey Levine
Anca Dragan
OffRL
LLMAG
93
29
0
09 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
106
9
0
09 Nov 2023
Right, No Matter Why: AI Fact-checking and AI Authority in Health-related Inquiry Settings
Elena Sergeeva
Anastasia Sergeeva
Huiyun Tang
Kerstin Bongard-Blanchy
Peter Szolovits
72
1
0
22 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
373
247
0
20 Oct 2023
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao
John Dang
Aditya Grover
83
30
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
97
20
0
17 Oct 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
113
152
0
16 Oct 2023
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob
Songlin Yang
Gabriele Farina
Jacob Andreas
95
23
0
13 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
119
50
0
11 Oct 2023
Parameter Efficient Multi-task Model Fusion with Partial Linearization
Anke Tang
Li Shen
Yong Luo
Yibing Zhan
Han Hu
Bo Du
Yixin Chen
Dacheng Tao
MoMe
124
36
0
07 Oct 2023
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Alexander Robey
Eric Wong
Hamed Hassani
George J. Pappas
AAML
207
260
0
05 Oct 2023
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Amita Gajewar
Paul Vicol
G. Bansal
David J Fleet
128
177
0
29 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
145
78
0
21 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng Zhang
Aixin Sun
Yequan Wang
158
22
0
07 Sep 2023
Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity
A. Amirova
T. Fteropoulli
Nafiso Ahmed
Martin R. Cowie
Joel Z Leibo
89
11
0
06 Sep 2023
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar
Chirag Agarwal
Suraj Srinivas
Aaron Jiaxun Li
Soheil Feizi
Himabindu Lakkaraju
AAML
164
197
0
06 Sep 2023
Leveraging Contextual Counterfactuals Toward Belief Calibration
Qiuyi Zhang
Michael S. Lee
Sherol Chen
65
1
0
13 Jul 2023
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
Sameera Horawalavithana
Sai Munikoti
Ian Stewart
Henry Kvinge
MLLM
93
19
0
03 Jul 2023
System-Level Natural Language Feedback
Weizhe Yuan
Kyunghyun Cho
Jason Weston
119
5
0
23 Jun 2023
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation
Harel Biggie
Ajay Narasimha Mopidevi
Dusty Woods
Christoffer Heckman
LM&Ro
74
11
0
15 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
168
336
0
02 Jun 2023
ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?
Michael Heck
Nurul Lubis
Benjamin Ruppik
Renato Vukovic
Shutong Feng
Christian Geishauser
Hsien-chin Lin
Carel van Niekerk
Milica Gašić
127
47
0
02 Jun 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
109
60
0
29 May 2023
Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance
Yao Fu
Litu Ou
Mingyu Chen
Yuhao Wan
Hao-Chun Peng
Tushar Khot
LLMAG
ELM
LRM
ReLM
80
115
0
26 May 2023
The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering
Sabrina Chiesurin
Dimitris Dimakopoulos
Marco Antonio Sobrevilla Cabezudo
Arash Eshghi
Ioannis V. Papaioannou
Verena Rieser
Ioannis Konstas
HILM
69
28
0
25 May 2023
On the Tool Manipulation Capability of Open-source Large Language Models
Qiantong Xu
Fenglu Hong
Yangqiu Song
Changran Hu
Zheng Chen
Jian Zhang
LLMAG
102
78
0
25 May 2023
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM
SyDa
155
70
0
23 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
156
608
0
22 May 2023
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
90
11
0
21 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
156
399
0
19 May 2023
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models
Wanqiao Xu
Shi Dong
Dilip Arumugam
Benjamin Van Roy
85
8
0
19 May 2023
SWAN: A Generic Framework for Auditing Textual Conversational Systems
T. Sakai
47
9
0
15 May 2023
A Glimpse in ChatGPT Capabilities and its impact for AI research
Frank Joublin
Antonello Ceravola
Joerg Deigmoeller
Michael Gienger
M. Franzius
Julian Eggert
SILM
AI4MH
ALM
ELM
75
15
0
10 May 2023
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
90
3
0
27 Apr 2023
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
Yunjie Ji
Yan Gong
Yong Deng
Yiping Peng
Qiang Niu
Baochang Ma
Xiangang Li
ALM
ELM
107
25
0
16 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Boyao Wang
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
162
470
0
13 Apr 2023
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
J. Holmes
Zheng Liu
Hua Zhou
Yuzhen Ding
Terence T. Sio
...
Jonathan B. Ashman
Xiang Li
Tianming Liu
Jiajian Shen
Wen Liu
LM&MA
AI4CE
ELM
94
124
0
01 Apr 2023
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4
Zheng-Long Liu
Yue Huang
Xiao-Xing Yu
Lu Zhang
Zihao Wu
...
Dinggang Shen
Quanzheng Li
Tianming Liu
Dajiang Zhu
Xiang Li
LM&MA
MedIm
129
179
0
20 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
108
107
0
09 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
120
554
0
07 Mar 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
85
41
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
111
76
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
92
171
0
15 Feb 2023
The Programmer's Assistant: Conversational Interaction with a Large Language Model for Software Development
Steven I. Ross
Fernando Martinez
Stephanie Houde
Michael J. Muller
Justin D. Weisz
106
227
0
14 Feb 2023
Transformer models: an introduction and catalog
X. Amatriain
Ananth Sankar
Jie Bing
Praveen Kumar Bodigutla
Timothy J. Hazen
Michaeel Kazi
146
53
0
12 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
107
253
0
11 Feb 2023
Regulating ChatGPT and other Large Generative AI Models
P. Hacker
A. Engel
M. Mauer
AILaw
189
354
0
05 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
127
40
0
02 Feb 2023