ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.14375
  4. Cited By
Improving alignment of dialogue agents via targeted human judgements

Improving alignment of dialogue agents via targeted human judgements

28 September 2022
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
Timo Ewalds
Maribeth Rauh
Laura Weidinger
Martin Chadwick
Phoebe Thacker
Lucy Campbell-Gillingham
J. Uesato
Po-Sen Huang
Ramona Comanescu
Fan Yang
A. See
Sumanth Dathathri
Rory Greig
Charlie Chen
Doug Fritz
Jaume Sanchez Elias
Richard Green
Sovna Mokrá
Nicholas Fernando
Boxi Wu
Rachel Foley
Susannah Young
Iason Gabriel
William S. Isaac
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
    ALM
    AAML
ArXivPDFHTML

Papers citing "Improving alignment of dialogue agents via targeted human judgements"

50 / 112 papers shown
Title
Towards Understanding Sycophancy in Language Models
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
221
198
0
20 Oct 2023
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!
Who Are All The Stochastic Parrots Imitating? They Should Tell Us!
Sagi Shaier
Lawrence E Hunter
K. Wense
36
3
0
16 Oct 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications
  with Programmable Rails
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
21
133
0
16 Oct 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang
Haozhao Wang
Shichao Sun
Wenjie Li
ALM
40
22
0
09 Oct 2023
Confronting Reward Model Overoptimization with Constrained RLHF
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
T. Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
39
48
0
06 Oct 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with
  Diverse Divergence Constraints
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang
Yibo Jiang
Yuguang Yang
Han Liu
Yuxin Chen
42
82
0
28 Sep 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
40
21
0
28 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare
  Conversations Powered by Generative AI
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
43
66
0
21 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
FLM-101B: An Open LLM and How to Train It with 100KBudget100K Budget100KBudget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
Li Du
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
21
0
07 Sep 2023
Certifying LLM Safety against Adversarial Prompting
Certifying LLM Safety against Adversarial Prompting
Aounon Kumar
Chirag Agarwal
Suraj Srinivas
Aaron Jiaxun Li
S. Feizi
Himabindu Lakkaraju
AAML
27
165
0
06 Sep 2023
Simple synthetic data reduces sycophancy in large language models
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei
Da Huang
Yifeng Lu
Denny Zhou
Quoc V. Le
33
69
0
07 Aug 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in
  Large Language Models
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
27
127
0
02 Aug 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
32
6
0
17 Jul 2023
Leveraging Contextual Counterfactuals Toward Belief Calibration
Leveraging Contextual Counterfactuals Toward Belief Calibration
Qiuyi Zhang
Zhang
Michael S. Lee
Sherol Chen
29
1
0
13 Jul 2023
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning
Hengli Li
Songchun Zhu
Zilong Zheng
11
8
0
15 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model
  Training
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
44
305
0
02 Jun 2023
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via
  Pessimism
Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism
Zihao Li
Zhuoran Yang
Mengdi Wang
OffRL
37
55
0
29 May 2023
On the Tool Manipulation Capability of Open-source Large Language Models
On the Tool Manipulation Capability of Open-source Large Language Models
Qiantong Xu
Fenglu Hong
Yangqiu Song
Changran Hu
Zheng Chen
Jian Zhang
LLMAG
35
69
0
25 May 2023
Model evaluation for extreme risks
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
...
Vijay Bolina
Jack Clark
Yoshua Bengio
Paul Christiano
Allan Dafoe
ELM
46
153
0
24 May 2023
Psychological Metrics for Dialog System Evaluation
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
Joao Sedoc
22
2
0
24 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
  Large Language Models
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
38
54
0
24 May 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained
  Models
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez
Alessandro Favero
P. Frossard
MoMe
51
112
0
22 May 2023
On the Limitations of Simulating Active Learning
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
31
11
0
21 May 2023
PaLM 2 Technical Report
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
128
1,152
0
17 May 2023
Leveraging Large Language Models in Conversational Recommender Systems
Leveraging Large Language Models in Conversational Recommender Systems
Luke Friedman
Sameer Ahuja
David Allen
Zhenning Tan
Hakim Sidahmed
...
Ajay Patel
Harsh Lara
Brian Chu
Zexiang Chen
Manoj Kumar Tiwari
32
104
0
13 May 2023
"Alexa doesn't have that many feelings": Children's understanding of AI
  through interactions with smart speakers in their homes
"Alexa doesn't have that many feelings": Children's understanding of AI through interactions with smart speakers in their homes
Valentina Andries
J. Robertson
28
17
0
09 May 2023
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to
  Guardrail Models for Virtual Assistants
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
35
3
0
27 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
18
408
0
13 Apr 2023
Evaluating Large Language Models on a Highly-specialized Topic,
  Radiation Oncology Physics
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
J. Holmes
Zheng Liu
Lian-Cheng Zhang
Yuzhen Ding
Terence T. Sio
...
Jonathan B. Ashman
Xiang Li
Tianming Liu
Jiajian Shen
Wei Liu
LM&MA
AI4CE
ELM
30
120
0
01 Apr 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language
  Model Society
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa
ALM
50
412
0
31 Mar 2023
BloombergGPT: A Large Language Model for Finance
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
91
789
0
30 Mar 2023
Language Models can Solve Computer Tasks
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
43
342
0
30 Mar 2023
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on
  Consistency with Human Preferences
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Yunjie Ji
Yan Gong
Yiping Peng
Chao Ni
Peiyan Sun
Dongyu Pan
Baochang Ma
Xiangang Li
ELM
ALM
AI4MH
30
37
0
14 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of
  Generative AI from GAN to ChatGPT
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
38
508
0
07 Mar 2023
Machine Love
Machine Love
Joel Lehman
28
5
0
18 Feb 2023
Tuning computer vision models with task rewards
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
27
40
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamil.e Lukovsiut.e
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
45
159
0
15 Feb 2023
Towards Agile Text Classifiers for Everyone
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
52
13
0
13 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard
  Security Attacks
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
30
237
0
11 Feb 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or
  $K$-wise Comparisons
Principled Reinforcement Learning with Human Feedback from Pairwise or KKK-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
42
183
0
26 Jan 2023
Inclusive Artificial Intelligence
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
52
1
0
24 Dec 2022
Attributed Question Answering: Evaluation and Modeling for Attributed
  Large Language Models
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Bernd Bohnet
Vinh Q. Tran
Pat Verga
Roee Aharoni
D. Andor
...
Michael Collins
Dipanjan Das
Donald Metzler
Slav Petrov
Kellie Webster
43
59
0
15 Dec 2022
Manifestations of Xenophobia in AI Systems
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
106
1,487
0
15 Dec 2022
Editing Models with Task Arithmetic
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
72
439
0
08 Dec 2022
Talking About Large Language Models
Talking About Large Language Models
Murray Shanahan
AI4CE
23
242
0
07 Dec 2022
Fine-tuning language models to find agreement among humans with diverse
  preferences
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
49
215
0
28 Nov 2022
Scaling Laws for Reward Model Overoptimization
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
481
0
19 Oct 2022
Revision Transformers: Instructing Language Models to Change their
  Values
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
33
6
0
19 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLM
LRM
55
269
0
29 Sep 2022
Previous
123
Next