Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2209.14375
Cited By
Improving alignment of dialogue agents via targeted human judgements
28 September 2022
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
Timo Ewalds
Maribeth Rauh
Laura Weidinger
Martin Chadwick
Phoebe Thacker
Lucy Campbell-Gillingham
J. Uesato
Po-Sen Huang
Ramona Comanescu
Fan Yang
A. See
Sumanth Dathathri
Rory Greig
Charlie Chen
Doug Fritz
Jaume Sanchez Elias
Richard Green
Sovna Mokrá
Nicholas Fernando
Boxi Wu
Rachel Foley
Susannah Young
Iason Gabriel
William S. Isaac
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Improving alignment of dialogue agents via targeted human judgements"
50 / 112 papers shown
Title
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
Xiaomin Li
Mingye Gao
Yuexing Hao
Taoran Li
Guangya Wan
Zihan Wang
Yijun Wang
LM&MA
ELM
AI4MH
12
0
0
16 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
36
0
0
12 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
Aligning Language Models for Icelandic Legal Text Summarization
Þórir Hrafn Harðarson
Hrafn Loftsson
Stefán Ólafsson
AILaw
AI4TS
ELM
82
0
0
25 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Liu
Y. Wu
OffRL
LRM
46
13
0
03 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
32
1
0
03 Apr 2025
Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
M. Wong
C. Tan
ALM
83
4
0
19 Mar 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
51
0
0
24 Feb 2025
A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies
Alicia DeVrio
Myra Cheng
Lisa Egede
Alexandra Olteanu
Su Lin Blodgett
63
2
0
14 Feb 2025
Predictable Artificial Intelligence
Lexin Zhou
Pablo Antonio Moreno Casares
Fernando Martínez-Plumed
John Burden
Ryan Burnell
...
Seán Ó hÉigeartaigh
Danaja Rutar
Wout Schellaert
Konstantinos Voudouris
José Hernández-Orallo
51
2
0
08 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
46
1
0
08 Jan 2025
LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots
Dongge Han
Trevor A. McInroe
Adam Jelley
Stefano V. Albrecht
Peter Bell
Amos Storkey
61
11
0
31 Dec 2024
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao
Daniel Ben-Levi
Wei Hao
Junfeng Yang
Chengzhi Mao
AAML
173
1
0
06 Nov 2024
On the Loss of Context-awareness in General Instruction Fine-tuning
Yihan Wang
Andrew Bai
Nanyun Peng
Cho-Jui Hsieh
121
1
0
05 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
49
3
0
24 Oct 2024
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li
Xiaochen Yang
W. Zuo
Yiwen Guo
AAML
68
0
0
15 Oct 2024
Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback
Fatemeh Pesaran Zadeh
Juyeon Kim
Jin-Hwa Kim
Gunhee Kim
ALM
51
2
0
05 Oct 2024
From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues
Yu Li
Devamanyu Hazarika
Di Jin
Julia Hirschberg
Yang Liu
28
0
0
04 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
37
10
0
04 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
1
0
02 Oct 2024
Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner
Chenyou Fan
Chenjia Bai
Zhao Shan
Haoran He
Yang Zhang
Zhen Wang
33
3
0
30 Sep 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
56
6
0
05 Aug 2024
Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias
Waqar Hussain
43
0
0
16 Jul 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang
Yang Yue
Rui Lu
Jingxin Shi
Andrew Zhao
Shenzhi Wang
Shiji Song
Gao Huang
LM&Ro
KELM
53
6
0
11 Jul 2024
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim
Yiming Yang
50
7
0
26 Jun 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM
LRM
39
1
0
25 Jun 2024
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
S. Kadhe
Farhan Ahmed
Dennis Wei
Nathalie Baracaldo
Inkit Padhi
MoMe
MU
28
7
0
17 Jun 2024
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu
Lingfeng Shen
Xinyu Yang
Weiting Tan
Beidi Chen
Huaxiu Yao
63
2
0
12 Jun 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
46
23
0
29 May 2024
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Chia-Yi Hsu
Yu-Lin Tsai
Chih-Hsun Lin
Pin-Yu Chen
Chia-Mu Yu
Chun-ying Huang
49
34
0
27 May 2024
InstructPatentGPT: Training patent language models to follow instructions with human feedback
Jieh-Sheng Lee
ALM
49
6
0
25 May 2024
A social path to human-like artificial intelligence
Edgar A. Duénez-Guzmán
Suzanne Sadedin
Jane X. Wang
Kevin R. McKee
Joel Z Leibo
GNN
31
28
0
22 May 2024
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee
Jae Oh Woo
Juree Seok
Parisa Hassanzadeh
Wooseok Jang
...
Hankyu Moon
Wenjun Hu
Yeong-Dae Kwon
Taehee Lee
Seungjai Min
47
2
0
10 May 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
33
4
0
26 Apr 2024
Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search And Fine-Tuning
Yilin Gao
Arava Sai Kumar
Yancheng Li
James W. Snyder
AI4MH
40
2
0
16 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
46
8
0
13 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
71
5
0
11 Apr 2024
Apprentices to Research Assistants: Advancing Research with Large Language Models
M. Namvarpour
A. Razi
40
3
0
09 Apr 2024
Binary Classifier Optimization for Large Language Model Alignment
Seungjae Jung
Gunsoo Han
D. W. Nam
Kyoung-Woon On
42
21
0
06 Apr 2024
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
M. Russinovich
Ahmed Salem
Ronen Eldan
56
77
0
02 Apr 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
52
17
0
12 Mar 2024
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
150
502
0
07 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
Vincent Fortuin
56
17
0
28 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu
Deng Cai
Zhisong Zhang
Wai Lam
Shuming Shi
ALM
21
14
0
22 Dec 2023
Ophtha-LLaMA2: A Large Language Model for Ophthalmology
Huan Zhao
Qiang Ling
Yi Pan
Tianyang Zhong
Jin-Yu Hu
...
Zihao Wu
Zheng Liu
Xi Jiang
Tianming Liu
Yi Shao
LM&MA
36
10
0
08 Dec 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
67
21
0
28 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
39
9
0
09 Nov 2023
Right, No Matter Why: AI Fact-checking and AI Authority in Health-related Inquiry Settings
Elena Sergeeva
Anastasia Sergeeva
Huiyun Tang
Kerstin Bongard-Blanchy
Peter Szolovits
27
1
0
22 Oct 2023
1
2
3
Next