Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.01295
Cited By
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
1 April 2024
Yi-Lin Tuan
Xilun Chen
Eric Michael Smith
Louis Martin
Soumya Batra
Asli Celikyilmaz
William Yang Wang
Daniel M. Bikel
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models"
26 / 26 papers shown
Title
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen
Dongcheng Zhao
Yiting Dong
Xiang He
Yi Zeng
AAML
74
3
0
03 Oct 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
141
50
0
31 May 2024
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM
HILM
SyDa
71
179
0
14 Nov 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
385
3,981
0
29 May 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,247
0
27 Feb 2023
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
66
69
0
19 May 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
249
2,561
0
12 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
489
6,240
0
05 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
874
12,973
0
04 Mar 2022
Local Explanation of Dialogue Response Generation
Yi-Lin Tuan
Connor Pryor
Wenhu Chen
Lise Getoor
Wenjie Wang
61
11
0
11 Jun 2021
Controlling Style in Generated Dialogue
Eric Michael Smith
Diana Gonzalez-Rico
Emily Dinan
Y-Lan Boureau
AI4CE
90
51
0
22 Sep 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
136
970
0
04 Dec 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
249
10,829
0
29 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
466
1,734
0
18 Sep 2019
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
571
2,670
0
03 Sep 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
187
3,184
0
22 Apr 2019
Proximal Policy Optimization and its Dynamic Version for Sequence Generation
Yi-Lin Tuan
Jinzhi Zhang
Yujia Li
Hung-yi Lee
45
10
0
24 Aug 2018
Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation
Yi-Lin Tuan
Hung-yi Lee
GAN
65
55
0
16 Aug 2018
Personalizing Dialogue Agents: I have a dog, do you have pets too?
Saizheng Zhang
Emily Dinan
Jack Urbanek
Arthur Szlam
Douwe Kiela
Jason Weston
105
1,459
0
22 Jan 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
499
19,065
0
20 Jul 2017
A causal framework for explaining the predictions of black-box sequence-to-sequence models
David Alvarez-Melis
Tommi Jaakkola
CML
351
204
0
06 Jul 2017
A Persona-Based Neural Conversation Model
Jiwei Li
Michel Galley
Chris Brockett
Georgios P. Spithourakis
Jianfeng Gao
W. Dolan
116
1,036
0
19 Mar 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt
FaML
1.2K
16,990
0
16 Feb 2016
Sequence Level Training with Recurrent Neural Networks
MarcÁurelio Ranzato
S. Chopra
Michael Auli
Wojciech Zaremba
102
1,615
0
20 Nov 2015
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
437
20,568
0
10 Sep 2014
1