Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2009.11462
Cited By
v1
v2 (latest)
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"
50 / 814 papers shown
Title
Extrapolative Controlled Sequence Generation via Iterative Refinement
Vishakh Padmakumar
Richard Yuanzhe Pang
He He
Ankur P. Parikh
82
10
0
08 Mar 2023
Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones
Anca Dragan
Aditi Raghunathan
Jacob Steinhardt
119
172
0
08 Mar 2023
Data Portraits: Recording Foundation Model Training Data
Marc Marone
Benjamin Van Durme
209
30
0
06 Mar 2023
Interactive Text Generation
Felix Faltings
Michel Galley
Baolin Peng
Kianté Brantley
Weixin Cai
Yizhe Zhang
Jianfeng Gao
Bill Dolan
78
0
0
02 Mar 2023
Systematic Rectification of Language Models via Dead-end Analysis
Mengyao Cao
Mehdi Fatemi
Jackie C.K. Cheung
Samira Shabanian
KELM
73
16
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.7K
13,554
0
27 Feb 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
189
504
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
95
17
0
18 Feb 2023
Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints
Albert Lu
Hongxin Zhang
Yanzhe Zhang
Xuezhi Wang
Diyi Yang
LRM
85
32
0
17 Feb 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
102
232
0
16 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
159
216
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
109
76
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamil.e Lukovsiut.e
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
92
171
0
15 Feb 2023
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
LM&MA
59
21
0
14 Feb 2023
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
Rafal Kocielnik
Shrimai Prabhumoye
Vivian Zhang
Roy Jiang
R. Alvarez
Anima Anandkumar
75
8
0
14 Feb 2023
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
Melissa Roemmele
Kyle Shaffer
Katrina Olsen
Yiyi Wang
Steve DeNeefe
47
1
0
13 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
105
253
0
11 Feb 2023
Data Selection for Language Models via Importance Resampling
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Percy Liang
131
196
0
06 Feb 2023
Conversation Regression Testing: A Design Technique for Prototyping Generalizable Prompt Strategies for Pre-trained Language Models
J.D. Zamfirescu-Pereira
Bjoern Hartmann
Qian Yang
45
2
0
06 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
143
123
0
31 Jan 2023
Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation
Xiang Lin
Prathyusha Jwalapuram
Shafiq Joty
DiffM
60
0
0
31 Jan 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
105
107
0
30 Jan 2023
Theme-driven Keyphrase Extraction to Analyze Social Media Discourse
William Romano
Omar Sharif
Madhusudan Basak
Joseph Gatto
S. Preum
65
6
0
27 Jan 2023
Language Model Detoxification in Dialogue with Contextualized Stance Control
Jingu Qian
Xifeng Yan
54
1
0
25 Jan 2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Shuaichen Chang
Jun Wang
Mingwen Dong
Lin Pan
Henghui Zhu
...
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick Ng
Bing Xiang
OOD
101
35
0
21 Jan 2023
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Jing Wei
Sungdong Kim
Hyunhoon Jung
Young-Ho Kim
103
89
0
14 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
189
36
0
01 Jan 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
123
23
0
30 Dec 2022
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer
Xi Lin
Ramakanth Pasunuru
Todor Mihaylov
Daniel Simig
...
Jeff Wang
Christopher Dewan
Asli Celikyilmaz
Luke Zettlemoyer
Veselin Stoyanov
ALM
200
268
0
22 Dec 2022
Critic-Guided Decoding for Controlled Text Generation
Minbeom Kim
Hwanhee Lee
Kang Min Yoo
Joonsuk Park
Hwaran Lee
Kyomin Jung
115
36
0
21 Dec 2022
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Skyler Hallinan
Alisa Liu
Yejin Choi
Maarten Sap
60
40
0
20 Dec 2022
Trustworthy Social Bias Measurement
Rishi Bommasani
Percy Liang
76
11
0
20 Dec 2022
Evaluating Psychological Safety of Large Language Models
Xingxuan Li
Yutong Li
Linlin Liu
Shafiq Joty
Lidong Bing
LM&MA
93
24
0
20 Dec 2022
Controllable Text Generation with Language Constraints
Howard Chen
Huihan Li
Danqi Chen
Karthik Narasimhan
67
16
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
112
102
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
102
407
0
19 Dec 2022
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula
Jena D. Hwang
Doug Downey
Ronan Le Bras
Ximing Lu
Lianhui Qin
Keisuke Sakaguchi
Swabha Swayamdipta
Peter West
Yejin Choi
103
34
0
19 Dec 2022
DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation
Yuxi Feng
Xiaoyuan Yi
Xiting Wang
L. Lakshmanan
Xing Xie
DiffM
103
5
0
16 Dec 2022
Teaching Small Language Models to Reason
Lucie Charlotte Magister
Jonathan Mallinson
Jakub Adamek
Eric Malmi
Aliaksei Severyn
LRM
AI4CE
ReLM
239
267
0
16 Dec 2022
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
164
200
0
15 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
217
523
0
08 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Minlie Huang
SILM
148
9
0
04 Dec 2022
Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks
Zae Myung Kim
Wanyu Du
Vipul Raheja
Dhruv Kumar
Dongyeop Kang
97
18
0
02 Dec 2022
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Xiao Yu
Qingyang Wu
Kun Qian
Zhou Yu
OffRL
70
12
0
30 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
91
4
0
27 Nov 2022
Best-
k
k
k
Search Algorithm for Neural Text Generation
Jiacheng Xu
Caiming Xiong
Silvio Savarese
Yingbo Zhou
93
6
0
22 Nov 2022
Validating Large Language Models with ReLM
Michael Kuchnik
Virginia Smith
George Amvrosiadis
126
31
0
21 Nov 2022
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Rafal Kocielnik
Sara Kangaslahti
Shrimai Prabhumoye
M. Hari
R. Alvarez
Anima Anandkumar
59
8
0
21 Nov 2022
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez
Ian Ribeiro
SILM
106
452
0
17 Nov 2022
Prompting PaLM for Translation: Assessing Strategies and Performance
David Vilar
Markus Freitag
Colin Cherry
Jiaming Luo
Viresh Ratnakar
George F. Foster
LRM
114
167
0
16 Nov 2022
Previous
1
2
3
...
12
13
14
15
16
17
Next