2406.17104
Cited By
Automated Adversarial Discovery for Safety Classifiers
24 June 2024
Yash Kumar Lal
Preethi Lahoti
Aradhana Sinha
Yao Qin
Ananth Balashankar
Papers citing
"Automated Adversarial Discovery for Safety Classifiers"
9 / 9 papers shown
Title
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
45
362
0
17 Mar 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
46
627
0
07 Feb 2022
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
130
401
0
07 Apr 2021
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu
Marco Tulio Ribeiro
Jeffrey Heer
Daniel S. Weld
79
246
0
01 Jan 2021
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
129
1,089
0
08 May 2020
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
Jason W. Wei
Kai Zou
78
1,931
0
31 Jan 2019
Counterfactual Fairness in Text Classification through Robustness
Sahaj Garg
Vincent Perot
Nicole Limtiaco
Ankur Taly
Ed H. Chi
Alex Beutel
67
258
0
27 Sep 2018
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Ji Gao
Jack Lanchantin
M. Soffa
Yanjun Qi
AAML
112
716
0
13 Jan 2018
Synthetic and Natural Noise Both Break Neural Machine Translation
Yonatan Belinkov
Yonatan Bisk
102
737
0
06 Nov 2017