ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.17104
  4. Cited By
Automated Adversarial Discovery for Safety Classifiers

Automated Adversarial Discovery for Safety Classifiers

24 June 2024
Yash Kumar Lal
Preethi Lahoti
Aradhana Sinha
Yao Qin
Ananth Balashankar
ArXivPDFHTML

Papers citing "Automated Adversarial Discovery for Safety Classifiers"

9 / 9 papers shown
Title
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and
  Implicit Hate Speech Detection
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
45
362
0
17 Mar 2022
Red Teaming Language Models with Language Models
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
46
627
0
07 Feb 2022
Dynabench: Rethinking Benchmarking in NLP
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
130
401
0
07 Apr 2021
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and
  Improving Models
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu
Marco Tulio Ribeiro
Jeffrey Heer
Daniel S. Weld
79
246
0
01 Jan 2021
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
129
1,089
0
08 May 2020
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text
  Classification Tasks
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
Jason W. Wei
Kai Zou
78
1,931
0
31 Jan 2019
Counterfactual Fairness in Text Classification through Robustness
Counterfactual Fairness in Text Classification through Robustness
Sahaj Garg
Vincent Perot
Nicole Limtiaco
Ankur Taly
Ed H. Chi
Alex Beutel
67
258
0
27 Sep 2018
Black-box Generation of Adversarial Text Sequences to Evade Deep
  Learning Classifiers
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Ji Gao
Jack Lanchantin
M. Soffa
Yanjun Qi
AAML
112
716
0
13 Jan 2018
Synthetic and Natural Noise Both Break Neural Machine Translation
Synthetic and Natural Noise Both Break Neural Machine Translation
Yonatan Belinkov
Yonatan Bisk
102
737
0
06 Nov 2017
1