2410.05573
TaeBench: Improving Quality of Toxic Adversarial Examples
8 October 2024
Xuan Zhu
Dmitriy Bespalov
Liwen You
Ninad Kulkarni
Yanjun Qi
AAML
Papers citing
"TaeBench: Improving Quality of Toxic Adversarial Examples"
18 / 18 papers shown
Title
Towards Building a Robust Toxicity Predictor
Dmitriy Bespalov
Sourav S. Bhabesh
Yi Xiang
Liutong Zhou
Yanjun Qi
AAML
153
16
0
09 Apr 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
174
466
0
07 Dec 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
111
152
0
16 Oct 2023
Can BERT eat RuCoLA? Topological Data Analysis to Explain
Irina Proskurina
Irina Piontkovskaya
Ekaterina Artemova
109
4
0
04 Apr 2023
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Xingwei He
Zheng-Wen Lin
Yeyun Gong
Alex Jin
Hang Zhang
Chen Lin
Jian Jiao
Siu-Ming Yiu
Nan Duan
Weizhu Chen
119
201
0
29 Mar 2023
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
96
392
0
17 Mar 2022
On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks
Salijona Dyrmishi
Salah Ghamizi
Thibault Simonetto
Yves Le Traon
Maxime Cordy
AAML
88
20
0
07 Feb 2022
Towards Improving Adversarial Training of NLP Models
Jin Yong Yoo
Yanjun Qi
AAML
206
127
0
01 Sep 2021
Reevaluating Adversarial Examples in Natural Language
John X. Morris
Eli Lifland
Jack Lanchantin
Yangfeng Ji
Yanjun Qi
SILM
AAML
186
114
0
25 Apr 2020
BAE: BERT-based Adversarial Examples for Text Classification
Siddhant Garg
Goutham Ramakrishnan
AAML
SILM
219
557
0
04 Apr 2020
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM
AAML
225
750
0
13 Dec 2018
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
423
934
0
21 Apr 2018
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Minhao Cheng
Jinfeng Yi
Pin-Yu Chen
Huan Zhang
Cho-Jui Hsieh
SILM
AAML
118
245
0
03 Mar 2018
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Ji Gao
Jack Lanchantin
M. Soffa
Yanjun Qi
AAML
170
727
0
13 Jan 2018
Adversarial Examples for Evaluating Reading Comprehension Systems
Robin Jia
Percy Liang
AAML
ELM
246
1,610
0
23 Jul 2017
Automated Hate Speech Detection and the Problem of Offensive Language
Thomas Davidson
Dana Warmsley
M. Macy
Ingmar Weber
86
2,708
0
11 Mar 2017
Deceiving Google's Perspective API Built for Detecting Toxic Comments
Hossein Hosseini
Sreeram Kannan
Baosen Zhang
Radha Poovendran
AAML
90
328
0
27 Feb 2017
Explaining and Harnessing Adversarial Examples
Ian Goodfellow
Jonathon Shlens
Christian Szegedy
AAML
GAN
474
19,189
0
20 Dec 2014