TaeBench: Improving Quality of Toxic Adversarial Examples

8 October 2024
Xuan Zhu
Dmitriy Bespalov
Liwen You
Ninad Kulkarni
Yanjun Qi
    AAML
arXiv: 2410.05573

Papers citing "TaeBench: Improving Quality of Toxic Adversarial Examples"

18 / 18 papers shown
Towards Building a Robust Toxicity Predictor
Dmitriy Bespalov
Sourav S. Bhabesh
Yi Xiang
Liutong Zhou
Yanjun Qi
AAML
153
16
0
09 Apr 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
174
466
0
07 Dec 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
111
152
0
16 Oct 2023
Can BERT eat RuCoLA? Topological Data Analysis to Explain
Irina Proskurina
Irina Piontkovskaya
Ekaterina Artemova
109
4
0
04 Apr 2023
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
Xingwei He
Zheng-Wen Lin
Yeyun Gong
Alex Jin
Hang Zhang
Chen Lin
Jian Jiao
Siu-Ming Yiu
Nan Duan
Weizhu Chen
119
201
0
29 Mar 2023
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
96
392
0
17 Mar 2022
On The Empirical Effectiveness of Unrealistic Adversarial Hardening Against Realistic Adversarial Attacks
Salijona Dyrmishi
Salah Ghamizi
Thibault Simonetto
Yves Le Traon
Maxime Cordy
AAML
88
20
0
07 Feb 2022
Towards Improving Adversarial Training of NLP Models
Jin Yong Yoo
Yanjun Qi
AAML
206
127
0
01 Sep 2021
Reevaluating Adversarial Examples in Natural Language
John X. Morris
Eli Lifland
Jack Lanchantin
Yangfeng Ji
Yanjun Qi
SILM, AAML
186
114
0
25 Apr 2020
BAE: BERT-based Adversarial Examples for Text Classification
Siddhant Garg
Goutham Ramakrishnan
AAML, SILM
219
557
0
04 Apr 2020
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM, AAML
225
750
0
13 Dec 2018
Generating Natural Language Adversarial Examples
M. Alzantot
Yash Sharma
Ahmed Elgohary
Bo-Jhang Ho
Mani B. Srivastava
Kai-Wei Chang
AAML
423
934
0
21 Apr 2018
Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples
Minhao Cheng
Jinfeng Yi
Pin-Yu Chen
Huan Zhang
Cho-Jui Hsieh
SILM, AAML
118
245
0
03 Mar 2018
Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers
Ji Gao
Jack Lanchantin
M. Soffa
Yanjun Qi
AAML
170
727
0
13 Jan 2018
Adversarial Examples for Evaluating Reading Comprehension Systems
Robin Jia
Percy Liang
AAML, ELM
246
1,610
0
23 Jul 2017
Automated Hate Speech Detection and the Problem of Offensive Language
Thomas Davidson
Dana Warmsley
M. Macy
Ingmar Weber
86
2,708
0
11 Mar 2017
Deceiving Google's Perspective API Built for Detecting Toxic Comments
Hossein Hosseini
Sreeram Kannan
Baosen Zhang
Radha Poovendran
AAML
90
328
0
27 Feb 2017
Explaining and Harnessing Adversarial Examples
Ian Goodfellow
Jonathon Shlens
Christian Szegedy
AAML, GAN
462
19,189
0
20 Dec 2014