Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models
Pranath Reddy Kumbam, Sohaib Uddin Syed, Prashanth Thamminedi, S. Harish, Ian Perera, Bonnie J. Dorr
29 May 2023
arXiv:2305.18585
AAML

Papers citing "Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models" (4 / 4 papers shown)

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher, Paul Röttger, Gerold Schneider
AAML
29 | 9 | 0
28 Mar 2024

BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification
J. Hauser, Zhao Meng, Damian Pascual, Roger Wattenhofer
OOD, SILM, AAML
191 | 13 | 0
15 Sep 2021

Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim
XAI, FaML
257 | 3,690 | 0
28 Feb 2017

Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw, VLM
264 | 13,368 | 0
25 Aug 2014