ResearchTrend.AI

Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models

29 May 2023
Pranath Reddy Kumbam
Sohaib Uddin Syed
Prashanth Thamminedi
S. Harish
Ian Perera
Bonnie J. Dorr
    AAML

Papers citing "Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models"

4 / 4 papers shown
Title
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
Janis Goldzycher
Paul Röttger
Gerold Schneider
AAML
29
9
0
28 Mar 2024
BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification
J. Hauser
Zhao Meng
Damian Pascual
Roger Wattenhofer
OOD
SILM
AAML
191
13
0
15 Sep 2021
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
257
3,690
0
28 Feb 2017
Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw
VLM
264
13,368
0
25 Aug 2014