Contextualizing Hate Speech Classifiers with Post-hoc Explanation

5 May 2020
Brendan Kennedy
Xisen Jin
Aida Mostafazadeh Davani
Morteza Dehghani
Xiang Ren

Papers citing "Contextualizing Hate Speech Classifiers with Post-hoc Explanation"

39 / 39 papers shown
Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
Vasiliki Papanikou
Danae Pla Karidi
E. Pitoura
Emmanouil Panagiotou
Eirini Ntoutsi
33
0
0
01 May 2025
U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Jiaxin Song
Xinyu Wang
Yihao Wang
Yifan Tang
Ru Zhang
Jianyi Liu
Gongshen Liu
AAML
43
0
0
03 Jan 2025
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
37
0
0
02 Nov 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
60
5
0
11 Jul 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang
A. Zubiaga
AAML
31
3
0
17 Jan 2024
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Züfle
Verna Dankers
Ivan Titov
45
0
0
16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
30
11
0
16 Nov 2023
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection
Sarah Masud
Mohammad Aflah Khan
Md. Shad Akhtar
Tanmoy Chakraborty
29
3
0
16 Nov 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
27
125
0
02 Aug 2023
Should We Attend More or Less? Modulating Attention for Fairness
A. Zayed
Gonçalo Mordido
Samira Shabanian
Sarath Chandar
37
10
0
22 May 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
21
117
0
07 Mar 2023
Explaining text classifiers through progressive neighborhood approximation with realistic samples
Yi Cai
Arthur Zimek
Eirini Ntoutsi
Gerhard Wunder
AI4TS
22
0
0
11 Feb 2023
Nationality Bias in Text Generation
Pranav Narayanan Venkit
Sanjana Gautam
Ruchi Panchanadikar
Ting-Hao 'Kenneth' Huang
Shomir Wilson
33
51
0
05 Feb 2023
XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
Dong-Ho Lee
Akshen Kadakia
Brihi Joshi
Aaron Chan
Ziyi Liu
...
Takashi Shibuya
Ryosuke Mitani
Toshiyuki Sekiya
Jay Pujara
Xiang Ren
LRM
40
9
0
30 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
29
10
0
24 Oct 2022
TCAB: A Large-Scale Text Classification Attack Benchmark
Kalyani Asthana
Zhouhang Xie
Wencong You
Adam Noack
Jonathan Brophy
Sameer Singh
Daniel Lowd
39
3
0
21 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
29
10
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
40
3
0
19 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese
Bjorn Ross
Mirella Lapata
36
10
0
06 Oct 2022
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
19
0
0
18 Sep 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
21
36
0
08 Jun 2022
KOLD: Korean Offensive Language Dataset
Young-kuk Jeong
Juhyun Oh
Jaimeen Ahn
Jongwon Lee
Jihyung Moon
Sungjoon Park
Alice H. Oh
57
25
0
23 May 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
26
17
0
09 May 2022
Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
45
5
0
23 Mar 2022
Handling Bias in Toxic Speech Detection: A Survey
Tanmay Garg
Sarah Masud
Tharun Suresh
Tanmoy Chakraborty
17
91
0
26 Jan 2022
A Survey on Gender Bias in Natural Language Processing
Karolina Stańczak
Isabelle Augenstein
30
110
0
28 Dec 2021
Character-level HyperNetworks for Hate Speech Detection
Tomer Wullach
A. Adler
Einat Minkov
18
12
0
11 Nov 2021
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye
Madian Khabsa
M. Lewis
Sinong Wang
Xiang Ren
Aaron Jaech
37
5
0
16 Oct 2021
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park
Julia Mendelsohn
Karthik Radhakrishnan
Kinjal Jain
Tushar Kanakagiri
David Jurgens
Yulia Tsvetkov
38
21
0
09 Oct 2021
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
42
35
0
27 Sep 2021
Fairness-aware Class Imbalanced Learning
Shivashankar Subramanian
Afshin Rahimi
Timothy Baldwin
Trevor Cohn
Lea Frermann
FaML
109
28
0
21 Sep 2021
SS-BERT: Mitigating Identity Terms Bias in Toxic Comment Classification by Utilising the Notion of "Subjectivity" and "Identity Terms"
Zhixue Zhao
Ziqi Zhang
F. Hopfgartner
16
5
0
06 Sep 2021
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
G. Chrysostomou
Nikolaos Aletras
32
16
0
31 Aug 2021
On Measures of Biases and Harms in NLP
Sunipa Dev
Emily Sheng
Jieyu Zhao
Aubrie Amstutz
Jiao Sun
...
M. Sanseverino
Jiin Kim
Akihiro Nishi
Nanyun Peng
Kai-Wei Chang
31
80
0
07 Aug 2021
Improving Counterfactual Generation for Fair Hate Speech Detection
Aida Mostafazadeh Davani
Ali Omrani
Brendan Kennedy
M. Atari
Xiang Ren
Morteza Dehghani
30
9
0
03 Aug 2021
A Survey of Race, Racism, and Anti-Racism in NLP
Anjalie Field
Su Lin Blodgett
Zeerak Talat
Yulia Tsvetkov
36
122
0
21 Jun 2021
Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin
A. Zubiaga
117
164
0
17 Feb 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
28
242
0
31 Dec 2020
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
31
259
0
31 Dec 2020