Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.02439
Cited By
Contextualizing Hate Speech Classifiers with Post-hoc Explanation
5 May 2020
Brendan Kennedy
Xisen Jin
Aida Mostafazadeh Davani
Morteza Dehghani
Xiang Ren
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Contextualizing Hate Speech Classifiers with Post-hoc Explanation"
39 / 39 papers shown
Title
Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
Vasiliki Papanikou
Danae Pla Karidi
E. Pitoura
Emmanouil Panagiotou
Eirini Ntoutsi
33
0
0
01 May 2025
U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Jiaxin Song
Xinyu Wang
Yihao Wang
Yifan Tang
Ru Zhang
Jianyi Liu
Gongshen Liu
AAML
43
0
0
03 Jan 2025
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
37
0
0
02 Nov 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
60
5
0
11 Jul 2024
Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges
Aiqi Jiang
A. Zubiaga
AAML
31
3
0
17 Jan 2024
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
45
0
0
16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
30
11
0
16 Nov 2023
Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection
Sarah Masud
Mohammad Aflah Khan
Md. Shad Akhtar
Tanmoy Chakraborty
29
3
0
16 Nov 2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger
Hannah Rose Kirk
Bertie Vidgen
Giuseppe Attanasio
Federico Bianchi
Dirk Hovy
ALM
ELM
AILaw
27
125
0
02 Aug 2023
Should We Attend More or Less? Modulating Attention for Fairness
A. Zayed
Gonçalo Mordido
Samira Shabanian
Sarath Chandar
37
10
0
22 May 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
21
117
0
07 Mar 2023
Explaining text classifiers through progressive neighborhood approximation with realistic samples
Yi Cai
Arthur Zimek
Eirini Ntoutsi
Gerhard Wunder
AI4TS
22
0
0
11 Feb 2023
Nationality Bias in Text Generation
Pranav Narayanan Venkit
Sanjana Gautam
Ruchi Panchanadikar
Ting-Hao 'Kenneth' Huang
Shomir Wilson
33
51
0
05 Feb 2023
XMD: An End-to-End Framework for Interactive Explanation-Based Debugging of NLP Models
Dong-Ho Lee
Akshen Kadakia
Brihi Joshi
Aaron Chan
Ziyi Liu
...
Takashi Shibuya
Ryosuke Mitani
Toshiyuki Sekiya
Jay Pujara
Xiang Ren
LRM
40
9
0
30 Oct 2022
Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models
Syrielle Montariol
Arij Riabi
Djamé Seddah
29
10
0
24 Oct 2022
TCAB: A Large-Scale Text Classification Attack Benchmark
Kalyani Asthana
Zhouhang Xie
Wencong You
Adam Noack
Jonathan Brophy
Sameer Singh
Daniel Lowd
39
3
0
21 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
29
10
0
20 Oct 2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
I. Nejadgholi
Esma Balkir
Kathleen C. Fraser
S. Kiritchenko
40
3
0
19 Oct 2022
Explainable Abuse Detection as Intent Classification and Slot Filling
Agostina Calabrese
Bjorn Ross
Mirella Lapata
36
10
0
06 Oct 2022
Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
19
0
0
18 Sep 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
21
36
0
08 Jun 2022
KOLD: Korean Offensive Language Dataset
Young-kuk Jeong
Juhyun Oh
Jaimeen Ahn
Jongwon Lee
Jihyung Mon
Sungjoon Park
Alice H. Oh
57
25
0
23 May 2022
Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection
Indira Sen
Mattia Samory
Claudia Wagner
Isabelle Augenstein
26
17
0
09 May 2022
Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection
Tulika Bose
Nikolaos Aletras
Irina Illina
Dominique Fohr
45
5
0
23 Mar 2022
Handling Bias in Toxic Speech Detection: A Survey
Tanmay Garg
Sarah Masud
Tharun Suresh
Tanmoy Chakraborty
17
91
0
26 Jan 2022
A Survey on Gender Bias in Natural Language Processing
Karolina Stañczak
Isabelle Augenstein
30
110
0
28 Dec 2021
Character-level HyperNetworks for Hate Speech Detection
Tomer Wullach
A. Adler
Einat Minkov
18
12
0
11 Nov 2021
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye
Madian Khabsa
M. Lewis
Sinong Wang
Xiang Ren
Aaron Jaech
37
5
0
16 Oct 2021
Detecting Community Sensitive Norm Violations in Online Conversations
Chan Young Park
Julia Mendelsohn
Karthik Radhakrishnan
Kinjal Jain
Tushar Kanakagiri
David Jurgens
Yulia Tsvetkov
38
21
0
09 Oct 2021
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
42
35
0
27 Sep 2021
Fairness-aware Class Imbalanced Learning
Shivashankar Subramanian
Afshin Rahimi
Timothy Baldwin
Trevor Cohn
Lea Frermann
FaML
109
28
0
21 Sep 2021
SS-BERT: Mitigating Identity Terms Bias in Toxic Comment Classification by Utilising the Notion of "Subjectivity" and "Identity Terms"
Zhixue Zhao
Ziqi Zhang
F. Hopfgartner
16
5
0
06 Sep 2021
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
G. Chrysostomou
Nikolaos Aletras
32
16
0
31 Aug 2021
On Measures of Biases and Harms in NLP
Sunipa Dev
Emily Sheng
Jieyu Zhao
Aubrie Amstutz
Jiao Sun
...
M. Sanseverino
Jiin Kim
Akihiro Nishi
Nanyun Peng
Kai-Wei Chang
31
80
0
07 Aug 2021
Improving Counterfactual Generation for Fair Hate Speech Detection
Aida Mostafazadeh Davani
Ali Omrani
Brendan Kennedy
M. Atari
Xiang Ren
Morteza Dehghani
30
9
0
03 Aug 2021
A Survey of Race, Racism, and Anti-Racism in NLP
Anjalie Field
Su Lin Blodgett
Zeerak Talat
Yulia Tsvetkov
36
122
0
21 Jun 2021
Towards generalisable hate speech detection: a review on obstacles and solutions
Wenjie Yin
A. Zubiaga
117
164
0
17 Feb 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
28
242
0
31 Dec 2020
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
31
259
0
31 Dec 2020
1