arXiv:2012.10289
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
18 December 2020
Binny Mathew
Punyajoy Saha
Seid Muhie Yimam
Chris Biemann
Pawan Goyal
Animesh Mukherjee
Papers citing "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection"
50 / 280 papers shown
Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Mahdi Dhaini
Ege Erdogan
Nils Feldhus
Gjergji Kasneci
49
0
0
02 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
88
0
0
30 Apr 2025
Towards a comprehensive taxonomy of online abusive language informed by machine leaning
Samaneh Hosseini Moghaddam
Kelly Lyons
Cheryl Regehr
Vivek Goel
Kaitlyn Regehr
30
0
0
24 Apr 2025
A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English
Julian Bäumler
Louis Blöcher
Lars-Joel Frey
Xian Chen
Markus Bayer
Christian A. Reuter
AILaw
46
0
0
11 Apr 2025
LLMs for Explainable AI: A Comprehensive Survey
Ahsan Bilal
David Ebert
Beiyu Lin
72
1
0
31 Mar 2025
Automating Violence Detection and Categorization from Ancient Texts
Alhassan Abdelhalim
Michaela Regneri
59
0
0
11 Mar 2025
Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding
Maria Mihaela Trusca
Liesbeth Allein
52
0
0
11 Mar 2025
LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation
Junyeong Park
Seogyeong Jeong
Shri Kiran Srinivasan
Yohan Lee
Alice H. Oh
57
0
0
10 Mar 2025
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations
Xingwei Tan
Chen Lyu
Hafiz Muhammad Umer
Sahrish Khan
Mahathi Parvatham
Lois Arthurs
Simon Cullen
Shelley Wilson
Arshad Jhumka
Gabriele Pergola
49
0
0
09 Mar 2025
Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing
Neemesh Yadav
Jiarui Liu
Francesco Ortu
Roya Ensafi
Zhijing Jin
Rada Mihalcea
36
0
0
07 Mar 2025
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann
Amin Oueslati
Dimitri Staufer
Lena Pohlmann
Simon Munzert
Hendrik Heuer
55
0
0
03 Mar 2025
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Xinru Wang
Mengjie Yu
Hannah Nguyen
Michael Iuzzolino
Tianyi Wang
...
Ting Zhang
Naveen Sendhilnathan
Hrvoje Benko
Haijun Xia
Tanya R. Jonker
53
0
0
26 Feb 2025
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
Chenlu Guo
Nuo Xu
Yi-Ju Chang
Yuan Wu
AI4MH
LM&MA
57
1
0
24 Feb 2025
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement
Junyu Lu
Kai Ma
Kaichun Wang
Kelaiti Xiao
Roy Ka-Wei Lee
Bo Xu
Liang Yang
Hongfei Lin
51
0
0
10 Feb 2025
SCCD: A Session-based Dataset for Chinese Cyberbullying Detection
Qingpo Yang
Yakai Chen
Zihui Xu
Yu-ming Shang
Sanchuan Guo
Xi Zhang
44
0
0
28 Jan 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation
Duc Hau Nguyen
Cyrielle Mallart
Guillaume Gravier
Pascale Sébillot
60
0
0
22 Jan 2025
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs
Rohitash Chandra
Guoxiang Ren
G. Houseman
51
0
0
20 Jan 2025
U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Jiaxin Song
Xinyu Wang
Yihao Wang
Yifan Tang
Ru Zhang
Jianyi Liu
Gongshen Liu
AAML
45
0
0
03 Jan 2025
SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs
Leon Fröhling
Pietro Bernardelle
Gianluca Demartini
ALM
79
0
0
21 Dec 2024
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models
Yuxi Sun
Wei Gao
Jing Ma
Hongzhan Lin
Ziyang Luo
Wenxuan Zhang
ELM
82
0
0
17 Dec 2024
Multilingual and Explainable Text Detoxification with Parallel Corpora
Daryna Dementieva
N. Babakov
Amit Ronen
A. Ayele
Naquee Rizwan
...
Elisei Stakovskii
Eran Kaufman
Ashraf Elnagar
Animesh Mukherjee
Alexander Panchenko
81
1
0
16 Dec 2024
Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs
Mugdha Pandya
Mali Jin
Kalina Bontcheva
Diana Maynard
81
0
0
05 Dec 2024
A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages
Susmita Das
Arpita Dutta
Kingshuk Roy
Abir Mondal
Arnab Mukhopadhyay
76
0
0
28 Nov 2024
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter
Manuel Tonneau
Diyi Liu
Niyati Malhotra
Scott A. Hale
Samuel Fraiberger
Victor Orozco-Olvera
Paul Röttger
78
0
0
23 Nov 2024
Can Highlighting Help GitHub Maintainers Track Security Fixes?
Xueqing Liu
Yuchen Xiong
Qiushi Liu
Jiangrui Zheng
72
0
0
18 Nov 2024
Pruning Literals for Highly Efficient Explainability at Word Level
Rohan Kumar Yadav
Bimal Bhattarai
Abhik Jana
Lei Jiao
Seid Muhie Yimam
32
0
0
07 Nov 2024
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
Minh Duc Bui
K. Wense
Anne Lauscher
VLM
34
1
0
06 Nov 2024
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers
Lam Nguyen Tung
Steven Cho
Xiaoning Du
Neelofar Neelofar
Valerio Terragni
Stefano Ruberto
Aldeida Aleti
171
2
0
30 Oct 2024
ProvocationProbe: Instigating Hate Speech Dataset from Twitter
Abhay Kumar
Vigneshwaran Shankaran
Rajesh Sharma
13
0
0
25 Oct 2024
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
Zhiting Fan
Ruizhe Chen
Tianxiang Hu
Zuozhu Liu
26
7
0
25 Oct 2024
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
48
1
0
21 Oct 2024
BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla
Fabiha Haider
Fariha Tanjim Shifat
Md Farhan Ishmam
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Fahim
Md Farhad Alam
20
1
0
17 Oct 2024
Disentangling Hate Across Target Identities
Yiping Jin
Leo Wanner
Aneesh Moideen Koya
25
0
0
14 Oct 2024
A Hate Speech Moderated Chat Application: Use Case for GDPR and DSA Compliance
Jan Fillies
Theodoros Mitsikas
Ralph Schäfermeier
Adrian Paschke
13
2
0
10 Oct 2024
Unveiling Transformer Perception by Exploring Input Manifolds
A. Benfenati
Alfio Ferrara
A. Marta
Davide Riva
Elisabetta Rocchetti
18
0
0
08 Oct 2024
Hate Personified: Investigating the role of LLMs in content moderation
Sarah Masud
Sahajpreet Singh
Viktor Hangya
Alexander Fraser
Tanmoy Chakraborty
30
7
0
03 Oct 2024
CrowdCounter: A benchmark type-specific multi-target counterspeech dataset
Punyajoy Saha
Abhilash Datta
Abhik Jana
Animesh Mukherjee
28
0
0
02 Oct 2024
PclGPT: A Large Language Model for Patronizing and Condescending Language Detection
Hongbo Wang
Mingda Li
Junyu Lu
Hebin Xia
Liang Yang
Bo Xu
Ruizhu Liu
Hongfei Lin
32
0
0
01 Oct 2024
Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations
Supriya Manna
Niladri Sett
AAML
29
2
0
26 Sep 2024
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
Yungi Kim
Hyunsoo Ha
Sukyung Lee
Jihoo Kim
Seonghoon Yang
Chanjun Park
41
0
0
15 Sep 2024
Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions
Yifan Liu
Yike Li
Dong Wang
36
0
0
27 Aug 2024
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
Swabha Swayamdipta
42
3
0
26 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection
Jinmyeong An
Wonjun Lee
Yejin Jeon
Jungseul Ok
Yunsu Kim
Gary Geunbae Lee
30
2
0
12 Aug 2024
Machine Unlearning in Generative AI: A Survey
Zheyuan Liu
Guangyao Dou
Zhaoxuan Tan
Yijun Tian
Meng Jiang
MU
31
14
0
30 Jul 2024
Towards Generalized Offensive Language Identification
A. Dmonte
Tejas Arya
Tharindu Ranasinghe
Marcos Zampieri
52
3
0
26 Jul 2024
Explanation Regularisation through the Lens of Attributions
Pedro Ferreira
Wilker Aziz
Ivan Titov
46
1
0
23 Jul 2024
POLygraph: Polish Fake News Dataset
Daniel Dzienisiewicz
Filip Graliński
Piotr Jabłoński
Marek Kubis
Paweł Skórzewski
Piotr Wierzchoń
28
0
0
01 Jul 2024
Free-text Rationale Generation under Readability Level Control
Yi-Sheng Hsu
Nils Feldhus
Sherzod Hakimov
46
0
0
01 Jul 2024
Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management
Seid Muhie Yimam
Daryna Dementieva
Tim Fischer
Daniil Moskovskiy
Naquee Rizwan
...
Sarthak Roy
Martin Semmann
Alexander Panchenko
Chris Biemann
Animesh Mukherjee
46
0
0
27 Jun 2024
Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services
David Hartmann
Amin Oueslati
Dimitri Staufer
MLAU
35
1
0
20 Jun 2024