HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

18 December 2020

Papers citing "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection"


Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods Mahdi Dhaini Ege Erdogan Nils Feldhus Gjergji Kasneci 49 0 0 02 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models Minh-Hao Van Xintao Wu VLM 88 0 0 30 Apr 2025
Towards a comprehensive taxonomy of online abusive language informed by machine leaning Samaneh Hosseini Moghaddam Kelly Lyons Cheryl Regehr Vivek Goel Kaitlyn Regehr 30 0 0 24 Apr 2025
A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English Julian Bäumler Louis Blöcher Lars-Joel Frey Xian Chen Markus Bayer Christian A. Reuter AILaw 46 0 0 11 Apr 2025
LLMs for Explainable AI: A Comprehensive Survey Ahsan Bilal David Ebert Beiyu Lin 72 1 0 31 Mar 2025
Automating Violence Detection and Categorization from Ancient Texts Alhassan Abdelhalim Michaela Regneri 59 0 0 11 Mar 2025
Mimicking How Humans Interpret Out-of-Context Sentences Through Controlled Toxicity Decoding Maria Mihaela Trusca Liesbeth Allein 52 0 0 11 Mar 2025
LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation Junyeong Park Seogyeong Jeong Shri Kiran Srinivasan Yohan Lee Alice H. Oh 57 0 0 10 Mar 2025
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations Xingwei Tan Chen Lyu Hafiz Muhammad Umer Sahrish Khan Mahathi Parvatham Lois Arthurs Simon Cullen Shelley Wilson Arshad Jhumka Gabriele Pergola 49 0 0 09 Mar 2025
Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing Neemesh Yadav Jiarui Liu Francesco Ortu Roya Ensafi Zhijing Jin Rada Mihalcea 36 0 0 07 Mar 2025
Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations David Hartmann Amin Oueslati Dimitri Staufer Lena Pohlmann Simon Munzert Hendrik Heuer 55 0 0 03 Mar 2025
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices Xinru Wang Mengjie Yu Hannah Nguyen Michael Iuzzolino Tianyi Wang ... Ting Zhang Naveen Sendhilnathan Hrvoje Benko Haijun Xia Tanya R. Jonker 53 0 0 26 Feb 2025
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models Chenlu Guo Nuo Xu Yi-Ju Chang Yuan Wu AI4MH LM&MA 57 1 0 24 Feb 2025
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement Junyu Lu Kai Ma Kaichun Wang Kelaiti Xiao Roy Ka-Wei Lee Bo Xu Liang Yang Hongfei Lin 51 0 0 10 Feb 2025
SCCD: A Session-based Dataset for Chinese Cyberbullying Detection Qingpo Yang Yakai Chen Zihui Xu Yu-ming Shang Sanchuan Guo Xi Zhang 44 0 0 28 Jan 2025
Regularization, Semi-supervision, and Supervision for a Plausible Attention-Based Explanation Duc Hau Nguyen Cyrielle Mallart Guillaume Gravier Pascale Sébillot 60 0 0 22 Jan 2025
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs Rohitash Chandra Guoxiang Ren G. Houseman 51 0 0 20 Jan 2025
U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario Jiaxin Song Xinyu Wang Yihao Wang Yifan Tang Ru Zhang Jianyi Liu Gongshen Liu AAML 45 0 0 03 Jan 2025
SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs Leon Fröhling Pietro Bernardelle Gianluca Demartini ALM 79 0 0 21 Dec 2024
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models Yuxi Sun Wei Gao Jing Ma Hongzhan Lin Ziyang Luo Wenxuan Zhang ELM 82 0 0 17 Dec 2024
Multilingual and Explainable Text Detoxification with Parallel Corpora Daryna Dementieva N. Babakov Amit Ronen A. Ayele Naquee Rizwan ... Elisei Stakovskii Eran Kaufman Ashraf Elnagar Animesh Mukherjee Alexander Panchenko 81 1 0 16 Dec 2024
Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs Mugdha Pandya Mali Jin Kalina Bontcheva Diana Maynard 81 0 0 05 Dec 2024
A Survey on Automatic Online Hate Speech Detection in Low-Resource Languages Susmita Das Arpita Dutta Kingshuk Roy Abir Mondal Arnab Mukhopadhyay 76 0 0 28 Nov 2024
HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter Manuel Tonneau Diyi Liu Niyati Malhotra Scott A. Hale Samuel Fraiberger Victor Orozco-Olvera Paul Röttger 78 0 0 23 Nov 2024
Can Highlighting Help GitHub Maintainers Track Security Fixes? Xueqing Liu Yuchen Xiong Qiushi Liu Jiangrui Zheng 72 0 0 18 Nov 2024
Pruning Literals for Highly Efficient Explainability at Word Level Rohan Kumar Yadav Bimal Bhattarai Abhik Jana Lei Jiao Seid Muhie Yimam 32 0 0 07 Nov 2024
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models Minh Duc Bui K. Wense Anne Lauscher VLM 34 1 0 06 Nov 2024
Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers Lam Nguyen Tung Steven Cho Xiaoning Du Neelofar Neelofar Valerio Terragni Stefano Ruberto Aldeida Aleti 171 2 0 30 Oct 2024
ProvocationProbe: Instigating Hate Speech Dataset from Twitter Abhay Kumar Vigneshwaran Shankaran Rajesh Sharma 13 0 0 25 Oct 2024
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs Zhiting Fan Ruizhe Chen Tianxiang Hu Zuozhu Liu 26 7 0 25 Oct 2024
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition? Urja Khurana Eric T. Nalisnick Antske Fokkens 48 1 0 21 Oct 2024
BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla Fabiha Haider Fariha Tanjim Shifat Md Farhan Ishmam Deeparghya Dutta Barua Md Sakib Ul Rahman Sourove Md Fahim Md Farhad Alam 20 1 0 17 Oct 2024
Disentangling Hate Across Target Identities Yiping Jin Leo Wanner Aneesh Moideen Koya 25 0 0 14 Oct 2024
A Hate Speech Moderated Chat Application: Use Case for GDPR and DSA Compliance Jan Fillies Theodoros Mitsikas Ralph Schäfermeier Adrian Paschke 13 2 0 10 Oct 2024
Unveiling Transformer Perception by Exploring Input Manifolds A. Benfenati Alfio Ferrara A. Marta Davide Riva Elisabetta Rocchetti 18 0 0 08 Oct 2024
Hate Personified: Investigating the role of LLMs in content moderation Sarah Masud Sahajpreet Singh Viktor Hangya Alexander Fraser Tanmoy Chakraborty 30 7 0 03 Oct 2024
CrowdCounter: A benchmark type-specific multi-target counterspeech dataset Punyajoy Saha Abhilash Datta Abhik Jana Animesh Mukherjee 28 0 0 02 Oct 2024
PclGPT: A Large Language Model for Patronizing and Condescending Language Detection Hongbo Wang Mingda Li Junyu Lu Hebin Xia Liang Yang Bo Xu Ruizhu Liu Hongfei Lin 32 0 0 01 Oct 2024
Faithfulness and the Notion of Adversarial Sensitivity in NLP Explanations Supriya Manna Niladri Sett AAML 29 2 0 26 Sep 2024
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora Yungi Kim Hyunsoo Ha Sukyung Lee Jihoo Kim Seonghoon Yang Chanjun Park 41 0 0 15 Sep 2024
Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions Yifan Liu Yike Li Dong Wang 36 0 0 27 Aug 2024
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks? Urja Khurana Eric T. Nalisnick Antske Fokkens Swabha Swayamdipta 42 3 0 26 Aug 2024
An Investigation Into Explainable Audio Hate Speech Detection Jinmyeong An Wonjun Lee Yejin Jeon Jungseul Ok Yunsu Kim Gary Geunbae Lee 30 2 0 12 Aug 2024
Machine Unlearning in Generative AI: A Survey Zheyuan Liu Guangyao Dou Zhaoxuan Tan Yijun Tian Meng Jiang MU 31 14 0 30 Jul 2024
Towards Generalized Offensive Language Identification A. Dmonte Tejas Arya Tharindu Ranasinghe Marcos Zampieri 52 3 0 26 Jul 2024
Explanation Regularisation through the Lens of Attributions Pedro Ferreira Wilker Aziz Ivan Titov 46 1 0 23 Jul 2024
POLygraph: Polish Fake News Dataset Daniel Dzienisiewicz Filip Graliński Piotr Jabłoński Marek Kubis Paweł Skórzewski Piotr Wierzchoń 28 0 0 01 Jul 2024
Free-text Rationale Generation under Readability Level Control Yi-Sheng Hsu Nils Feldhus Sherzod Hakimov 46 0 0 01 Jul 2024
Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management Seid Muhie Yimam Daryna Dementieva Tim Fischer Daniil Moskovskiy Naquee Rizwan ... Sarthak Roy Martin Semmann Alexander Panchenko Chris Biemann Animesh Mukherjee 46 0 0 27 Jun 2024
Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services David Hartmann Amin Oueslati Dimitri Staufer MLAU 35 1 0 20 Jun 2024