ResearchTrend.AI
HateCheck: Functional Tests for Hate Speech Detection Models

31 December 2020
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert

Papers citing "HateCheck: Functional Tests for Hate Speech Detection Models"

50 / 57 papers shown
Enhanced Multimodal Hate Video Detection via Channel-wise and Modality-wise Fusion
Yinghui Zhang
Tailin Chen
Yuchen Zhang
Zeyu Fu
85
0
0
17 May 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation
Shiza Ali
Jeremy Blackburn
Gianluca Stringhini
102
0
0
24 Feb 2025
Echoes of Discord: Forecasting Hater Reactions to Counterspeech
Xiaoying Song
Sharon Lisseth Perez
Xinchen Yu
Eduardo Blanco
Lingzi Hong
419
0
0
17 Feb 2025
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen
Yixin Wu
Y. Qu
Michael Backes
Savvas Zannettou
Yang Zhang
108
5
0
28 Jan 2025
DefVerify: Do Hate Speech Models Reflect Their Dataset's Definition?
Urja Khurana
Eric T. Nalisnick
Antske Fokkens
96
2
0
21 Oct 2024
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
123
10
0
04 Oct 2024
CELL your Model: Contrastive Explanations for Large Language Models
Ronny Luss
Erik Miehling
Amit Dhurandhar
112
0
0
17 Jun 2024
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
Manuel Tonneau
Diyi Liu
Samuel Fraiberger
Ralph Schroeder
Scott A. Hale
Paul Röttger
67
6
0
27 Apr 2024
Specification Overfitting in Artificial Intelligence
Benjamin Roth
Pedro Henrique Luz de Araujo
Yuxi Xia
Saskia Kaltenbrunner
Christoph Korab
184
1
0
13 Mar 2024
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Minlie Huang
SILM
133
9
0
04 Dec 2022
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
201
407
0
07 Apr 2021
DeepHate: Hate Speech Detection via Multi-Faceted Text Representations
Rui Cao
Roy Ka-wei Lee
Tuan-Anh Hoang
78
89
0
14 Mar 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
Bertie Vidgen
Tristan Thrush
Zeerak Talat
Douwe Kiela
117
270
0
31 Dec 2020
HABERTOR: An Efficient and Effective Deep Hatespeech Detector
T. Tran
Yifan Hu
Changwei Hu
Kevin Yen
Fei Tan
Kyumin Lee
Serim Park
VLM
64
32
0
17 Oct 2020
On Cross-Dataset Generalization in Automatic Detection of Online Abuse
I. Nejadgholi
S. Kiritchenko
42
29
0
14 Oct 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
87
605
0
10 May 2020
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro
Tongshuang Wu
Carlos Guestrin
Sameer Singh
ELM
208
1,104
0
08 May 2020
Detecting East Asian Prejudice on Social Media
Bertie Vidgen
Austin Botelho
David A. Broniatowski
E. Guest
Matthew Hall
Helen Z. Margetts
Rebekah Tromble
Zeerak Talat
Scott A. Hale
31
100
0
08 May 2020
Contextualizing Hate Speech Classifiers with Post-hoc Explanation
Brendan Kennedy
Xisen Jin
Aida Mostafazadeh Davani
Morteza Dehghani
Xiang Ren
85
141
0
05 May 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
Xiang Zhou
Yixin Nie
Hao Tan
Joey Tianyi Zhou
80
40
0
28 Apr 2020
Directions in Abusive Language Training Data: Garbage In, Garbage Out
Bertie Vidgen
Leon Derczynski
61
264
0
03 Apr 2020
A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn
Yulia Tsvetkov
Dan Jurafsky
127
93
0
06 Mar 2020
BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt
Alicia Parrish
Haokun Liu
Anhad Mohananey
Wei Peng
Sheng-Fu Wang
Samuel R. Bowman
75
492
0
02 Dec 2019
Social Bias Frames: Reasoning about Social and Power Implications of Language
Maarten Sap
Saadia Gabriel
Lianhui Qin
Dan Jurafsky
Noah A. Smith
Yejin Choi
141
497
0
10 Nov 2019
Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview
Deven Santosh Shah
H. Andrew Schwartz
Dirk Hovy
AI4CE
101
260
0
09 Nov 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Joey Tianyi Zhou
Jason Weston
Douwe Kiela
125
1,006
0
31 Oct 2019
A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
Marzieh Mozafari
R. Farahbakhsh
Noel Crespi
65
352
0
28 Oct 2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
232
7,520
0
02 Oct 2019
Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
Divyansh Kaushik
Eduard H. Hovy
Zachary Chase Lipton
CML
88
569
0
26 Sep 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
320
326
0
21 Aug 2019
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Emily Dinan
Samuel Humeau
Bharath Chintagunta
Jason Weston
81
246
0
17 Aug 2019
Tackling Online Abuse: A Survey of Automated Abuse Detection Methods
Pushkar Mishra
H. Yannakoudakis
Ekaterina Shutova
59
79
0
13 Aug 2019
What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models
Allyson Ettinger
83
606
0
31 Jul 2019
Probing Neural Network Comprehension of Natural Language Arguments
Timothy Niven
Hung-Yu Kao
AAML
88
454
0
17 Jul 2019
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets
Nelson F. Liu
Roy Schwartz
Noah A. Smith
AAML
68
106
0
04 Apr 2019
Predicting the Type and Target of Offensive Posts in Social Media
Marcos Zampieri
S. Malmasi
Preslav Nakov
Sara Rosenthal
N. Farra
Ritesh Kumar
83
778
0
25 Feb 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
131
1,239
0
04 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
Counterfactual Fairness in Text Classification through Robustness
Sahaj Garg
Vincent Perot
Nicole Limtiaco
Ankur Taly
Ed H. Chi
Alex Beutel
99
260
0
27 Sep 2018
Challenges for Toxic Comment Classification: An In-Depth Error Analysis
Betty van Aken
Julian Risch
Ralf Krestel
Alexander Löser
61
222
0
20 Sep 2018
Hierarchical CVAE for Fine-Grained Hate Speech Classification
Jing Qian
Mai Elsherief
E. Belding-Royer
William Yang Wang
49
47
0
31 Aug 2018
All You Need is "Love": Evading Hate-speech Detection
Tommi Gröndahl
Luca Pajola
Mika Juuti
Mauro Conti
Nadarajah Asokan
43
234
0
28 Aug 2018
Targeted Syntactic Evaluation of Language Models
Rebecca Marvin
Tal Linzen
81
416
0
27 Aug 2018
Reducing Gender Bias in Abusive Language Detection
Ji Ho Park
Jamin Shin
Pascale Fung
FaML
51
340
0
22 Aug 2018
Semantic Variation in Online Communities of Practice
Marco Del Tredici
Raquel Fernández
94
39
0
15 Jun 2018
Stress Test Evaluation for Natural Language Inference
Aakanksha Naik
Abhilasha Ravichander
Norman M. Sadeh
Carolyn Rose
Graham Neubig
ELM
70
377
0
02 Jun 2018
Breaking NLI Systems with Sentences that Require Simple Lexical Inferences
Max Glockner
Vered Shwartz
Yoav Goldberg
NAI
88
366
0
06 May 2018
Leveraging Intra-User and Inter-User Representation Learning for Automated Hate Speech Detection
Jing Qian
Mai Elsherief
E. Belding-Royer
William Yang Wang
51
87
0
09 Apr 2018
Challenges in Discriminating Profanity from Hate Speech
S. Malmasi
Marcos Zampieri
68
243
0
14 Mar 2018
Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter
Ziqi Zhang
Le Luo
46
294
0
27 Feb 2018