ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.00086
  4. Cited By
Challenges in Automated Debiasing for Toxic Language Detection

Challenges in Automated Debiasing for Toxic Language Detection

29 January 2021
Xuhui Zhou
Maarten Sap
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
ArXivPDFHTML

Papers citing "Challenges in Automated Debiasing for Toxic Language Detection"

38 / 38 papers shown
Title
Safety in Large Reasoning Models: A Survey
Safety in Large Reasoning Models: A Survey
Cheng Wang
Yong Liu
Yangqiu Song
Duzhen Zhang
Zechao Li
Junfeng Fang
Bryan Hooi
LRM
242
2
0
24 Apr 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
136
16
0
30 Jan 2025
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic
  CheckLists
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
44
1
0
30 Aug 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
38
35
0
31 May 2024
Is this the real life? Is this just fantasy? The Misleading Success of
  Simulating Social Interactions With LLMs
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
Xuhui Zhou
Zhe Su
Tiwalayo Eisape
Hyunwoo J. Kim
Maarten Sap
39
38
0
08 Mar 2024
Algorithmic Arbitrariness in Content Moderation
Algorithmic Arbitrariness in Content Moderation
Juan Felipe Gomez
Caio Vieira Machado
Lucas Monteiro Paes
Flavio du Pin Calmon
36
9
0
26 Feb 2024
Generative AI for Hate Speech Detection: Evaluation and Findings
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
33
11
0
16 Nov 2023
Examining Temporal Bias in Abusive Language Detection
Examining Temporal Bias in Abusive Language Detection
Mali Jin
Yida Mu
Diana Maynard
Kalina Bontcheva
39
5
0
25 Sep 2023
A Survey on Fairness in Large Language Models
A Survey on Fairness in Large Language Models
Yingji Li
Mengnan Du
Rui Song
Xin Wang
Ying Wang
ALM
57
60
0
20 Aug 2023
HateModerate: Testing Hate Speech Detectors against Content Moderation
  Policies
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
Jiangrui Zheng
Xueqing Liu
Guanqun Yang
Mirazul Haque
Xing Qian
Ravishka Rathnasuriya
Wei Yang
G. Budhrani
52
3
0
23 Jul 2023
Having Beer after Prayer? Measuring Cultural Bias in Large Language
  Models
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous
Michael Joseph Ryan
Alan Ritter
Wei Xu
37
86
0
23 May 2023
Backdoor Learning for NLP: Recent Advances, Challenges, and Future
  Research Directions
Backdoor Learning for NLP: Recent Advances, Challenges, and Future Research Directions
Marwan Omar
SILM
AAML
37
20
0
14 Feb 2023
Towards Agile Text Classifiers for Everyone
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
52
13
0
13 Feb 2023
Rating Sentiment Analysis Systems for Bias through a Causal Lens
Rating Sentiment Analysis Systems for Bias through a Causal Lens
Kausik Lakkaraju
Biplav Srivastava
Marco Valtorta
34
7
0
04 Feb 2023
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Multi-VALUE: A Framework for Cross-Dialectal English NLP
Caleb Ziems
William B. Held
Jingfeng Yang
Jwala Dhamala
Rahul Gupta
Diyi Yang
51
41
0
15 Dec 2022
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as
  Artificial Adversaries?
NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as Artificial Adversaries?
Saadia Gabriel
Hamid Palangi
Yejin Choi
AAML
50
1
0
08 Nov 2022
The State of Profanity Obfuscation in Natural Language Processing
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
47
7
0
14 Oct 2022
Controlling Bias Exposure for Fair Interpretable Predictions
Controlling Bias Exposure for Fair Interpretable Predictions
Zexue He
Yu-Xiang Wang
Julian McAuley
Bodhisattwa Prasad Majumder
27
19
0
14 Oct 2022
Unified Detoxifying and Debiasing in Language Generation via
  Inference-time Adaptive Optimization
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Zonghan Yang
Xiaoyuan Yi
Peng Li
Yang Liu
Xing Xie
38
33
0
10 Oct 2022
Toward Understanding Bias Correlations for Mitigation in NLP
Toward Understanding Bias Correlations for Mitigation in NLP
Lu Cheng
Suyu Ge
Huan Liu
39
8
0
24 May 2022
Towards Debiasing Translation Artifacts
Towards Debiasing Translation Artifacts
Koel Dutta Chowdhury
Rricha Jalota
C. España-Bonet
Josef van Genabith
31
6
0
16 May 2022
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes
Antonis Maronikolakis
Philip Baader
Hinrich Schütze
34
9
0
13 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
28
6
0
11 Apr 2022
On Explaining Multimodal Hateful Meme Detection Models
On Explaining Multimodal Hateful Meme Detection Models
Ming Shan Hee
Roy Ka-wei Lee
Wen-Haw Chong
VLM
29
39
0
04 Apr 2022
Mitigating Gender Bias in Distilled Language Models via Counterfactual
  Role Reversal
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
Umang Gupta
Jwala Dhamala
Varun Kumar
Apurv Verma
Yada Pruksachatkun
Satyapriya Krishna
Rahul Gupta
Kai-Wei Chang
Greg Ver Steeg
Aram Galstyan
21
49
0
23 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and
  Implicit Hate Speech Detection
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
38
354
0
17 Mar 2022
Handling Bias in Toxic Speech Detection: A Survey
Handling Bias in Toxic Speech Detection: A Survey
Tanmay Garg
Sarah Masud
Tharun Suresh
Tanmoy Chakraborty
17
91
0
26 Jan 2022
Simple Text Detoxification by Identifying a Linear Toxic Subspace in
  Language Model Embeddings
Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings
Andrew Wang
Mohit Sudhakar
Yangfeng Ji
20
2
0
15 Dec 2021
"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During
  the COVID-19 Pandemic
"Stop Asian Hate!" : Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic
H. Nghiem
Fred Morstatter
25
8
0
04 Dec 2021
Annotators with Attitudes: How Annotator Beliefs And Identities Bias
  Toxic Language Detection
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection
Maarten Sap
Swabha Swayamdipta
Laura Vianna
Xuhui Zhou
Yejin Choi
Noah A. Smith
48
269
0
15 Nov 2021
Mitigating Racial Biases in Toxic Language Detection with an
  Equity-Based Ensemble Framework
Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Matan Halevy
Camille Harris
A. Bruckman
Diyi Yang
A. Howard
44
35
0
27 Sep 2021
Detecting Inspiring Content on Social Media
Detecting Inspiring Content on Social Media
Oana Ignat
Y-Lan Boureau
Jane A. Yu
A. Halevy
24
6
0
06 Sep 2021
General-Purpose Question-Answering with Macaw
General-Purpose Question-Answering with Macaw
Oyvind Tafjord
Peter Clark
SyDa
ELM
MLLM
32
59
0
06 Sep 2021
Just Say No: Analyzing the Stance of Neural Dialogue Generation in
  Offensive Contexts
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
21
84
0
26 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and
  Tooling
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
43
105
0
07 Jul 2021
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and
  Beyond
LMMS Reloaded: Transformer-based Sense Embeddings for Disambiguation and Beyond
Daniel Loureiro
A. Jorge
Jose Camacho-Collados
35
26
0
26 May 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
24
123
0
13 Apr 2021
Empirical Analysis of Multi-Task Learning for Reducing Model Bias in
  Toxic Comment Detection
Empirical Analysis of Multi-Task Learning for Reducing Model Bias in Toxic Comment Detection
Ameya Vaidya
Feng Mai
Yue Ning
115
21
0
21 Sep 2019
1